Tag Archive: computing

Jul 14 2010

Some notes on SQL: 4 – Advanced select

This is the fourth in a series of blog posts on SQL, the first covered creating a database, the second selecting information from a database, the third commands to modify the structure and contents of an existing database. This post covers more advanced commands for selecting information from a database and ways of manipulating the results returned. No claim of authority is made for these posts, they are mainly intended as my notes on the topic.

SQL supports CASE statements, similar to those which are found in a range of programming languages, they are used to write multiple comparison sequences more compactly:

UPDATE my_table
SET    new_column = CASE
WHEN column1 = somevalue1 THEN newvalue1
WHEN column2 = somevalue2 THEN newvalue2
ELSE newvalue3

The CASE statement can also be used in a SELECT:

SELECT title,
budget = CASE price
WHEN price > 20.00 THEN ‘Expensive’
WHEN price BETWEEN 10.00 AND 19.99 THEN ‘Moderate’
WHEN price < 10.00 THEN ‘Inexpensive’
ELSE ‘Unknown’
FROM   titles

(This second example is from here)

The way in which results are returned from a SELECT statement can be controlled by the ORDER BY keyword with the ASC (or ASCENDING) and DESC (or DESCENDING) modifiers. Results can be ordered by multiple keys. The sort order is numbers before letters, and uppercase letters before lowercase letters.

SELECT title,purchased
FROM   movie_table
ORDER  BY title ASC, purchased DESC;

ASCENDING order is assumed in the absence of the explicit keyword.

There are various functions that can be applied to sets of rows returned in a query to produce a single value these include MIN, MAX, AVG, COUNT and SUM. The  functions are used like this:

FROM   cookie_sales
WHERE  first_name = ‘Nicole’;

This returns a sum of all of the “sales” values returned by the WHERE clause. Related is DISTINCT which is a keyword rather than a function so the syntax is slightly different:

FROM   cookie_sales
ORDER  BY sale_date;

This returns a set of unique dates in the sale_date column.

The GROUP BY keyword is used to facilitate the use of functions such as SUM etc which take multiple arguments to produce a single output, or to reduce a list to distinct elements (in these circumstances it is identical to the DISTINCT keyword but execution may be faster). The format for GROUP BY is shown, by example below:

SELECT first_name, SUM(sales)
FROM   cookie_sales
GROUP  BY first_name;

This will return a sum of the “sales” by each person identified by “first_name”. A final keyword used to control the output of a SELECT statement is the LIMIT keyword which can take one or two parameters the behaviour for the two forms is quite different. One parameter form:

SELECT * FROM your_table LIMIT  5;

This returns the first five results from a SELECT. Two parameter form:

SELECT * FROM your_table LIMIT  5, 5;

This returns results 6,7,8,9 and 10 from the SELECT. The first parameter is the index of the first result to return (starting at 0 for the first position) and the second parameter is the number of results to return.


Jun 24 2010

Some notes on SQL: 3 – changing a table

This is the third in a series of blog posts on SQL, the first covered creating a database, the second selecting information from a database. This post covers commands to modify the structure and contents of an existing database. No claim of authority is made for these posts, they are mainly intended as my notes on the topic.

UPDATE and DELETE allow the rows in a table to be either updated or deleted according to a select-like WHERE clause. This is UPDATE, acting on multiple columns:

UPDATE your_table
SET    first_column = ‘newvalue’,
second_column = ‘another_value’
WHERE  some_column = ‘a test’;

And this is the DELETE command:

DELETE FROM your_table
WHERE  some_column = ‘a test’;

In combination with the ALTER keyword, the following operations can be performed:
The CHANGE keyword allows the name and data type of an existing column to be changed.

ALTER TABLE project_table 
CHANGE COLUMN a_silly_column_name a_better_column_name VARCHAR(100), 
CHANGE COLUMN another_poorly_named_column a_better_name VARCHAR(30);

It’s necessary to be cautious here because data loss can occur depending on the source and destination types, for example going from VARCHAR(100) to VARCHAR(30) could potentially lose 70 characters.
The MODIFY keyword allows the data type or position of an existing column to be changed.

ALTER TABLE my_table 
MODIFY COLUMN target_column VARCHAR(120), 
MODIFY COLUMN another_column AFTER target_column;

The ADD keyword allows new columns to be added to a table:

ALTER TABLE my_table 

The ADD and MODIFY keywords take position identifiers: FIRST, LAST and BEFORE, AFTER – which require a second column identifier as indicated in the MODIFY example.

In addition RENAME TO allows the table to be renamed:

ALTER TABLE poor_name RENAME TO good_name;

And DROP deletes a column:

ALTER TABLE my_table DROP COLUMN unwanted_column; 

Obviously you should use DROP COLUMN cautiously!


Jun 17 2010

Some notes on SQL: 2 – Basic SELECT

Part 1 of this sequence of blog posts provided a preamble and showed how to create databases. This post introduces the basic SELECT command, which shows you what lies within your database and as it’s name implies allows you to select only parts of the data contained within.

The basic form of SELECT is:


* indicates that all fields should be returned from the table ‘my_contacts’, where the first_name field = ‘Anne’. We don’t have to take all the fields from a table:


As well as the equivalence operator =, we can also use comparison operators <> (not equal), <, >, <=, >= these work not only with numerical values, but also with text values. WHERE clauses can also be combined with AND and OR operators.



The second select using the BETWEEN keyword is equivalent to the first.
In addition there are wildcards, % meaning ‘any number of characters’ and _ meaning ‘one character’ which are accessed via the LIKE keyword:


This first search will return ‘Tim’, ‘Slim’, and ‘Ephraim’.


This second search will only return ‘Tim’. NULL is special, nothing equals NULL but you can check if something is NULL:


Comparisons can be made to a list with the IN keyword:

SELECT drink_name FROM drink_info
WHERE rating IN ( ‘good’, ‘excellent’, ‘average’ );

Finally, the NOT operator can be used to find the inverse of the selection made, the NOT keyword goes directly after IN but otherwise goes after WHERE:

WHEREratingNOT IN (‘good’,‘excellent’,‘average’);

WHERE NOTfirst_nameLIKE‘_im’;


Jun 16 2010

Some notes on SQL: 1 – creation

These are some notes on SQL a language for creating and querying databases, I’m learning it because a couple of programming projects I have in mind for work and home will need it. The source for these notes is  Head First SQL book. I’ve used a previous book in this series and I quite like the presentational style. I’m using MySQL via it’s command line client to do the exercises, because it’s about as straightforward as you can get. The code is formatted for display using this SQL to HTML formatter. Notepad++ recognises SQL as a language and will do syntax highlighting, so I type my commands into it and copy them into the MySQL command line client.

SQL is quite an old language and the convention is to write keywords in block capitals (reminds me of FORTRAN!). Command sequences are terminated by a semi-colon.

To start, this sequence creates a database, sets it as active and then adds a table containing a range of fields of different types, the command DESC shows the layout of a table:

CREATE DATABASE my_database;
USE my_database
contact_id INT NOT NULL,
first_name VARCHAR(20),
last_name VARCHAR(20),
birthday DATE,
life_story BLOB,
weight DEC(3, 2) NOT NULL DEFAULT 80.00,
state_code CHAR(2),
appointment DATETIME

Desc contacts;

NOT NULL keywords are used if a field must be specified on INSERT. Once created data can be added to the table using the INSERT command:

INSERT INTO contacts (contact_id,first_name,last_name,birthday,life_story,weight,
(1,‘Ian’,‘Hopkinson’,‘1970-24-04’,‘A very long text string’,80.0,
’10:30am 2010-21-06′

INSERT can be used with no specified fields (in which case values for all supplied fields must be provided for all fields), or with a subset of fields. In order to add the ‘ character we can either use ” or \’ (where ” is two single quote characters, rather than a double quote).

To delete a table:

DROP TABLE contacts;

This command should be used with care since it deletes the table whether or not it contains data. The next post should be on the SELECT command. 

Of course you can find SQL cheatsheets elsewhere.


May 29 2010

That’s nice, dear

This blog post is about programming, for people that don’t program – at least that’s the effect I’m aiming for. The title is in recognition of my tolerant wife, The Inelegant Gardener, who has learnt the appropriate response to my enthusiastic displays of the results of my programming: “That’s nice, dear”!.

I started programming a long time ago – in around 1980, at the school computer club, when I was 10. Since then I’ve been taught odd bits of programming by scientists, and done quite a lot of programming as part of my scientific job. I’ve started to get more interested in proper software engineering in the last few years. This is a roundabout way of saying I am an enthusiastic amateur.

People associate programming with the mathematically minded, but this isn’t necessarily the case: the codebreakers at Bletchley Park, who were amongst the first users of electronic computers, had a range of skills – amongst them were linguists and crossword wizards. I was talking to a Fellow in linguistics, who’d helped write his college’s library software – as he pointed out: a very logical view of language is a great benefit for a programmer. Programming is about giving an idiot very exact instructions, if the instructions concern maths then you need to know maths – otherwise you don’t.

The core of programming is still what I learned years ago, data (numbers or letters) is stored in “variables” that have names. There are conditional statements: “If [something is true] Then [do this] or else [do the other]”. There are looping statements: “Do this 100 times”. And there are functions: “add 2 to this number, square it, add the number you first thought of and tell me the answer” or “how many times does the letter a occur in this sentence”.

These simple statements are being buried under an increasing depth of additional ideas. Since the 80’s the big thing in programming has been “object-orientation”. In object-orientated programming you package up data of a particular sort with functions that relate to that data. So if you had data modelling an octopus you would include functions such as “wave-tentacles” and “change colour”, such functions would be useless for data describing a horse. The real benefit to this is comprehending larger software systems, because a sea of functions and data is grouped together into logical islands. Beyond this there are design patterns – reoccurring systems of objects which I haven’t entirely go the hang of.

In addition to the changes in language, there are changes in the tools used to program: syntax highlighting is nice, it amounts to colouring the verbs, nouns and proper names in programming in different colours – makes it easier to spot mistakes. Auto-completion is another handy tool, in a well-designed language there are only a limited number of next possible statements when you are programming – auto-completion presents you with them as you type. Sites like Stackoverflow are great for asking programming questions, and there no end of function libraries available on the web to help you out.

I have a number of little software projects on the go, you can see them in much the same way as woodworking projects, suduko or crosswords: they keep me out the way, muttering quietly to myself and exercising my brain. It doesn’t matter that what I’m doing isn’t groundbreaking and new.

Programming does lead to some odd habits; when I started programming it was useful to know binary and hexadecimal number systems, as a consequence I believe that numbers such as 1024 and 128 are nice and round. I’ve come to appreciate a wide range of bracket styles [] (){} since they are all used for different things and the semi-colon is one of the most important pieces of punctuation in my life. If I program for too long in a stretch I start to forget how to speak to people.

And just to show off the results of my latest fiddlings: maps of the UK election results. I got interested in doing this just after the General Election. The Guardian has published a lovely spreadsheet of election results, including data on every single candidate. You see lots of maps of data of this sort, I wanted to know how it was done. (Technical details beyond the maps.)

First of all the gender of MP’s by constituency: constituencies represented by ladies are marked pink, those by men marked blue:

The black constituency in northern England is Thirsk and Malton, which held its election on 27th May, following the death of one of the candidates during the general election campaign.
The population of each constituency is also interesting, here I have coloured the constituencies with 9 different shades of green, the palest shade corresponds to a voting population of between 20,000 and 30,000, the darkest shade corresponds to a population of between 100,000 and 110,000:
The Western Isles (now known by it’s Gaelic name: Na h-Eileanan an Iar) has the smallest population at about 22,000 and the Isle of Wight has the largest population with just under 110,000 potential voters. I used ColorBrewer to find a nice set of colours.
Finally here’s a map of which party came second in each constituency in the 2010 General Election:
Red for Labour, blue for the Conservatives, orange for Liberal Democrats, yellow for Scottish Nationalists, pale green for Plaid Cymru, dark green for Sinn Fein, blue for Ulster Conservatives and Unionists, and there are a few independents and minor Northern Island parties which are all coloured white.


So the task is to get the spreadsheet data into a map: To get started I did a bit of memory trawling and googling, a couple of people have written about colouring in maps: this one uses shapefile format map data and the R programming language, whilst this one uses SVG format map data and Python (another programming language). It turns out the shapefile format data for constituencies is a little difficult to get – you have to fill in forms! However enterprising people on Wikipedia have made SVG format constituency maps available. SVG stands for Scaleable Vector Graphics, it’s an XML format which means it’s plaintext and there are standard means to extract data from it and manipulate it. The only real problem is that the constituency names in the spreadsheet don’t exactly match the names inside the SVG format map – I had to resort to some horrible constituency by constituency coding for a load of them. To do this I used the C# programming language, largely because Visual Studio Express C# is a very nice, free development environment which I’ve used before. To view the SVG maps inside my application I used the Webkit .NET library to provide a webbrowser control (which wraps up the rendering engine used in the Safari and Google Chrome browsers) – the native C# webbrowser control is based on Internet Explorer – which doesn’t render SVG. Output to bitmaps is a bit clumsy, Inkscape (a free SVG editor) wasn’t keen on displaying the original constituency map, so I resorted to viewing the map in Google Chrome and taking a screen shot (a terrible bodge).

Older posts «

» Newer posts