More news from the shed…

CWACResults2011

In the month of May I seem to find myself playing with maps and numbers.

To the uninvolved this may appear to be rather similar to my earlier “That’s nice dear”, however the technology involved here is quite different.

This post is about extracting the results from the local elections held on 5th May from the Cheshire West and Chester website and displaying them as a map. I could have manually transcribed the results from the website, this would probably be quicker, but where’s the fun in that?

The starting point for this exercise was noticing that the results pages have a little icon at the bottom saying “OpenElectionData”. This was part of an exercise to make local election results more easily machine-readable in order to build a database of results from across the country, somewhat surprisingly there is no public central record of local council election results. The technology used to provide machine access to the results is known as RDF (standing for Resource Description Framework), this is a way of providing “meaning” to web pages for machines to understand – this is related to the talk of the semantic web. The good folks at Southampton University have provided a browser which allows you to inspect the RDF contents of a webpage. I used this to get a human sight of the data I was trying to read.

RDF content ultimately amounts to triplets of information: “subject”,”predicate”,”object”. In the case of an election then one triplet has a subject of “specific ward identifier” the predicate is “a list of candidates” and the object is “candidate 1;candidate 2; candidate 3…”. Further triplets specify the whether a candidate was elected, how many votes they received and the party to which they belong.

I’ve taken to programming in Python recently, in particular using the Python(x,y) distribution which packages together an IDE with some libraries useful to scientists. This is the sort of thing I’d usually do with Matlab, but that costs (a lot) and I no longer have access to it at home.

There is a Python library for reading RDF data, called RDFlib, unfortunately most of the documentation is for version 2.4 and the working version which I downloaded is 3.0. Searching for documentation for the newer version normally leads to other sites where people are asking where the documentation is for version 3.0!

The base maps come from the Ordnance Survey, specifically the Boundary Line dataset which contains administrative boundary data for the UK in ESRI Shapefile format. This format is widely used for geographical information work, I found the PyShp library from GeospatialPython.com to be well-documented and straightforward way to read the format. The site also has some nice usage examples. I did look for a library to display the resulting maps but after a brief search I adapted the simple methods here for drawing maps using matlibplot.

The Ordnance Survey Open Data site is a treasure trove for programming cartophiles, along with maps of the UK of various types there’s a gazetteer of interesting places, topographic information and location data for UK postcode.

The map at the top of the page uses the traditional colour-coding of red for Labour and blue for Conservative, some wards elect multiple candidates and in those where the elected councillors are not all from the same party purple is used to show a Labour/Conservative combination and orange a Labour/Liberal Democrat combination.

In contrast to my earlier post on programming, the key elements here are the use of pre-existing libraries and data formats to achieve an end result. The RDF component of the exercise took quite a while, whilst the mapping part was the work of a couple of hours. This largely comes down to the quality of the documentation available. Python turns out to be a compact language to do this sort of work, it’s all done in 150 or so lines of code.

It would have been nice to have pointed my program to a single webpage and for it to find all the ward data from there, including the ward names, but I couldn’t work out how to do this – the program visits each ward in turn and I had to type in the ward names. The OpenElectionData site seemed to be a bit wobbly too, so I encoded party information into my program rather the pulling it from their site. Better fitting of the ward labels into the wards would have been nice too (although this is a hard problem). Obviously there’s a wide range of analysis that can be carried out on the underlying electoral data.

Footnotes

The python code to do this analysis is here. You will need to install the rdflib and PyShp libraries and download the OS Boundary Line data. I used the Python(x,y) distribution but I think it’s just the matlibplot library which is required. The CWac.py program extracts the results from the website and writes them to a CSV file, the Mapping.py program makes a map from them. You will need to adjust file paths to suit your installation.