Of Matlab and Python

I’ve been a scientist and data analyst for nearly 25 years: first as an academic physicist, then as a research scientist in a large fast-moving consumer goods company, and now at a small technology company in Liverpool. In common with many scientists of my age, I came to programming in the early eighties, when a whole variety of home computers briefly flourished. My first formal training in programming was in FORTRAN; since then I have made my own way.

I came to Matlab in the late nineties, frustrated by how hard it was to build a smooth workflow in FORTRAN that combined interaction, analysis and graphical output.

Matlab is widely used in academic circles and a number of industries because it provides a great deal of analytical power in a user-friendly environment. Its notation for handling matrix (array) calculations is slick. Its functionality is extended by a range of toolboxes, and there is a community of scientists sharing new functionality. It shares this feature set with systems such as IDL and PV-WAVE.

However, there are a number of issues with Matlab:

  • as a programming language it has the air of new things being clumsily bolted onto a creaking frame. Support for unit testing is an afterthought; there is some integration of source control into the Matlab environment, but only with Visual SourceSafe. It doesn’t support namespaces, nor does it support common data structures such as dictionaries, lists and sets;
  • the toolbox ecosystem is heavily focused on scientific applications, generally in the physical sciences, so there is no support for, say, natural language processing, or for building a web application on top of the powerful analysis you can do elsewhere in the ecosystem;
  • the licensing is a nightmare. Once you’ve got core Matlab, additional toolboxes containing really useful functionality (statistics, database connections, a “compiler”) all come at extra cost. In my experience you often find yourself needing a toolbox for just a couple of functions. For an academic things are a bit rosier: universities get lower-priced licenses, and the process by which this is achieved is opaque to end-users. For an industrial user involved in the licensing process, it ranks alongside line management and sticking needles in your eyes in the “not much fun thing to do” stakes;
  • running Matlab with network licenses means that your code may stop part way through because you’ve called a function for which you can’t currently get a license. It is difficult to describe the level of frustration and rage this brings. Of course, one answer is to buy individual licenses for everyone, or at least a significant surplus of network licenses. But tell that to the budget holder, particularly when you wanted to run the analysis today. The alternative is to find one of the holders of the required toolbox license and discover whether they are actually using it or have gone off to a three-hour meeting leaving Matlab open;
  • deployment to users who do not have Matlab is painful. They need to download a runtime of more than 500MB, in exactly the right version, and the likelihood is they will be installing it just for your code.
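By way of comparison with the first point above, lists, sets and dictionaries are built into Python itself, with no toolbox or import required. A minimal illustration, using a made-up word list:

```python
# Python's built-in containers: no toolbox or import required.
words = ["solar", "panel", "gas", "solar", "gas", "gas"]  # a list: ordered, allows duplicates

unique_words = set(words)  # a set: duplicates are discarded

counts = {}  # a dictionary mapping each word to how often it appears
for word in words:
    counts[word] = counts.get(word, 0) + 1

print(sorted(unique_words))  # ['gas', 'panel', 'solar']
print(counts)                # {'solar': 2, 'panel': 1, 'gas': 3}
```

Dictionaries keep insertion order from Python 3.7 onwards, which is why the counts print in the order each word first appears.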

I started programming in Python at much the same time as I started on Matlab. At the time I scarcely used it for analysis, but even then, when I wanted to parse the HTML table of contents for Physical Review E, Python was the obvious choice. I have written scrapers in Matlab, but it involved interfering with the Java underpinnings of the language.
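The original scraper isn’t shown here, but as a rough sketch of the approach, Python’s standard-library `html.parser` can pull article titles out of a table-of-contents page. The markup below, and the assumption that titles sit in elements with the class `title`, are hypothetical; a real journal page would need a selector matched to its actual markup, and fetching the page (for example with `urllib.request`) is left out.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect the text of every element whose class list includes 'title'.

    The 'title' class is a hypothetical choice for illustration; a real
    table-of-contents page needs a selector matched to its actual markup.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self._tag = None    # tag name of the element currently being captured
        self._depth = 0     # nesting depth of that tag name
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._tag is not None:
            if tag == self._tag:
                self._depth += 1  # same tag nested inside the captured element
            return
        classes = (dict(attrs).get("class") or "").split()
        if "title" in classes:
            self._tag, self._depth, self._buffer = tag, 1, []

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Join the captured text and collapse runs of whitespace.
                self.titles.append(" ".join("".join(self._buffer).split()))
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)


sample = """
<ul>
  <li><span class="title">First <i>example</i> article</span>, p. 1</li>
  <li><span class="title">Second example article</span>, p. 9</li>
</ul>
"""

parser = TitleCollector()
parser.feed(sample)
print(parser.titles)  # ['First example article', 'Second example article']
```

Text inside nested tags such as `<i>` is kept, so titles containing italics come through whole.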

Python has matured since my early use. It now has a really great system of libraries which can be installed pretty much trivially; they extend far beyond those offered by Matlab and, in my view, they are of very good quality. Innovations like IPython notebooks take the Matlab interactive style of analysis and extend it to be natively web-based. For a great example of this, take a look at the examples Matthew Russell provides for his book, Mining the Social Web.

Python is a modern language undergoing slow, considered improvement. That’s to say it doesn’t carry a legacy stretching back decades, and changes are small and directed towards providing a more consistent language. It is used by many software developers, who provide a source of help and support, and an impetus for decent infrastructure.

Ubuntu users will find Python pre-installed. For Windows users, such as myself, there are a number of distributions which bundle up a whole bunch of libraries useful for scientists and sometimes an IDE. I like python(x,y). New libraries can generally be installed almost trivially using the pip package management system. I actually use Python in Ubuntu and Windows almost equally often. There are a small number of libraries which are a bit more tricky to install in Windows – experienced users turn to Christoph Gohlke’s fantastic collection of precompiled binaries.

In summary, Matlab brought much to data analysis for scientists but its time is past. An analysis environment built around Python brings wider functionality, a better coding infrastructure and freedom from licensing hell.

The Dorothy Hopkinson Memorial Solar Panel

This post is in memory of my paternal grandmother, Dorothy Hopkinson. This isn’t going to be a maudlin post: granny died about 18 months ago at a fair old age. I remember her for her cheery smile; her ultra-competitive playing of cards and Scrabble (whilst simultaneously claiming to be unconcerned by the outcome); her white drop-handlebar bike, which she rode into her sixties; her spectacles at a jaunty angle as she slept, snoring, in front of the TV; ice cream made from evaporated milk in battered aluminium dishes; and nettle soup. When she died she left some money, which I used to buy a direct water heating solar panel that I have christened “The Dorothy Hopkinson Memorial Solar Panel”.

So to the panel: bought from Solartwin, it cost about £3000 and comprises a single panel, about 6ft by 4ft, on our roof. Installation was in September 2008; it took about four hours and was minimally invasive. The panel takes cold water as it heads towards the hot water tank, circulates it around the panel (as long as it’s sunny), heating it as it goes, then feeds it back into the top of the hot water tank. Our roof isn’t ideally oriented: it faces west rather than south, so it catches the afternoon sun. There are only two of us in the house, and we use gas for heating, hot water and cooking, so the solar panel is replacing some of that water heating. Our gas bills are already pretty low (~£360 per year). At the same time as the solar panel was installed, we added to our loft insulation and significantly improved the insulation on the hot water tank (previously it had a flimsy red jacket, worn off the shoulder). As a side note, I found the rolls of loft insulation from B&Q to be rather huggable ;-)

I have, of course, been recording my gas and electricity meter readings every Saturday morning for the last couple of years. Okay, I appreciate most people don’t consider this “natural”, but it isn’t hurting anyone and it does give me some nice numbers to play with. Obsessive data collection has a fine track record in science: Tycho Brahe, for example, spent years collecting data on planetary orbits; Nevil Maskelyne collected data for determining longitude from the position of the moon; and J.D. Bernal’s poor lab technician spent what must have been an incredible amount of time fishing balls out of a bag and recording their exact locations as he went (strictly, that’s data collection by proxy). These are just the ones I can remember off the top of my head. Of course, computers and electronics mean that a lot of data collection can be automated, but there’s still a place for a bit of tedious manual data collection in the modern lab (or home). Feel free to add tales of heroic data collection in the comments.

I feed the gas and electricity numbers into a little program I wrote in C# (a relatively new programming language), which plots them. I used C# as a little exercise to get me using the language; normally I use Matlab, a language environment more suited to scientific programming.
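The C# program itself isn’t shown. As a hypothetical sketch of one step such a program needs, here in Python, cumulative Saturday meter readings can be turned into usage per week by differencing consecutive readings; scaling by the number of days elapsed keeps the figures comparable when a Saturday is missed. The readings are invented for illustration, and the helper name `weekly_usage` is hypothetical.

```python
from datetime import date

# Hypothetical Saturday-morning gas meter readings (cumulative, arbitrary meter units).
# The 27 June reading is deliberately missing to show how a gap is handled.
readings = [
    (date(2009, 6, 6), 1520.0),
    (date(2009, 6, 13), 1523.5),
    (date(2009, 6, 20), 1526.5),
    (date(2009, 7, 4), 1531.5),
]


def weekly_usage(readings):
    """Convert cumulative meter readings into (date, usage per 7 days) pairs.

    Each figure is the difference between consecutive readings, scaled to a
    7-day week so that a missed reading doesn't produce a double-height bar.
    """
    usage = []
    for (prev_day, prev_value), (day, value) in zip(readings, readings[1:]):
        days = (day - prev_day).days
        usage.append((day, (value - prev_value) * 7 / days))
    return usage


for day, used in weekly_usage(readings):
    print(day.isoformat(), used)
```

The resulting pairs could then be handed to any plotting library; that step is not shown.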

Making graphs from numbers comes very naturally to me, and I find graphs very easy to read. But I observe when we go walking that I’m much happier reading an OS map than my wife is; she prefers the words in a guide book. So I’m not sure everyone reads a graph the way I do. Perhaps you’d like to comment?

So in the graph above, time runs along the bottom axis from left to right and the gas used in a week runs in the vertical direction. The mountainous bits are the winters, when the central heating is switched on; the relatively flat bits in between are the summers. The first thing that strikes you is that the amount of gas used for central heating in the winter is huge, about ten times the amount used just to heat water during the summer. The solar panel was installed in September 2008 (almost a year ago). Over the winter it probably had relatively little effect, although the raw numbers show gas usage over winter 2008/09 was about 20% lower than over winter 2007/08, despite the more recent winter being colder. This is probably due to the improved loft insulation installed at the same time: we went from about 10cm to about 25cm thickness. In the summer the reduction in gas usage is pretty large: down by 55%. Through this summer you can see the gas usage drop gently to a minimum around June; this is due to the increasing height of the sun in the sky and the lengthening days.
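The seasonal comparisons above are percentage changes between usage totals. A minimal sketch, with invented weekly figures in arbitrary units (these are not the measurements behind the graph):

```python
def percent_change(before, after):
    """Percentage change from `before` to `after`; a negative result is a reduction."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (after - before) / before


# Invented weekly gas usage for two summers (arbitrary units), for illustration only.
summer_before = [40, 42, 38, 40]
summer_after = [20, 21, 19, 20]

change = percent_change(sum(summer_before), sum(summer_after))
print(f"{change:.0f}%")  # -50%
```

Comparing totals assumes both periods cover the same number of weeks; with unequal periods, comparing average weekly usage is the safer choice.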

Was the panel worth it? Well, it is quite exciting seeing your water getting heated for “free”, and during the summer months (even in a relatively poor summer) we used a lot less gas. In financial terms the payback time for us is very long, although with a larger household and a larger hot water tank the payback time would be reasonable (i.e. less than 10 years). Personally, I think the financial argument misses the point: if we only change our behaviour for short-term financial benefit, we will all steam gently over the next 100 years or so.
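A simple payback estimate divides the up-front cost by the annual saving. The sketch below ignores interest, changing gas prices and maintenance; the £3000 cost and ~£360 annual gas bill are the approximate figures given earlier in the post, while the £60 annual saving is purely hypothetical.

```python
def simple_payback_years(capital_cost, annual_saving):
    """Years needed for a constant annual saving to repay an up-front cost.

    Simplifying assumption: no interest, no price changes, no maintenance costs.
    """
    if annual_saving <= 0:
        raise ValueError("annual saving must be positive")
    return capital_cost / annual_saving


panel_cost = 3000.0         # approximate cost from the post (GBP)
annual_gas_bill = 360.0     # approximate annual gas bill from the post (GBP)
hypothetical_saving = 60.0  # hypothetical annual saving (GBP), not a measured figure

# Lower bound: even eliminating the whole gas bill would take over 8 years.
print(round(simple_payback_years(panel_cost, annual_gas_bill), 1))  # 8.3
print(simple_payback_years(panel_cost, hypothetical_saving))        # 50.0
```

Because the panel can only offset part of the water-heating share of the gas bill, the whole-bill figure is a generous lower bound on the payback time.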