Aug 20 2016

Book review: Test-Driven Development with Python by Harry J.W. Percival

Test-Driven Development with Python by Harry J.W. Percival is a tutorial rather than a textbook, and it taught me as much about Django as about testing. I should point out that I wilfully fail to do the “follow along with me” thing in tutorial-style books.

Test-driven development (TDD) is a methodology that mandates writing tests before the actual code that does stuff. The first tests are for the desired behaviour that will be presented to the user.

I was introduced to software testing very early in my tenure at ScraperWiki, now The Sensible Code Company. I was aware of its existence prior to this but didn’t really have the impetus required to get me started; it didn’t help that I was mostly coding in Matlab, which didn’t have a great deal of support for testing at the time. The required impetus at ScraperWiki was pair programming.

Python is different to Matlab: it has an entirely acceptable testing framework built in. Test-driven Development made me look at this functionality again. So far I’ve been using the nose testing library but there is a note on its home page now saying it will not be developed further. It turns out Python’s unittest has been developing in Python 3, which reduces the need for 3rd party libraries to help in the testing process. Python now includes the Mock library, which provides functions to act as “test doubles” prior to the implementation of the real thing. As an aside, I learnt there is a whole set of such test doubles including mocks, but also stubs, fakes and spies.
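
To make the idea of a test double concrete, here is a minimal sketch using unittest.mock from the standard library. It is not taken from the book: the notify_owner function and the mailer object with its send method are hypothetical names invented for illustration. The Mock stands in for a mail service that has not been written yet.

```python
from unittest.mock import Mock


def notify_owner(mailer, owner_email, item_text):
    """Hypothetical function under test: tell a list owner about a new item."""
    mailer.send(to=owner_email, body=f"New item added: {item_text}")
    return True


# The Mock stands in for a real mail service that does not exist yet.
fake_mailer = Mock()

result = notify_owner(fake_mailer, "user@example.com", "Buy milk")

# A Mock records how it was called, so the test can check the interaction.
fake_mailer.send.assert_called_once_with(
    to="user@example.com", body="New item added: Buy milk"
)
```

The assertion raises an AssertionError if send was called with different arguments or not exactly once, which is what lets the test pin down behaviour before the real mailer exists.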

Test-driven Development is structured as a tutorial to build a simple list management web application which stores the lists of multiple users, and allows them to add new items to the lists. The workflow follows the TDD scheme: write failing tests first, then write the code that allows them to pass. The first tests are functional tests of the whole application made using the Selenium webdriver, which automates a web browser, and allows testing of dynamic, JavaScript pages as well as simple static pages. Beneath these functional tests lie unit tests, which test isolated pieces of logic, and integrated tests, which test logic against data sources and other external systems. Integration tests test against 3rd party services.
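
As a rough illustration of the unit-test layer of that workflow, here is a small sketch using Python’s built-in unittest. The TodoList class and its add_item method are my own hypothetical stand-ins, not the book’s Django code. In the TDD scheme the three tests would be written first and seen to fail, and only then would the class be filled in to make them pass.

```python
import unittest


class TodoList:
    """Hypothetical list class, filled in only after the tests below existed."""

    def __init__(self):
        self.items = []

    def add_item(self, text):
        # Reject empty or whitespace-only items.
        if not text.strip():
            raise ValueError("list items cannot be blank")
        self.items.append(text)


class TodoListTest(unittest.TestCase):
    def test_new_list_is_empty(self):
        self.assertEqual(TodoList().items, [])

    def test_added_item_is_stored(self):
        todo = TodoList()
        todo.add_item("Buy milk")
        self.assertEqual(todo.items, ["Buy milk"])

    def test_blank_item_is_rejected(self):
        with self.assertRaises(ValueError):
            TodoList().add_item("   ")


def run_tests():
    """Run the suite programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TodoListTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_tests() runs all three tests; saving the code to a file and running python -m unittest on it does the same from the command line.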

The solution is worked through using the Django web framework for Python. I’ve not used it before – I use the rather simpler Flask library. I can see that Django contains more functionality but it is at the cost of more complexity. In places it wasn’t clear whether the book was talking about general testing functionality or some feature of the Django testing functionality. Django includes a range of fancy features alien to the seasoned Flask user. These include its own ORM, user administration systems, and classes to represent web forms.

Test-driven Development has good coverage in terms of the end goal of producing a web application. So not only do we learn about testing elements of the Python web application but also something of testing in JavaScript. (This seems to involve a preferred testing framework for every library.) It goes on to talk about some elements of devops, configuring servers using the Fabric library, and also continuous integration testing using Jenkins. These are all described in sufficient detail that I feel I could set up the minimal system to try them out.

Devops still seems to be something of a dark art with a profusion of libraries and systems (Chef, Puppet, Ansible, Juju, Salt, etc.) with no clear, stable frontrunner.

An appendix introduces “behaviour-driven development”, which uses a test framework that allows the tests to be presented in terms of a domain specific language with (manual) links to the functional tests beneath.

As for what I will do differently having read this book: I’m keen to try out some JavaScript testing, since my normal development activities involve data analysis and processing using Python but increasingly also blingy web interfaces for visualisation and presentation. At the moment these frontends are slightly scary systems which I fear to adjust since they are without tests.

With the proviso above, that I don’t actually follow along, I like the tutorial style. Documentation has its place but ultimately it is good to be guided in what you should do rather than all the things you could possibly do. Test-driven Development introduces the tools and vocabulary you need to work in a test-driven style with the thread of the list management web application tying everything together. Whether it instils in me the strict discipline of always writing tests first remains to be seen.

Aug 17 2016

Book review: The Book of the Edwardian & Interwar House by Richard Russell Lawrence

I’m currently working on providing some data for domestic properties, mainly for the purpose of making the process of getting a buildings insurance quote easier. One of the parameters the insurance industry is interested in is the age of a home.

And so I came to The Book of the Edwardian & Interwar House by Richard Russell Lawrence. I picked the book up partly out of curiosity but I also hoped to pick up some ideas as to how I might date a house based on the information to hand.

The book starts with some general comments about the period and what had gone before, leading to a discussion of Edwardian architecture. This is followed by a similar discussion of interwar architecture. The book finishes with a whole load of short chapters on individual elements of the home: bricks, tiles, lighting, wireless and telephone and so forth. As well as simple domestic architecture there is some discussion of high-end homes of the period.

The second half of the 19th century saw the expansion of British cities, driven by industrialisation and enabled by the growing railway system and, for the capital, the London Underground. This led to the building of an awful lot of terraced houses at high densities, generally to be rented to workers. The 1877 Model Bye-laws Act and the 1878 Building Act set some requirements on how houses could be built in terms of their size, distance from facing houses and sanitary facilities.

This situation continued into the beginning of the 20th century, with a growing middle class looking for better homes than the terraces offered. The First World War brought house building to a complete stop; after the war there was a housing shortage of something like 850,000 properties and a fear in government that there would be fighting on the streets if “Homes Fit for Heroes” were not supplied. The Interwar period saw a huge increase in home ownership, the building of 4 million homes (the majority semi-detached) and the first council houses. Public housing was built following the specifications of the Tudor Walters Report (1918), which specified a minimum size of 760 square feet, a maximum density of 12 houses per acre and preferred wider houses, semi-detached or in short terraces. Private housing sought to differentiate itself from public housing but could scarcely offer poorer specifications.

This is interesting because sizes and spacings of buildings can be determined from the Ordnance Survey’s mapping data.

Earlier regulations, following the Great Fire of London, had banned the building of timber-framed houses in cities and windows had to be recessed in their openings for similar reasons. This, and details such as how bricks are laid, can give further information on building age but they are not readily amenable to automation or determinable from public data.

Gross style seems to be of relatively little help when dating buildings: many Edwardian and Interwar houses were built in neo-Georgian style which, as the name implies, can look very Georgian. Also popular was Tudorbethan, which emulated an old English style with mock, black wooden beams painted or nailed to a white exterior. Chester’s city centre is rife with an elaborate form of this style, mainly built in the Edwardian or very late Victorian period, although there are some examples of the genuine article.

Internally, the period saw the evolution of the kitchen, scullery and kitchenette as new-fangled gas and electric ovens replaced old ranges. There was also a discussion as to whether buildings for workers should have a separate parlour and living room. I’m well aware that my grandparents’ generation would often reserve a room for “best”, which we, as family, did not get invited into.

It struck me as I read The Book that houses I would once have confidently dated as post-Second World War are, I now suspect, interwar. It surprised me that modest houses started to get a garage as an option as early as the 1930s; the big increase in car ownership had started before the First World War, a bit earlier than I expected.

I learnt some new useful vocabulary: a “catslide” roof is one on a two storey house which terminates at the top of the first storey. A “hipped” roof is one that has slopes on more than two sides, rather than having gable ends (previously I’d have called this a “roof”).

This is something of a coffee-table book, with lots of photographs. I found the text in the early part more readable than the long litany of descriptions of individual architectural details. I have a few ideas to try out on the dating of houses.

Aug 05 2016

Book review: The Company by John Micklethwait and Adrian Wooldridge

The Company: A Short History of a Revolutionary Idea by John Micklethwait and Adrian Wooldridge is indirectly related to my reading of the history of science and technology. The commercial world impinges on stories such as that of the development of the railways, the story of Mauve and Lord Kelvin’s work on telegraphy. Scientific expeditions were motivated, in part, and supported logistically by merchant interests. The “search for the longitude” was driven by needs of merchants and navies.

The Company is particularly concerned with the limited liability joint-stock company, created in approximately its current form by the 1862 Companies Act in the UK but repeated across the world. In this system, the owners only risk the money they put into a venture (rather than all their worldly possessions), and shares in that company can be traded on the stockmarket.

The book starts with some prehistory: there is some evidence in Mesopotamia as far back as 3000 BC for arrangements which went beyond simple barter. And by Roman times there were various forms of company used for collecting tax, for example. The Romans also had a developed legal system. These arrangements tended to be relatively short lived and in the form of partnerships, where the owners were the managers and if things went wrong they could lose the toga off their backs.

The Middle Ages saw the rise of “corporate persons” in Western Europe which included guilds, monasteries and corporations. The Aberdeen Harbour Board, founded in 1136, can lay claim to be the oldest still existing. In China there was a tendency towards large, long-lived state corporations. Italy had compagnie, literally “breaking bread together”, with shared total liability, and thus requiring high levels of trust. Banchi were the compagnie’s banking equivalents, often built around family ties. The relationship between banks and companies is a theme throughout the book.

England and France were scarred, separately, by the collapse of the South Sea Company and the Mississippi Company in the early 18th century. Created in the spirit of the East India Companies, monopoly corporations with a Royal Charter, they came to grief through rampant speculation and dubious government decisions on debt.

The railways provided the impetus for the reform of company law. They required substantial investment, and once formed they needed substantial workforces and management structures. In the early days each railway company had to make its case in parliament to gain its charter; this was a costly, slow undertaking. And so there were reforms culminating in the 1862 act.

A quote attributed to Edward Thurlow in 1844, “Corporations have neither bodies to be punished, nor souls to be condemned, they therefore do as they like”, encapsulates an ongoing concern about companies. This is from a lawmaker, but a similar concern was echoed by economists of the time: they thought that a partner or manager/owner was going to do a better job of running a company than a servant of shareholders.

The Company compares developments in the US, UK, Germany and Japan in the latter half of the 19th century. The US forte was in professionalising management, enabling them to build ever bigger companies. The UK had a more developed stockmarket but positively spurned professional management (a blight that has afflicted it since then). Germany had strong oversight boards from the beginning, complementing the management board. These included workforce representation, and companies were seen as a social enterprise.

This is the second tension in the company: whether it is solely interested in its shareholders or whether it is responsible to a wider group of stakeholders which might include its employees, the local community, environment and so forth.

The book continues with developments in the latter half of the 20th century, closing in 2002. These include the wave of privatisations across Europe (the French keeping substantial government control), and the decline of many big companies. In the case of Enron and WorldCom these were precipitous declines in disgrace but there were also leveraged buyouts and subsequent dismantlings. Another theme here is the balance between transaction costs and hierarchy costs. Companies succeed when the cost of running a hierarchy is lower than the savings made by carrying out transactions in aggregate. These costs and savings change over time; transaction costs have recently been much reduced by technology.

I was surprised by the recency of the commercial world we see today. The modern company and the stockmarket only really came into being in the final quarter of the 19th century with the great corporations rising to dominance in the first quarter of the 20th century. Harvard’s business school was only founded at the beginning of the 20th century. This is little more than a lifetime ago.

The same is true of trade tariffs: they were initially introduced by the US and Germany in the 1880s with Britain and the Netherlands holding out as “free-traders” until 1932. One wonders whether this is the origin of Anglo-Dutch conglomerates such as Unilever and Shell. The book is not explicit on the differences between tariffs and customs duties and taxes – a clarification which I would have found useful.

The authors are clearly enthusiasts for capitalism in the form of the company. The book is short and readable, and I felt enlightened for having read it. The bibliography has some good pointers for further reading.

Jul 20 2016

Book review: The Seven Pillars of Statistical Wisdom by Stephen M. Stigler

The Seven Pillars of Statistical Wisdom by Stephen M. Stigler is a brief history of what the author describes as the key pillars of statistics. This is his own selection rather than some consensus of statistical opinion. That said, to my relatively untrained eye the quoted pillars are reasonable. They are as follows:

1 – Aggregation. The use of the arithmetic average or mean is not self-evidently a good thing. It was during the 17th century, when people were taking magnetic measurements in order to navigate, that ideas around the mean started to take hold. Before this time it was not obvious which value one should take when discussing a set of measurements purportedly measuring the same thing. One might take the mid-point of the range of values, or apply some subjective process based on one’s personal knowledge of the measurer. During the 17th century researchers came to the conclusion that the arithmetic mean was best.

2 – Information. Once you’ve discovered the mean, how good is it as a measure of the underlying phenomenon as you increase the size of the aggregation? It seems obvious that the measure improves as the number of trials increases but how quickly? The non-trivial answer to this question is that it scales as the square root of N, the number of measurements. Sadly this means if you double the number of measurements you make, you only improve your confidence in the mean by a factor of a little over 1.4 (that being the square root of 2). Mixed in here are ideas about the standard deviation, a now routine formulation quoted with the mean. It was originally introduced by De Moivre in 1738, for the binomial distribution, but then generalised by Laplace in 1810 as the Central Limit Theorem.

3 – Likelihood. This relates to estimating confidence that an observed difference is real, and not due to chance. The earliest work, by John Arbuthnot, related to observed sex ratios in births recorded in England and whether they could have arisen by chance rather than through a “real” difference in the number of boys and girls born.

4 – Intercomparison. Frequently we wish to compare sets of measurements to see if one thing is significantly different from another. The Student t-test is an example of such a thing. It is named for William Gosset, who took a sabbatical from his job at Guinness to work in Karl Pearson’s lab at UCL. Guinness did not want an employee’s name to appear on a scientific paper (thus revealing their interest), so he wrote under the rather unimaginative pseudonym “Student”.

5 – Regression. The chapter starts with Charles Darwin, and his disregard for higher mathematics. He professed a faith in measurement and “The Rule of Three”. This is the algebraic identity a/b = c/d which states that if you know any 3 of a, b, c and d you can calculate the 4th. This is true in a perfect world, but in practice we would acquire multiple sets of our three selected values and use regression to obtain a “best fit” for the fourth value. Also in this chapter is Galton’s work on regression to the mean, in particular how parents with extreme heights had children who were on average closer to the mean height. This is highly relevant to the study of evolution and the inheritance of characteristics.

6 – Design. The penultimate pillar is design. In the statistical sense this means the design of an experiment in terms of the numbers of trials, and the organisation of the trials. This starts with a discussion of calculating odds for the French lottery, which was founded in 1757 and provided up to 4% of the French budget in 1811. It then moves on to RA Fisher’s work at the Rothamsted Research Centre on randomisation in agricultural trials. My experience of experimental design is that statisticians always want you to do more trials than you can afford, or have time for!

7 – Residual. Plotting the residual left when you have made your best model and taken it from your data is a time-honoured technique. Systematic patterns in the residuals can indicate that your model is wrong, or that there are as yet undiscovered phenomena at work. I was impressed to discover in this chapter that Frank Weldon cast 12 dice 26,306 times, some 315,672 individual throws, to try to determine if they were biased. Data collection can be an obsessive activity. This story from the early 20th century is not uncommon.
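
The square-root scaling described under pillar 2 can be checked with a quick simulation. This is my own sketch rather than anything from the book, and it assumes measurements with Gaussian noise of unit standard deviation; the function name spread_of_means and the choice of 2,000 repeats are arbitrary. It repeats an “experiment” of N measurements many times and looks at how much the resulting means scatter.

```python
import random
import statistics


def spread_of_means(sample_size, repeats=2000, seed=0):
    """Scatter (standard deviation) of the mean over many simulated experiments."""
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        # One experiment: sample_size noisy measurements of a true value of 0.
        measurements = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(measurements))
    return statistics.stdev(means)


small = spread_of_means(100, seed=1)
large = spread_of_means(200, seed=2)

# Doubling the number of measurements should shrink the scatter of the mean
# by roughly the square root of 2, not by a factor of 2.
ratio = small / large
```

With these settings the ratio should land close to 1.4, though the exact figure wobbles a little with the seeds because the scatter is itself estimated from a finite number of repeats.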

Seven Pillars is oddly pitched: it is rather technical for a general science audience, yet it is an entertainment rather than a technical text. The individual chapters would have fitted quite neatly into The Values of Precision, which I have reviewed previously.

Jul 16 2016

A question for Theresa May

Why have you appointed a man as foreign secretary who has:

For ordinary working people a track record like this would send their CV to the bottom of the pile. Apparently things are different for a senior member of the Tory Party.
