Tag Archive: programming

May 18 2017

Book review: BDD in Action by John Ferguson Smart

bddinactionBack to technical reading with this book BDD in Action by John Ferguson Smart. BDD stands for Behaviour Driven Development, a relatively new technique for specifying software requirements.

Behaviour Driven Development is an evolution of the Agile software development methodology which has project managers writing “stories” to describe features, and sees developers writing automated tests to guide the writing of code – this part is called “test driven development”. In behaviour driven development the project manager, along with their colleagues who may be business analysts, testers and developers, write structured, but still “natural language”, acceptance criteria which are translated into tests that are executed automatically.

Behaviour Driven Development was invented by Dan North whilst at Thoughtworks in London, there he wrote the first BDD test framework, JBehave and defined the language of the tests, called Gherkin. Gherkin looks like this:

Scenario: Register for online banking

Given that bill wants to register for online banking

When he submits his application online

Then his application should be created in a pending state

And he should be sent a PDF contract to sign by email

The scenario describes the feature that we are trying to implement, and the Given-When-Then steps describe the test, Given is the setup, When is an action and Then is the expected outcome. The developer writes so called “step definitions” which map to these steps and the BDD test framework arranges the running of the tests and the collection of results. There is a bit more to Gherkin than the snippet above encompasses, it can provide named variables and values, and even tables of values and outputs to be fed to the tests.

Subsequently BDD frameworks have been written for other languages, such as Lettuce for Python, SpecFlow for .NET and Cucumber for Ruby. There are higher level tools such as Thucydides and Cucumber Reports. These tools can be used to generate so-called “Living Documentation” where the documentation is guaranteed to describe the developed application because it describes the tests around which the application was built. Of course it is possible to write poorly considered tests and thus poor living documentation but the alternative is writing documentation completely divorced from code.

Reading the paragraph above I can see that for non-developers the choice of names may seem a bit whacky but that’s a foible of developers. I still have no idea how to pronounce Thucydides and my spelling of it is erratic.

BDD in Action describes all of this process including the non-technical parts of writing the test scenarios, and the execution of those scenarios using appropriate tools. It takes care to present examples across the range of languages and BDD frameworks. This is quite useful since it exposes some of how the different languages work and also shows the various dialects of Gherkin. BDD in Action also covers processes such as continuous integration and integration testing using Selenium.

As someone currently more on the developer side of the fence, rather than the (non-coding) project manager BDD seems to add additional layers of complexity since now I need a library to link my BDD style tests to actual code, and whilst I’m at it I may also include a test-runner library and a library for writing unit tests in BDD style (such as spock).

I’ve had some experience of managing Agile development and with that hat on BDD feels more promising, in principle I can now capture capabilities and feature requirements with my stakeholders in a language that my developers can run as code. Ideally BDD makes the project manager and stakeholders discuss the requirements in the form of explicit examples which the developers will code against. 

BDD in Action has reminded why I haven’t spent much time using Java: everything is buried deep in directories, there are curly brackets everywhere and lots of boilerplate!

I suspect I won’t be using BDD in my current work but I’ll keep it in the back of my mind for when the need arises. Even without the tooling it is a different way of talking to stakeholders about requirements. From a technical point of view I’m thinking of switching my test naming conventions to methods like test_that_this_function_does_something arranged in classes named like WhenIWantToDoThisThing, as proposed in the text.  

In keeping with my newfound sensitivity to the lack of women in technical writing, I scanned the acknowledgements for women and found Liz Keogh – who is also mentioned a number of times in the text as an experienced practioner of BDD. You can find Liz Keogh here. I did look for books on BDD written by women but I could find none.

If you want to know what Behaviour Driven Design is about, and you want to get a feel for how it looks technically in practice (without a firm commitment to any development language or libraries) then BDD in Action is a good place to start.

Mar 26 2017

Book review: Working effectively with legacy code by Michael C. Feathers

legacy_codeWorking effectively with legacy code by Michael C. Feathers is one of the programmer’s classic texts. I’d seen it lying around the office at ScraperWiki but hadn’t picked it up since I didn’t think I was working with legacy code. I returned to read it having found it at the top of the list of recommended programming books from Stackoverflow at dev-books. Reading the description I learnt that it’s more a book about testing than about legacy code. Feathers defines legacy code simply as code without tests, he is of the Agile school of software development for whom tests are central.

With this in mind I thought it would be a useful read for me to improve my own code with the application of better tests and perhaps incidentally picking up some object-oriented style, in which I am currently lacking.

Following the theme of my previous blog post on women authors I note that there are two women authors in the 30 books on the dev-books list. It’s interesting that a number of books in the style of Working Effectively explicitly reference women as project managers, or testers in the text, i.e part of the team – I take this as a recognition that there exists a problem which needs to be addressed and this is pretty much the least you can do. However, beyond the family, friends and publishing team the acknowledgements mention one women in a lengthy list.

The book starts with a general overview of the techniques it will introduce, including the tools used to address them. These come down to testing frameworks and the refactoring tools found in many IDEs. The examples in the book are typically written in C++ or Java. I particularly liked the introduction of the ideas of the “seam”, a place where behaviour can be changed without editing the code and the “enabling point” – the place where a change can be made at that seam. A seam may be a class that can be replaced by another one, or a value altered. In desperate cases (in C) the preprocessor can be used to invoke test-time changes in the executed code.

There are then a set of chapters that answer questions that a legacy code-ridden developer might have such as:

  • I can’t get this class into a test harness
  • How do I know that I’m not breaking anything?
  • I need to make a change. What methods should I test?

This makes the book easy to navigate, if not a bit inelegant. It seems to me that the book addresses two problems in getting suitably sized pieces of code into a test harness. One of these is breaking the code into suitable sized pieces by, for example, extracting methods. The second is gaining independence of the pieces of code such that they can be tested without building a huge infrastructure up to support them.

Although I’ve not done any serious programming in Java or C++ I felt I generally understood the examples presented. My favoured language is Python, and the problems I tackle tend to be more amenable to a functional style of programming. Despite this I think many of the methods described are highly relevant – particularly those describing how to break down monster functions. The book is highly pragmatic, it accepts that the world is not full of applications in which beautiful structure diagrams are replicated by beautiful code.

There are differences between these compiled object-oriented languages and Python though. C#, Java, and C++ all have a collection of keywords (like public, private, protected, static and final) which control who can see what methods exist on a class and whether they can be over-ridden or replaced. These features present challenges for bringing legacy code under test. Python, on the other hand, has a “gentleman’s agreement” that method names starting with an underscore are private, but that’s it, and there are no mechanisms to prevent you using these “private” functions! Similarly, pretty much any method in Python can be over-ridden by monkey-patching. That’s to say if you don’t like a function in an imported library you can simply overwrite it with your own version after you’ve imported the library. This is not necessarily a good thing. A second difference is that Python comes with a unit testing framework and a mocking library rather than them being functionality which is third-party added. Although to be fair, the mocking library in Python was originally third party.

I’ve often felt I should programme in a more object-oriented style but this book has made me reconsider. It’s quite clear that spaghetti code can be written in an object oriented language as well as any other. And I suspect the data processing for which I am normally coding fits very well with a functional style of coding. The ideas of single responsibility functions, and testing still fit well with more functional programming styles.

Working effectively is readable and pragmatic. I suspect the developer’s dirty secret is that actually we wrote the legacy code that we’re now trying to fix.

Jun 28 2013

Testing, testing…

testingtesting

This post was first published at ScraperWiki.

Data science is a distinct profession from software engineering. Data scientists may write a lot of computer code but the aim of their code is to answer questions about data. Sometimes they might want to expose the analysis software they have written to others in order they can answer questions for themselves, and this is where the pain starts. This is because writing code that only you will use and writing code someone else will use can be quite different.

ScraperWiki is a mixed environment, it contains people with a background in software engineering and those with a background in data analysis, like myself. Left to my own devices I will write code that simply does the analysis required. What it lacks is engineering. This might show up in its responses to the unexpected, its interactions with the user, its logical structure, or its reliability.

These shortcomings are addressed by good software engineering, an area of which I have theoretical knowledge but only sporadic implementation!

I was introduced to practical testing through pair programming: there were already tests in place for the code we were working on and we just ran them after each moderate chunk of code change. It was really easy. I was so excited by it that in the next session of pair programming, with someone else, it was me that suggested we added some tests!

My programming at ScraperWiki is typically in Python, for which there a number of useful testing tools. I typically work from Windows, using the Spyder IDE and I have a bash terminal window open to commit code to either BitBucket or Github. This second terminal turns out to be very handy for running tests.

Python has an internal testing mechanism called doctest which allows you to write tests into the top of a function in what looks like a comment. Typically these comprise a call to the function from a command prompt followed by the expected response. These tests are executed by running a command like:

 python -m doctest yourfile.py

This is OK, and it’s “batteries included” but I find the mechanism a bit ugly. When you’re doing anything more complicated than testing inputs and outputs for individual functions, you want to use a more flexible mechanism like nose tools, with specloud to beautify the test output. The Git-Bash terminal on Windows needs a little shim in the form of ansicon to take full advantage of specloud’s features. Once you’re suitably tooled up, passed tests are marked with a vibrant, satisfying green and the failed tests by a dismal, uncomfortable red.

My latest project, a module which automatically extracts tables from PDF files, has testing. It divides into two categories: testing the overall functionality – handy as I fiddle with structure – and tests for mathematically or logically complex functions. In this second area I’ve started writing the tests before the functions, this is because often this type of function has a simple enough description and test case but implementation is a bit tricky. You can see the tests I have written for one of these functions here.

Testing isn’t as disruptive to my workflow as I thought it would be. Typically I would be repeatedly running my code as I explored my analysis making changes to a core pilot script. Using testing I can use multiple pilot scripts each testing different parts of my code; I’m testing more of my code more often and I can undertake moderate changes to my code, safe in the knowledge that my tests will limit the chances of unintended consequences.

Jun 01 2012

Book Review: Visualize This by Nathan Yau

9780470944882 cover.inddThis book review is of Nathan Yau’s “Visualize This: The FlowingData Guide to Design, Visualization and Statistics”. It grows out of Yau’s blog: flowingdata.com, which I recommend, and also his experience in preparing graphics for The New York Times, amongst others.

The book is a run-through of pragmatic methods in visualisation, focusing on practical means of achieving ends rather more abstract design principles for data visualisation; if you want that then I recommend Tufte’s “The Visual Display of Quantitative Information”.

The book covers a bit of data scraping, extracting useful numerical data from disparate sources, as Yau comments this is the thing that takes the time in this type of activity. It also details methods for visualising time series data, proportions, geographic data and so forth.

The key tools involved are the R and Python programming languages; I already have these installed in the form of R Studio and Python(x,y), distributions which provide an environment that looks like the Matlab one with which I have long been familiar with but which sadly is somewhat expensive for a hobby programmer. Alongside this are the freely available Processing language and the Protovis Javascript library which are good for interactive, online visualisations, and the commercial packages Adobe Illustrator, for vector graphic editing, and Adobe Flash Builder for interactive web graphics. Again these are tools I find out of my range financially for my personal use although Inkscape seems to be a good substitute for Illustrator.

With no prior knowledge of Flash and no Flash Builder, I found the sections on Flash a bit bewildering. It also highlights how perhaps this will be a book very distinctively of its time, with Apple no longer supporting Flash on iPhone its quite possible that the language will die out. And I notice on visiting the Protovis website that this is no longer under development: the authors have moved on to D3.js, Openzoom which is also mentioned is no longer supported. Python has been around for sometime now and is the lightweight language of choice for many scientists, similarly R has been around for a while and is increasing in popularity.

You won’t learn to program from this book: if you can already program you’ll see that R is a nice language in which to quickly make a wide range of plots. If you can’t program then you may be surprised how few commands R requires to produce impressive results. As someone who is a beginner in R, the examples are a nice tour of what is possible and some little tricks, such as the fact that plot functions don’t take data frames as arguments: you need to extract arrays.

As well as programming the book also includes references to a range of data sources and online tools, for example colorbrewer2.org – a tool for selecting colour schemes, and links to the various mapping APIs.

Readers of this blog will know that I am an avid data scraper and visualiser myself, and in a sense this book is an overview of that way of working – in fact I see I referenced flowingdata in my attempts to colour in maps (here).

The big thing I learned from the book in terms of workflow is the application of a vector graphics package, such as Adobe Illustrator or, Inkscape, to tidy up basic graphics produced in R. This strikes me as a very good idea, I’ve spent many a frustrating hour trying to get charts looking just right in the programming or plotting language of my choice and now I discover that the professionals use a shortcut! A quick check shows that R exports to PDF, which Inkscape can read.

Stylistically the book is exceedingly chatty, including even the odd um and huh, which helps make it quick and easy read although is a little grating. Many of the examples are also available over on flowingdata.com, although I notice that some are only accessible for paid membership. You might want to see the book as a way of showing your appreciation for the blog in physical and monetary form.

Look out for better looking visualisations from me in the future!

Dec 01 2011

Case-sensitive

As a long time programmer there is a little thing I’d like to rant about: case-sensitivity.

For the uninitiated this is the thing that makes your program think that the variable called “MyVariable” is different from the variable called “myVariable” and the variable called “Myvariable”. The problem is that some computer languages have it and some computer languages don’t.

I grew up with BASIC and later FORTRAN, case-insensitive languages which do the natural thing and assume that capitalisation does not matter. Other languages (C#, Java, C, Matlab) are not so forgiving and insist that “a” and “A” refer to two completely different things. In real life this feels like a wilful act of obstinacy, the worst excesses of teenage pedantry, it is a user experience fail.

The origins of case-sensitivity lie in the origins of the language C in the early 1970s,  FORTRAN doesn’t have it because when it was invented, in the dawn of computing, teletype printers did not support lowercase – there was no space on the print head.  I still think of FORTRAN as a language written in ALL CAPS and so rather IMPERATIVE.

There is an argument for case-sensitivity from the point of view of compactness; mathematicians, even of my relatively lowly level will name their variables in equations with letters from the Roman and Greek alphabets, subscripts and superscripts. My father, an undergraduate mathematician, even went as far as Cyrillic alphabet. Sadly the print media, even New Scientist, do not support such typographically extravagance.

It’s even worse when your language is dynamically-typed, that’s to say it allows you to create variables willy-nilly as you write your program rather than statically-typed languages which demand you tell them explicitly of the introduction of new variables. In a statically typed language if you start with a variable called “MyVariable” and later introduce “Myvariable”, by a slip of the key, then the compiler will kick-off: complaining it has no knowledge of this interloper. A dynamically-typed language will accept this new introduction silently, giving it a default value and causing untold damage in subsequent calculations.

It’s not like case-sensitivity is used in any syntactically meaningful manner: to a computer there is no practical difference between “foo” and “Foo” – the standard placeholder function name, foo” and “Foo” to the computer are simply the label you have stuck to a box containing a thing. There are some human conventions, but they are just that – and as with any convention they are honoured as much in the breech as the observance. The compiler doesn’t care.

I must admit to a fondness of CamelCase: capitalising the initial letters of each word in a long variable name, I do it in my hashtags on twitter. In the old days of FORTRAN no such fripperies existed, not only were your variable names limited in case but also in length: you had 6 characters to work your magic.

This is to ignore the many and varied uses different uses that computer languages find for brackets: {}, (), [] and even <>.

Older posts «