Tag: programming

Book review: Masterminds of Programming by Federico Biancuzzi and Shane Warden

mastermindsThe next review in my work related books thread is of Masterminds of Programming by Federico Biancuzzi and Shane Warden. The subtitle, Conversations with the creators of major programming languages, is a good summary of the contents. The book is an edited transcript of interviews with the creators of major programming languages.

Frequently the conversation is with a single person but in a couple of cases two or even three people are interviewed. This is one small failing of the book because particularly where they interview three people the resulting chapter is very long and a bit repetitive.

There is a valid question to ask as to whether languages can be so closely tied to single individuals, and in the afterword the authors touch on this saying that one of the recurring themes was that the people they interviewed succeeded because they surrounded themselves with brilliant people. Some of these languages started off as one-man exercises but most grow from collective academic or corporate efforts, and even the one-man band languages developed a fairly formal community.

The languages covered are object-oriented languages (C++, Objective-C, Java, C# and Eiffel), functional languages (ML, Haskell, APL), glue languages originally designed to occupy the space between Unix and C (Awk, Lua, Python, Perl), languages designed for embedding (Forth and PostScript). Dartmouth Basic, SQL and UML are basically their own things. To be honest UML does not really fit into the book in my view since it is a formal design description language rather than an executable programming language. The languages were created between 1964 (Dartmouth BASIC) or maybe 1957 (for APL) and 2000 (C#).

I was sad to note the absence of a chapter on FORTRAN but John Backus, the inventor of FORTRAN died in 2007 – a couple of years before the book was written. Also there are no women interviewed in this book, a quick search reveals there are a number of programming languages invented or co-invented by women. COBOL, invented by Grace Hopper would be a prime candidate here but she died in 1992. Small Talk which inspired a lot of object-oriented languages was co-invented by Adele Goldberg and LOGO co-invented by Cynthia Solomon would fit rather well into the book.

The interviewers clearly had a set of questions which they asked each interviewee and the varying results indicate which questions chimed with the interests of the interviewee. The topics included concurrency, how to manage feature requests for languages, working in teams, debugging, software engineering and teaching.

The authors of the object-oriented languages (C++, Objective-C, Java, C# and Eiffel) are somewhat at each others throats. However, they are all pretty clear that object-orientation is the way forward for large software projects although they see encapsulation rather than inheritance or reuse as the key benefits it brings. There is a degree of condescension towards those languages that they perceived to have been successful as a result of marketing.

The authors of the functional programming languages are more interested in formal specification, I feel I should learn more about type theory. I have looked at Haskell in the past, and found it a bit challenging, however ideas from Haskell and other functional programming languages have made it into Python, my preferred language. The chapter on APL was entertaining, it was conceived and developed as a coherent formalised system for describing algebra the authors did not touch a computer for a number of years after “development” started in the mid-fifties. It is written as symbols which are challenging to enter on a conventional keyboard, you can see it in action here.

Tedd Codd’s relational database design was core to the success of SQL, and is largely why it has not been replaced. SQL was designed alongside IBM’s System R but Oracle produced the first commercial SQL engine.

I learnt a few random facts from the book which I can’t write as a coherent story:

  • Charles H. Moore – author of Forth: “Legacy code may be the downfall of our civilisation”;
  • Awk is an acronym of its authors names, Alfred Aho, Peter Weinberger and Brian Kernighan;
  • Tom Love – co-author of Objective-C – “100,000 lines of code fills a box of paper and requires 2 people to maintain it. Test cases should be another two or three boxes.”!

The book would have really benefited from some sample code in each language, perhaps in the manner of Exercises in Programming Style which implements the same algorithm in different programming styles. I picked up Beautiful Code by Andy Oram and Greg Wilson, interviews with programmers, for my reading list. As well as The Mythical Man-month by Frederick P. Brooks, Jr which I probably should have read years ago.

I found this book really interesting, in part as a way of understanding how the programming languages I use every day of my working life came into being but also to understand the mindset of highly skilled programmers.

Versioning in Python

I have recently been thinking about versioning in Python, both of Python and also of the Python packages. This is a record of how it is done for a current project and the reasoning behind it.

Python Versioning

At the beginning of the project we made a conscious decision to use Python 3.9, however our package is also used by our Airflow code which does integration tests, and provides reference Docker images based on Python 3.7 (their strategy is to use the oldest version of Python still in support). This approach is documented here. And the end of life dates for recent Python versions are listed here:

Since we started the project, Python 3.11 has been released so it makes sense to extend our testing from just Python 3.9 to include Python 3.7 and 3.11 too.

The project uses an Azure Pipeline to run continuous integration / continuous development tests, it is easy to add tests for multiple versions of Python using the following stanza in the configuration file for the pipeline.

Extending testing resulted in only a small number of minor issues, typically around Python version support for dependencies which were easily addressed by allowing more flexible versions in Python’s requirements.txt rather than pinning to a specific version. We needed to address one failing test where it appears Python 3.11 handles escaping of characters in Windows-like path strings differently from Python 3.9.

Package Versioning

Our project publishes a package to a private PyPi repository. This process fails if we attempt to publish the same version of the package twice, where the version is that specified in the “pyproject.toml”* configuration file rather than the state of the code.

Python has views on package version numbering which are described in PEP-440, this describes permitted formats. It is flexible enough to allow both Calendar Versioning (CalVer – https://calver.org/) or Semantic Versioning (SemVer – https://semver.org/) but does not prescribe how the versioning process should be managed or which of these schemes should be used.

I settled on Calendar Versioning with the format YYYY.MM.Micro. This is a considered personal taste. I like to know at a glance how old a package is, and I worry about spending time working out whether I need to bump major, minor or patch parts of a semantic version number whilst with Calendar Versioning I just need to look at the date! I use .Micro rather than .DD (meaning Day) because the day to be used is ambiguous in my mind i.e. is the day when we open a pull request to make a release or when it is merged?

It is possible to automate the versioning numbering process using a package such as bumpversion but this is complicated when working in a CI/CD environment since it requires the pipeline to make a git commit to update the version.

My approach is to use a pull request template to prompt me to update the version in pyproject.toml since this where I have stored version information to date, as noted below I moved project metadata from setup.cfg to pyproject.toml as recommended by PEP-621 during the writing of this blog post. The package version can be obtained programmatically using the importlib.metadata.version method introduced in Python 3.8. In the past projects defined __version__ programmatically but this is optional and is likely to fall out of favour since the version defined in setup.cfg/pyproject.toml is compulsory.

Should you wish to use Semantic Versioning then there are libraries that can help with this, as long as you following commit format conventions such as those promoted by the Angular project.

Once again I am struck on how this type of activity is a moving target – PEP-621 was only adopted in November 2020.

* Actually when this blog post was started version information and other project metadata were stored in setup.cfg but PEP-621 recommends it is put in pyproject.toml and is preferred by the packaging library. Setuptools has parallel instructions for using pyproject.toml or setup.cfg, although some elements to do with package and data discovery are in beta.

Book review: Exercises in Programming Style by Cristina Videira Lopes

exercises_in_programming_styleRecently our CIO has allowed us to claim one technical book per quarter on expenses as part of our continuing professional development. Needless to say since I was buying these books already I leapt at the opportunity! The first fruit of this policy is Exercises in Programming Style by Cristina Videira Lopes.

The book is modelled on Raymond Queneau’s book Exercises in Style which writes the same story in 99 different ways.

Exercises in Programming Style takes a simple exercise: counting the frequency of words in a file and reporting the top 25 words, and writes a program to do this in forty different styles spread across 10 sections.

The sections are historical, basic styles, function composition, objects and object interaction, reflection and metaprogramming, adversity, data-centric, concurrency, interactivity, and neural networks. The section on neural networks breaks the pattern with example programmes only handling small elements of the word frequency problem. The sections vary in size, the objects and object interaction is the largest.

Lopes talks about styles in terms of constraints, for example in the "Good old times" historical style there are no named variables and limited memory, in the "Letterbox" style objects pass messages to one another to prompt actions.

The shortest implementation of the example is in the "Code Golf" chapter with just six lines, other examples run to a couple of pages – a hundred lines or so. Lopes is somewhat opinionated as to style but quite balanced providing reasoning where unusual styles may be appropriate. This was most striking for me in the section on "Adversity" which discussed error-handling. Lopes suggests that a "Passive Aggressive" style with error handling all occurring at the top level in a try-except block is better than my error handling to date which has been more in the "Constructivist" (trapping errors but proceeding with defaults) or "Tantrum"(catching errors and refusing to proceed) style.

Sometimes the fit to the style format feels slightly forced, in particular in the chapters relating to neural networks but in the Data-Centric chapter I learnt how to implement spreadsheet-like functionality in Python which is interesting.

I’ve been programming for about 40 years but as a physical scientist analysing data or trying out numerical models rather than a professional developer. Exercises  brings together many bits and pieces of things I’ve learnt, often in the context of different languages. For a while I’ve had the feeling that I didn’t need to learn new languages, I needed to learn how to apply new techniques in my favoured language (Python) and this book does exactly that.

Once again I was bemused to see Python’s "gentleman’s agreement" methodology over certain matters. By convention methods of a class whose name start with an underscore are considered private but this isn’t enforced so if you really want to use a "private" method just go ahead. Similarly many object-oriented languages support a "this" keyword for the members of a class to refer to themselves. Python uses "self" but only by convention, you can specify "self" is "me" or whatever other name you please. The style format provides a nice way of demonstrating a feature of Python in a non-trivial but minimal functioning manner.

It is somewhat chastening to discover that many of the styles in this book had their abstract origins in the 1960s, shortly before I was born, entered experimental languages such as Smalltalk in the seventies where I would have read about them in computer magazines and became mainstream in the eighties and nineties in languages like C++, Java and Python, not long after the start of my programming career. Essentially, most of the action in this book has taken place during my lifetime! In physics we are used to the figures in our eponymous laws (Newton, Maxwell etc) being very long dead. In computing the same does not apply.

What I take away from Exercises is that to a fair degree modern programming languages can be used to implement a wide range of the ideas generated in computer science over the last 50 or so years so in improving your skill as a programmer learning new languages is not the highest priority. There is a benefit to learning new techniques in a language in which your are familiar. Clearly some languages are designed heavily to support a certain style, for example Haskell and functional programming but I found it easier to understand monads explained in the context of Python than in Haskell where everything was alien.

Exercises is surprisingly readable, the programs are well-documented and Lopes’ text is short but clear with references to further reading. It stands alongside Seven databases in Seven Weeks by Eric Redmond and Jim R. Wilson as a book that I will rave about and recommend to everyone!

Book Review: Scala for the Impatient by Cay S. Horstmann

scala_for_impatientI thought I should learn a new language, and Scala seemed like a good choice so I got Scala for the Impatient by Cay S. Horstmann.

Scala is a functional programming language which supports object orientation too. I’m attracted to it for a number of reasons. Firstly, I’m using or considering using a number of technologies which are based on Java – such as Elasticsearch, Neo4j and Spark. Although there are bindings to my favoured language, Python, for Spark in particular I feel a second class citizen. Scala, running as it does on the Java Virtual Machine, allows you to import Java functions easily and so gives better access to these systems.

I’m also attracted to Scala because it is rather less verbose than Java. It feels like some of the core aspects of the language ecosystem (like the dependency manager and testing frameworks) have matured rapidly although the range of available libraries is smaller than that of older languages.

Scala for the Impatient gets on with providing details of the language without much preamble. Its working assumption is that you’re somewhat familiar with Java and so concepts are explained relative to Java. I felt like it also made an assumption that you knew about the broad features of the language, since it made some use of forward referencing – where features are used in an example before being explained somewhat later in the book.

I must admit programming in Scala is a bit of a culture shock after Python. Partly because its compiled rather than interpreted, although the environment does what it can to elide this difference – Scala has an REPL (read-evaluate-print-loop) which works in the background by doing a quick compile. This allows you to play around with the language very easily. The second difference is static typing – Scala is a friendly statically typed language in the sense that if you initialise something with a string value then it doesn’t force you to tell it you want this to be a string. But everything does have a very definite type. It follows the modern hipster style of putting the type after the symbol name (i.e var somevariablename: Int = 5 ), as in Go rather than before, as in earlier languages (i.e int somevariablename = 5).

You have to declare new variables as either var or val. Variables (var) are mutable and values (val) are immutable. It strikes me that static typing and this feature should fix half of my programming errors which in a dynamically typed language are usually mis-spelling variable names, changing something as a side effect and putting the wrong type of thing into a variable – usually during I/O.

The book starts with chapters on basic control structures and data types, to classes and objects and collection data types. There are odd chapters on file handling and regular expressions, and also on XML processing which is built into the language, although it does not implement the popular xpath query language for XML. There is also a chapter on the parsing of formal grammars.

I found the chapter on futures and promises fascinating, these are relatively new ways to handle concurrency and parallelism which I hadn’t been exposed to before, I notice they have recently been introduced to Python.

Chapters on type parameters, advanced types and implicit types had me mostly confused although the early parts were straightforward enough. I’d heard of templating classes and data strctures but as someone programming mainly in a dynamically typed languages I hadn’t any call for them. I turns out templating is a whole lot more complicated than I realised!

My favourite chapter was the one on collections – perhaps because I’m a data scientists, and collections are where I put my data. Scala has a rich collection of collections and methods operating on collections. It avoids the abomination of the Python “dictionary” whose members are not ordered, as you might expect. Scala calls such a data structure a HashMap.

It remains to be seen whether reading, once again, chapters on object-oriented programming will result in me writing object-oriented programs. It hasn’t done in the past.

Scala for the Impatient doesn’t really cover the mechanics of installing Scala on your system or the development environment you might use but then such information tends to go stale fast and depends on platform. I will likely write a post on this, since installing Scala and its build tool, sbt, behind a corporate proxy was a small adventure.

Googling for further help I found myself at the Scala Cookbook by Alvin Alexander quite frequently. The definitive reference book for Scala is Programming in Scala by Martin Odersky, Lex Spoon and Bill Venners. Resorting to my now familiar technique of searching the acknowledgements for women working in the area, I found Susan Potter whose website is here.

Scala for the Impatient is well-named, it whistles through the language at a brisk pace, assuming you know how to program. It highlights the differences with Java, and provides you with the vocabulary to find out more.

Book review: BDD in Action by John Ferguson Smart

bddinactionBack to technical reading with this book BDD in Action by John Ferguson Smart. BDD stands for Behaviour Driven Development, a relatively new technique for specifying software requirements.

Behaviour Driven Development is an evolution of the Agile software development methodology which has project managers writing “stories” to describe features, and sees developers writing automated tests to guide the writing of code – this part is called “test driven development”. In behaviour driven development the project manager, along with their colleagues who may be business analysts, testers and developers, write structured, but still “natural language”, acceptance criteria which are translated into tests that are executed automatically.

Behaviour Driven Development was invented by Dan North whilst at Thoughtworks in London, there he wrote the first BDD test framework, JBehave and defined the language of the tests, called Gherkin. Gherkin looks like this:

Scenario: Register for online banking

Given that bill wants to register for online banking

When he submits his application online

Then his application should be created in a pending state

And he should be sent a PDF contract to sign by email

The scenario describes the feature that we are trying to implement, and the Given-When-Then steps describe the test, Given is the setup, When is an action and Then is the expected outcome. The developer writes so called “step definitions” which map to these steps and the BDD test framework arranges the running of the tests and the collection of results. There is a bit more to Gherkin than the snippet above encompasses, it can provide named variables and values, and even tables of values and outputs to be fed to the tests.

Subsequently BDD frameworks have been written for other languages, such as Lettuce for Python, SpecFlow for .NET and Cucumber for Ruby. There are higher level tools such as Thucydides and Cucumber Reports. These tools can be used to generate so-called “Living Documentation” where the documentation is guaranteed to describe the developed application because it describes the tests around which the application was built. Of course it is possible to write poorly considered tests and thus poor living documentation but the alternative is writing documentation completely divorced from code.

Reading the paragraph above I can see that for non-developers the choice of names may seem a bit whacky but that’s a foible of developers. I still have no idea how to pronounce Thucydides and my spelling of it is erratic.

BDD in Action describes all of this process including the non-technical parts of writing the test scenarios, and the execution of those scenarios using appropriate tools. It takes care to present examples across the range of languages and BDD frameworks. This is quite useful since it exposes some of how the different languages work and also shows the various dialects of Gherkin. BDD in Action also covers processes such as continuous integration and integration testing using Selenium.

As someone currently more on the developer side of the fence, rather than the (non-coding) project manager BDD seems to add additional layers of complexity since now I need a library to link my BDD style tests to actual code, and whilst I’m at it I may also include a test-runner library and a library for writing unit tests in BDD style (such as spock).

I’ve had some experience of managing Agile development and with that hat on BDD feels more promising, in principle I can now capture capabilities and feature requirements with my stakeholders in a language that my developers can run as code. Ideally BDD makes the project manager and stakeholders discuss the requirements in the form of explicit examples which the developers will code against.

BDD in Action has reminded why I haven’t spent much time using Java: everything is buried deep in directories, there are curly brackets everywhere and lots of boilerplate!

I suspect I won’t be using BDD in my current work but I’ll keep it in the back of my mind for when the need arises. Even without the tooling it is a different way of talking to stakeholders about requirements. From a technical point of view I’m thinking of switching my test naming conventions to methods like test_that_this_function_does_something arranged in classes named like WhenIWantToDoThisThing, as proposed in the text.

In keeping with my newfound sensitivity to the lack of women in technical writing, I scanned the acknowledgements for women and found Liz Keogh – who is also mentioned a number of times in the text as an experienced practioner of BDD. You can find Liz Keogh here. I did look for books on BDD written by women but I could find none.

If you want to know what Behaviour Driven Design is about, and you want to get a feel for how it looks technically in practice (without a firm commitment to any development language or libraries) then BDD in Action is a good place to start.