
A Rosetta Stone for programming ecosystems

The Rosetta Stone (image © Hans Hillewaert, CC BY-SA 4.0)

The Rosetta Stone is a stone slab, dating to 196 BC, on which the same decree is written in three scripts – Egyptian hieroglyphic, Egyptian Demotic and ancient Greek. It was key to deciphering the Egyptian scripts in the modern era.

It strikes me that learning a new programming language is not really an exercise in learning its syntax – for vast swathes of languages those things are very similar. For an experienced programmer the learning is in the ecosystem. What we need is a Rosetta Stone for software development in different languages that tells us which tools to use for each language, or at least gives us a respectable starting point.

To my mind the ecosystem specific to a programming language includes the language specification and evolution process, compiler/interpreter options, package/dependency management, virtual environments, project layout, testing, static analysis and formatting tools, and documentation generation. Package management, virtual environments and project layout are inter-related, certainly in Python (my primary programming language).

In researching these tools I was curious about their history. Compilers have been around since nearly the beginning of electronic computing, in the late forties and early fifties. Modern testing frameworks generally derive from Smalltalk’s SUnit – published in 1989. Testing clearly went on prior to this – I am sure it is referenced in The Mythical Man-Month, and Grace Hopper is cited for her work in testing components and computers.

I tend to see package/dependency management as the system by which I install packages from an internet repository such as Python’s PyPI – in which case the first of these was CPAN, for the Perl language, first online in 1995, not long after the birth of the World Wide Web.

Separate linters date back to 1978. Indent, the first formatter, was written in 1976. The first documentation generation tools arose towards the end of the eighties; Javadoc, which appeared in the mid-nineties, I suspect inspired many subsequent implementations.

Tool choices are not as straightforward as they seem; in nearly all cases there are multiple options, the result of evolution in the way programming is done more generally, or of developers seeking to improve what they see as pain points in current implementations. Some elements come down to personal choice.

For my first two Rosetta Stone blog posts I look at Python and TypeScript. My aim is that each blog post will discuss the options, while a GitHub repository demonstrates one set of options in action. I am guided by my experience of working in a team on a Python project where we needed to agree a tool set and best practices. The use of development pipelines, which run linters, formatters and tests automatically before code changes are merged, drove a lot of this work. The aim of these blog posts is, therefore, not simply to get an example of a programming language running but to create a project that software developers would be content to work with. The code itself is minimal, although I may add some more involved code in future.

I wrote the TypeScript version of my “initial release” to see how the process would work for a language with which I was not familiar – it helped me understand the Python ecosystem better and gave me “feature envy”!

I found myself referencing numerous separate blog posts in writing these first two posts, which suggests this Rosetta Stone is a worthwhile exercise. I also found my search results were not great, contaminated by a great deal of poorly written, perhaps automatically generated, material.

There are other, generic, parts of the ecosystem – such as the operating system on which the code will run, the source control system, and the Integrated Development Environment the developer uses – which I will not generally discuss. I work almost exclusively on Windows, but I prefer Git Bash as my shell. I use git with GitHub for source control and Visual Studio Code as my editor/IDE.

When I started this exercise I thought that there might be specific Integrated Development Environments used for specific languages. In the eighties and nineties, when you bought a programming language, the Integrated Development Environment was often part of the deal. This seems not to be the case anymore; most IDEs these days can be extended with plugins specific to a language, so which IDE you start with is immaterial. In any case, any language can be used with a combination of a text editor and command line tools.

I have been programming since I was a child in the early eighties: first in BASIC, then at university in FORTRAN, then in industry in MATLAB, before moving to Python. During that time I have also dabbled in C++ and Java, but largely from a theoretical point of view. Although I have been programming for a long time it has generally been in the role of scientist / data scientist producing code for my own use; only in the last few years have I written code intended to be consumed by others.

My first two “Rosetta Stone” blog posts cover Python and TypeScript.

Book review: Broad Band by Claire L. Evans

Broad Band by Claire L. Evans book cover: a cream background with a silhouette of a woman made from circuit boards.

This review is of Broad Band by Claire L. Evans, subtitled The Untold Story of the Women Who Made the Internet. It is arranged thematically, with each chapter focusing on a couple of women, moving in time from the first chapter, about Ada Lovelace in the 19th century, through to the early years of the 21st century. The first part of the book covers the early period of computing up to the mid-sixties, the second part the growth of networked computing through the seventies and eighties, with the final part covering the rise of the World Wide Web and services devoted to women.

The first chapter introduces us to Ada Lovelace, sometimes heralded as the first programmer – a somewhat disputable claim. More importantly, she was clearly a competent mathematician and excelled in democratising and explaining the potential of the mechanical computing engines that Charles Babbage was trying, and largely failing, to build. More broadly this chapter covers the work of the early human “computers”, who were often women, employed to carry out calculations for astronomical or military applications. Following on from this role, by 1946 250,000 women were working in telephone exchanges (presumably in the US).

Women gained this role as “computers” for a range of reasons. In the 19th century it was seen as acceptable work for educated women whose options were severely limited – as they would be for many years to come, excepting war time. The lack of alternatives meant they were very cheap to employ. Under the cover of this apparently administrative role of “computer”, women made useful, original contributions to science, albeit they were not recognised as such. Women were seen as good at this type of meticulous, routine work.

When the first electronic computers were developed in the later years of the Second World War it was unsurprising that women were heavily involved in their operation, partly because of their previous roles and partly because men had been sent to fight. There appears to have been an attitude that the design and construction of such machines was men’s work while their actual use, the physical act of programming, was women’s work – often neglected by the men who built the machines.

It was in this environment that the now renowned Grace Hopper worked. She started writing what we would now describe as compilers to make the task of programming computers easier. She was also instrumental in creating the COBOL programming language, reviled by computer scientists in subsequent years but comprising 80% of the world’s code by the end of the 20th century. The process that Hopper used to create the language – a committee involving multiple companies working towards a common, useful goal – looks surprisingly modern.

In the sixties there was a sea-change for women in computing: it was perceived that there was a shortage of programmers, and the solution was to turn programming into an engineering science, which had the effect of gradually pushing women out of computing through the seventies. It was at this time that the power of computer networks started to be realised.

The next part of the book covers networking, via a brief diversion into mapping the Mammoth Cave system in Kentucky, which became the basis of the first networked computer game: Colossal Cave Adventure. I was particularly impressed by Project One, a San Francisco commune which housed a mainframe computer (a Scientific Data Systems 940) blagged from a company by Pam Hardt-English for the commune’s computing project, Resource One. In the early seventies it became the first bulletin board system (BBS) – a type of system which was to persist all the way through to the creation of the World Wide Web (and beyond). Broad Band also covers some of the later bulletin board systems founded by women, which evolved into women’s places on the Web; BBSs were majority-male spaces for a long time. In the meantime Resource One also became the core of the San Francisco Social Services Referral Directory, which persisted until 2009 – a radical innovation at the time: computers used for a social purpose outside of scientific or military applications.

The internet as we know it started with ARPANET in 1969. Broad Band covers two women involved in the early internet. Elizabeth (Jake) Feinler was responsible for the Resource Handbook – a manually compiled directory of the computers, and their handlers, on ARPANET – which evolved, under her guidance, into the WHOIS service and the host.domain naming convention for internet addresses. The second woman is Radia Perlman, who invented the Spanning Tree Protocol for Ethernet whilst at DEC in 1984.

This brings us, in time, to the beginning of the World Wide Web, which grew out of the internet. Hypertext systems had been mooted since the end of the Second World War but it wasn’t until the eighties that they became technically feasible on widely available hardware. Broad Band cites the British researcher Wendy Hall, and Cathy Marshall at Rank Xerox, as contributors to the development of hypertext systems. These were largely swept away by Tim Berners-Lee’s HTML format, which had the key feature of hyperlinking across different computers, even if this made the handling of those links prone to decay – something handled better by other, non-networked, hypertext systems. The World Wide Web grew ridiculously quickly in the early nineties: Berners-Lee demonstrated a rather uninspiring version at HyperText ’91, and by HyperText ’94 he was keynote speaker.

There is a brief chapter devoted to women in gaming. Apparently Barbie Fashion Designer sold 600,000 units in 1996, more than Doom and Quake! There was a brief period when games were made very explicitly for girls – led to a degree by Brenda Laurel, whose extensive research showed that boys strive for mastery in games whilst girls look for a collaborator to complete a task. These ideas held sway for a while before a more diverse gaming market took hold which didn’t divide games so much by gender.

It is tempting for me to say that where women have made their mark in computing and the internet is in forming communities, communicating the benefits of technology and making them easier to use – in a reprise of the early pioneering women in science – because that is what women are good at. However, this is the space in which women have been allowed by men – it is not a question of innate ability alone.

I found this book really interesting; it is more an entry point into the topic of women in computing than a comprehensive history. It has made me nostalgic for my computing experiences of the eighties and nineties, and I have added a biography of Grace Hopper to my reading list.

Book review: The Wood Age by Roland Ennos

My first book of 2023 is The Wood Age: How wood shaped the whole of human history by Roland Ennos, a history of wood and human society.

The book is divided into four parts: “pre-human” history, the period up to the industrial era, the industrial era, and “now and the future”.

Part one covers our ancestors’ life in the trees and descent from them. Ennos argues that nest building, as practised by orangutans for example, is a sophisticated and little recognised form of tool use which involves an understanding of the particular mechanical properties of wood. Descending from the trees, Ennos sees digging sticks and fire as important. Digging sticks are effective for rummaging roots out of the earth, which is handy if you are moving away from the leaves and fruits of the canopy. Wood becomes harder with drying (hence making better digging sticks), and the benefits of cooking food with (wood-based) fire are well-reported. The start of controlled use of fire is unknown but could be as long ago as 2,000,000 years. The final step – hair loss in humans – Ennos attributes to the ability to build wooden shelters; this seems rather far-fetched to me. I suspect this part of the book is most open to criticism since it covers a period well before writing, and with very little fossilised evidence of the key material, wood.

The pre-human era featured some use of tools made from wood, and this continued into the “stone” age, but on the whole wood is poorly preserved over even thousands of years. The oldest wooden tool discovered dates to 450,000 years ago – a spear found in Essex. The peak of tool making in the Neolithic is the bow and arrow, as measured by the number of steps, and the range of materials, required.

The next part of the book covers the period from the Neolithic through to the start of the Industrial Revolution. In this period ideas about farming spread to arboriculture, with the introduction of coppicing, which produces high yields of firewood, and of wood for wicker – a new way of crafting with wood. There is some detailed discussion of how wood burns, and of how the introduction of charcoal, which burns hotter, was essential to the success of the “metal” ages and to progressing from earthenware pottery (porous and weak) to stoneware, which is basically glassy and requires a firing temperature of over 1000 Celsius. As an aside, I found it jarring that Ennos quoted all temperatures in Fahrenheit!

This section has the air of describing a technology tree in a computer game. The ability to make metal tools – initially copper, then bronze, then iron, then steel – opens up progressively better tools and more ways of working with wood, like sawing planks, which can be used to make better boats than those constructed by hollowing out logs or splitting tree trunks. Interestingly, the boats made by the Romans were not surpassed in size until the 17th century.

Wheels turn out to be more complicated than I first thought: slicing a tree trunk into disks doesn’t work because the disks split in use (and in any case cutting cleanly across the grain of wood is hard without a steel-bladed saw). The first wheels – three planks cut into a circle and held together with battens – are not great. The peak of wheel building is the spoked wheel, which requires a steam-bent rim, turned spokes and a turned central hub with moderately sophisticated joints. Ennos argues that the reason South America never really took to wheels, and the Polynesians did not build plank-built boats, was a lack of metals appropriate for making tools.

Harder steel tools also enabled the carpentry of seasoned timber – better for making furniture than greenwood, which splits and deforms as it dries.

Ultimately the use of wood was not limited by the production of wood but rather by transport and skilled labour. The Industrial Revolution picks up when coal becomes the fuel of choice – making manufacturing easier, and allowing cities to grow larger.

The final substantive part of the book covers the Industrial Revolution up to the present. This is largely the story of the replacement of wood as fuel with coal, of wood as charcoal (used in smelting) with coke (which is to coal what charcoal is to wood), and of many small wooden items with metal, ceramic, glass and, more recently, plastic. It is not a uniform story, though: England moved to coal as a fuel early in the 19th century – driven by an abundance of coal, a relative shortage of wood, and the growth of large cities – while other countries in Europe and the US moved more slowly. The US built its railways with wooden infrastructure (bridges and sleepers), rather than the stone used in Britain, at a much lower cost, and it still tends to build domestic buildings in wood. The introduction of machine-made nails and screws in the late 18th century made construction in wood a lower-skilled activity. Paper based on wood was invented around 1870, making newspapers and books much cheaper.

In the 21st century wood and processed-wood products like plywood and chipboard are still used for many applications.

The final part of the book is a short look into the future, mainly from the point of view of re-forestation. I found this a bit odd because it starts by complaining about the “deforestation myth” but then goes on to outline occasions when humans caused significant deforestation and soil erosion.

Ennos sees wood as an under-reported factor in the evolution of humanity, but authors often feel their topic is under-reported. I suppose this is inevitable since these are people so passionate about their topic that they have devoted their energy to writing a whole book about it.

This is a nice read, not too taxing but interesting.

Book review: Data Pipelines with Apache Airflow by Bas P Harenslak and Julian R De Ruiter

My next review is of Data Pipelines with Apache Airflow by Bas P Harenslak and Julian R De Ruiter. The book was published in 2021 and is compatible with Airflow 2.0, which was released at the end of 2020.

Airflow is all about orchestrating the movement of data from sources such as APIs into other places; it originated at Airbnb. It is designed for batch processing, rather than streaming data, and for pipelines that do not change much.

Data pipelines in Airflow are represented as "directed acyclic graphs", or DAGs, which are defined in Python code using "Operators" which carry out tasks. A graph is a collection of nodes (tasks in this case) with "edges" between them. The "directed acyclic" bit means tasks have a definite order – the edges between them are "directed" – and the graph cannot have loops or cycles, because that would imply having to finish a set of tasks before you could start them. A simple data pipeline would just be a linear sequence of tasks that always follow one from another; a more complicated pipeline might bring in data from several sources before combining them to produce a final data product.

The Operators are strung together using expressions of the form "operator 1 >> operator 2" or even "[operator 1, operator 2] >> operator 3". 
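To make this concrete, here is a minimal sketch of what such a DAG definition might look like against the Airflow 2.0 API – the DAG name, task names and bash commands are invented for illustration:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A hypothetical pipeline: fetch two datasets, then combine them
with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
) as dag:
    fetch_weather = BashOperator(
        task_id="fetch_weather", bash_command="echo fetching weather data"
    )
    fetch_sales = BashOperator(
        task_id="fetch_sales", bash_command="echo fetching sales data"
    )
    combine = BashOperator(
        task_id="combine", bash_command="echo combining datasets"
    )

    # The ">>" expressions define the edges of the graph: both fetch
    # tasks are upstream of combine, so both must succeed before it runs
    [fetch_weather, fetch_sales] >> combine
```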

Operators do not have to use Python; they can invoke code in other languages – the BashOperator, for example – or interact with other systems such as databases or storage systems such as S3. It is relatively easy to write your own operators. Alongside operators that do stuff there are branch operators, which select one or other path in the DAG; sensors, which detect changes in filesystems and the like and trigger work; and hooks, which form connections with external services. Dummy operators can be used to simplify the appearance of DAGs. A sketch of how branching and sensors fit together follows below.
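This is a hedged sketch using the Airflow 2.0 import paths – the task names, file path and weekday/weekend logic are my own invention:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import BranchPythonOperator
from airflow.sensors.filesystem import FileSensor


def _choose_branch(execution_date, **_):
    # Return the task_id of the path to follow; the other path is skipped
    if execution_date.weekday() < 5:
        return "process_weekday"
    return "process_weekend"


with DAG(
    dag_id="example_branching",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
) as dag:
    # The sensor polls the filesystem and succeeds once the file appears
    wait_for_file = FileSensor(task_id="wait_for_file", filepath="/data/input.csv")

    branch = BranchPythonOperator(task_id="branch", python_callable=_choose_branch)

    # Dummy operators stand in for the real work on each path
    process_weekday = DummyOperator(task_id="process_weekday")
    process_weekend = DummyOperator(task_id="process_weekend")

    wait_for_file >> branch >> [process_weekday, process_weekend]
```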

As an orchestration system, the intention is that operators should not contain a great deal of code to process data; that function should be off-loaded to libraries or systems elsewhere.

The Airflow system comprises a web server, which allows you to observe and trigger execution of DAGs; a scheduler, which is responsible for the scheduled running of DAGs; and workers, which do the actual work of the DAG. The scheduler loops over the tasks defined in a DAG and checks the tasks upstream of each task in question; once those upstream tasks have completed successfully, the task can execute.

A basic implementation runs DAGs locally, using a simple queue to schedule work and an SQLite database to store metadata. A production implementation would use something like Postgres or Amazon RDS as the metadata store, schedule work using Celery, and run tasks in Docker containers marshalled using Kubernetes.

For some reason, reading this I was reminded that big projects like Airflow are just other people’s code, and if you look too carefully you’ll find something nasty. This is both comforting and mildly scary. I think the issue was that Airflow uses Jinja templating to inject parameters into code, which feels wrong but is probably a pragmatic and safe way to do it; these shenanigans are not required for Python operators. Also discussed are issues with code dependencies, which the authors suggest are best eliminated by putting operators into Docker containers, each of which contains its own code dependencies – allowing otherwise dependency-incompatible libraries to work together.
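For example, a minimal sketch of the templating – the command is invented, but {{ ds }} is one of Airflow’s standard template variables:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_templating",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
) as dag:
    # Airflow renders the Jinja template just before execution;
    # "{{ ds }}" becomes the logical date of the run, e.g. "2021-01-01"
    fetch = BashOperator(
        task_id="fetch",
        bash_command="echo downloading data for {{ ds }}",
    )
```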

Alongside the material on Airflow there are moderate chunks on Python modules, testing, Docker and Kubernetes, and logging, so you get a well-rounded view not only of Airflow but also of the ecosystem it sits in. The book finishes with deployment into various Cloud environments; I found these parts quite useful since the most complicated work I do in my role is trying to get things to work in AWS! The data science part is easy…

These short closing chapters mention first fully managed services such as astronomer.io, Amazon MWAA and Google Cloud Composer before going on to talk about implementing one of the book’s demos on AWS, Azure and Google cloud services. I considered skipping them but they turned out to be quite interesting in highlighting the differences between services, and perhaps the preferences of the authors of both the book and of Airflow.

I found this a readable introduction to Airflow with some nice examples, and interesting additional material. It is useful if you are thinking about using Airflow, or even if you are working on data pipelines without Airflow, since it provides a good description of the moving parts required.

The code repository for the book is here: https://github.com/BasPH/data-pipelines-with-apache-airflow

Book review: Hedy’s Folly by Richard Rhodes

Back to some history of technology with Hedy’s Folly by Richard Rhodes. This book concerns the patent granted to Hedy Lamarr, Hollywood actress, and George Antheil, experimental musician, for a frequency hopping radio communications system. Originally it was intended to allow secure, jamming-resistant communications between torpedoes and their control aircraft or ships; nowadays it is most notably the basis for Bluetooth and WiFi communications.

I’ve previously read Richard Rhodes’ The Making of the Atomic Bomb, which is a massive tome; Hedy’s Folly is a rather more modest affair. It provides some biographical material on Lamarr (born Hedwig Kiesler) and Antheil, but only in as much as it leads to the patent of the title.

Hedy Lamarr was born in Austria in 1914 to a wealthy family – her father was a banker who clearly cultivated her interest in how things worked. Following a brief career in European theatre and film she married Fritz Mandl in 1933. He was an arms manufacturer and one of the richest men in Austria, and he didn’t want to see his wife continue her acting career. On the death of her father Lamarr resolved to leave her husband, but in the interim she paid close attention to the technical discussions on armaments to which she was party. In all likelihood she was doing this throughout her marriage; despite his controlling nature Mandl clearly valued her opinions (even if he didn’t like them). Lamarr then moved to the States with Louis Mayer of MGM, for whom she was to make a number of films. In this milieu she met George Antheil.

Antheil was born in Trenton, New Jersey in 1900. He travelled to Europe in 1921, where he composed the Ballet Mécanique; originally intended as the score to a film, it ended up twice the length of the film. As originally envisaged, Ballet Mécanique required 16 player pianos and an aeroplane propeller – amongst many other sound-making devices. Essentially Antheil’s vision was much in advance of what the technology of the twenties and thirties could deliver. The player piano plays a part in the story. Player pianos were briefly popular as a way for everyone (who could afford one) to make music: automated pianos programmed using a paper roll with holes directing the music, with the operator simply providing power and rhythm. They were supplanted by radio. The important feature was the ability to control sound automatically.

Antheil returned to the US, to Hollywood, in 1936 where he turned to writing film scores, his experimental music proving rather unpopular. It was here he met Hedy Lamarr.

The spirit of the Second World War in the US was that everyone would do what they could to help. Antheil had a sideline in writing about endocrinology, and made suggestions on how to defeat the Nazis using this knowledge. Later in the war Hedy Lamarr was to do considerable work in encouraging Americans to buy government bonds to support the war effort, as well as volunteering at the Hollywood Canteen – entertainment for servicemen.

Lamarr was an inventor in her spare time, and her background meant she knew the problems faced in torpedo guidance, so it was not surprising that she should work with Antheil on a frequency hopping patent for torpedo guidance. The central idea of the patent was to transmit radio instructions between controller and torpedo over a series of radio channels at different frequencies, switching synchronously between channels. In the original patent the number of channels used is relatively small (less than 10), and hops are relatively slow – of the order of minutes – and controlled by a player-piano-style roll.

The US Navy chose not to develop the patent, stating that the apparatus was too bulky. This seems to be a bit of a misunderstanding – the player piano inspiration was indeed quite bulky but could easily have been reduced in size using the technology of the time. More likely the issue was that US torpedo performance at the beginning of the war was abysmal – 60% of torpedoes experienced technical failure – so the Navy had other priorities.

Lamarr and Antheil’s patent on frequency hopping expired in 1959; the US military implemented several frequency hopping systems from the beginning of the sixties. As technology improved, frequency hopping evolved into so-called spread spectrum techniques – the difference between the two is really just one of scale. These techniques finally became public in 1976.

Spread spectrum techniques eventually found important applications in Bluetooth and WiFi. Originally designed to be resistant to jamming – the deliberate use of noise to block signals – spread spectrum is also resistant to unintentional noise. Furthermore, it can be used with very low power transmissions, so it can cohabit with signals used for longer-range applications and in parts of the electromagnetic spectrum where there is a lot of noise.

Hedy Lamarr’s part in the development of frequency hopping is finally being recognised, as is George Antheil’s more experimental music – technology has now reached the point where his original vision can be realised.

This is a fascinating little book, focused on one small invention with huge consequences. It isn’t a biography of Hedy Lamarr, nor of her co-inventor George Antheil.