February 2017 archive

Book review: Weapons of Math Destruction by Cathy O’Neil

Obviously for any UK anglophone the title of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil is going to be a bit grating. The book is an account of how algorithms can ruin people’s lives. To a degree the “Big Data” in the subtitle is incidental.

Cathy O’Neil started her career as a mathematician, then worked as a quant for the hedge fund D. E. Shaw before moving to Intent Media to work as a data scientist. It’s nice to know that I’m not the only person to have become a data scientist largely by writing “data scientist” on their CV! Nowadays she is an activist in the Occupy movement.

The book is the result of O’Neil’s revelation that algorithms are often used destructively, and are responsible for gross injustices. Algorithms in this case are models that determine how companies, and sometimes governments, deal with their employees, customers and citizens: whether they are offered loans, adverts of a particular sort, employment, termination or a lengthy prison sentence.

The book starts with her experience at Shaw, where she saw the subprime mortgage crisis from quite close up. In a nutshell: the subprime mortgage crisis happened because it was in the interests of most of the players in the industry for the stated risk of these mortgages to be minimised. The ratings agencies were paid by the aggregators of these mortgages to rate their risk, and the purchasers of these risk ratings had an interest in the stated risk being low – the ratings agencies duly obliged.

The book goes on to cover a number of other “Weapons of Math Destruction”, including models for recruitment, insurance, credit rating, scheduling (for work), politics and policing. So, for example, there are the predictive policing algorithms which direct the police to particular parts of town in an effort to reduce serious crime. Once there, the police record more anti-social behaviour, which leads the algorithm to send them there again: serious crime is quite rare but anti-social behaviour isn’t, so there’s more data to draw on. On top of this, the police in a number of countries are following the “zero-tolerance” model, which says that if you address minor misdemeanours then more serious crimes are fixed automatically. The problem in the US with this approach is that the police are sent to black neighbourhoods repeatedly (rather than, say, college campuses) and the model is self-reinforcing.
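The self-reinforcing loop is easy to sketch in a few lines of Python. This is a toy illustration, not anything taken from the book or from a real policing system: the district names, starting counts and offence rates are invented, and the rule “send the one daily patrol to wherever most incidents are on record” is a deliberately crude assumption.

```python
def simulate_patrols(recorded, offences_per_visit, days):
    """Toy model: one patrol a day goes to the district with most recorded incidents.

    All names and numbers are hypothetical; this only illustrates the feedback loop.
    """
    recorded = dict(recorded)  # copy, so the caller's starting data is left untouched
    for _ in range(days):
        # The "algorithm": choose the district with the largest recorded count.
        target = max(recorded, key=recorded.get)
        # Offences are only recorded where officers are present to see them.
        recorded[target] += offences_per_visit[target]
    return recorded


# Two districts with identical underlying rates of minor offences,
# but district_a happens to start with one more incident on record.
start = {"district_a": 11, "district_b": 10}
offences_per_visit = {"district_a": 5, "district_b": 5}

result = simulate_patrols(start, offences_per_visit, days=30)
print(result)  # {'district_a': 161, 'district_b': 10}
```

Both districts behave identically, yet after a month the records suggest one is sixteen times worse than the other, simply because that is where the patrols were sent.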

O’Neil identifies several systematic problems which are typical of Weapons of Math Destruction. These are the use of proxies rather than “real outcomes”, the lack of feedback from outcomes to the model, the scale on which the model impacts people, the lack of fairness built into the model, the opacity of the models and the damage the models can do. The damage is extensive: these WMDs can lead to you being arrested, incarcerated for lengthy periods, denied a job, denied medical insurance, and offered loans at most extortionate rates to complete courses at rather low-rated universities.

The book is focused almost entirely on the US; in fact the only mention of a place outside the US is of policing in the “city of Kent”. However, O’Neil does seem to rate the data and privacy legislation in Europe, where consumers should be told of the purposes to which data will be put when they supply it. Even in the States the law provides some limits on certain types of model (such as credit scoring) but these laws have not kept pace with new developments, nor are they necessarily easy to use. For example, if your credit score is wrong then fixing it, although legally mandated, is not quick and easy.

Perhaps her most telling comment is that computers don’t understand fairness, and certainly don’t exhibit fairness if they are not asked to optimise for it. Which does lead to the question “How do you implement fairness?”. In some cases it is obvious: you shouldn’t make use of algorithms which explicitly take into account gender, race or disability. But it’s easy to bring in these parameters inadvertently through proxies: postcode, for example, is correlated with race, and part-time working with gender or disability.
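To make the proxy problem concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the postcodes, the two groups “x” and “y”, and the approval rule are hypothetical, and comparing approval rates between groups is only one of several possible ways of checking for unfairness.

```python
from collections import defaultdict

# Hypothetical applicants: (postcode, group). The decision rule never sees
# the group, but in this made-up data group and postcode are correlated.
applicants = [
    ("AB1", "x"), ("AB1", "x"), ("AB1", "x"), ("AB1", "y"),
    ("CD2", "y"), ("CD2", "y"), ("CD2", "y"), ("CD2", "x"),
]


def approve(postcode):
    """A 'group-blind' rule: approve applicants from one postcode only."""
    return postcode == "AB1"


def approval_rate_by_group(rows):
    """Fraction of each group approved by a rule that only looks at postcode."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for postcode, group in rows:
        totals[group] += 1
        if approve(postcode):
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


rates_by_group = approval_rate_by_group(applicants)
print(rates_by_group)  # {'x': 0.75, 'y': 0.25}
```

The rule never mentions the group, yet one group is approved three times as often as the other, because postcode carries the information in by the back door.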

As a middle-aged, middle-class white man with a reasonably well-paid job, living in a nice part of town, I am least likely to find myself on the wrong end of an algorithm and, ironically, the most likely to be writing such algorithms.

I found the book very thought-provoking; it will certainly lead me to ask myself whether the algorithms and data that I am generating are fair and what the cost of any unfairness is.

Book review: I Contain Multitudes by Ed Yong

This book was a Christmas gift, for which I’m very grateful! I Contain Multitudes: The Microbes Within Us and a Grander View of Life by Ed Yong is all about bacteria.

Bacteria are somewhat neglected in the popular science literature. I think the closest I can come is The Eighth Day of Creation by Horace Freeland Judson, which is about the discovery of DNA and its role in molecular biology, and in which bacteria and viruses play a part.

Yong’s book is about the relationship between bacteria and other organisms, humans included. It reveals a world where bacteria are not simply passengers on oblivious hosts but are a heavily integrated part of the host’s life cycle.

The study of the “microbiome” is relatively recent. Unravelling the members of a microbial community prior to the invention of cheap, and easy, DNA sequencing was hard. Carl Woese pioneered this approach in the 1970s, and used it to discover the archaea, a whole new Kingdom of life (plants and animals are two of the other Kingdoms, to give you an idea of the magnitude of this discovery). Sequencing of the bacterial inhabitants of humans gained pace in the 2000s when it was discovered that we all carry a rich community of bacteria which varies from site to site around the body, let alone from individual to individual. What is true for humans is true for other organisms.

The book continues with an overview of how important bacteria can be to an organism’s life. For example choanoflagellates, typically single-celled organisms, only form colonies in the presence of certain bacteria. And bobtail squid rely on bacterial partners to provide their luminescence. The standard lab animals (mice, zebrafish, flies) have been raised in germ-free environments and whilst they do not die, they do not flourish – even in the comfortable environment of the lab. Wolbachia bacteria interfere with the sex lives of their insect hosts: they are only passed down via the eggs of the female, and so they arrange by various means that there are more females, and therefore more eggs.

These partnerships are not accidental, in the sense that organisms often provide specific structures to support their bacterial partners and exchange specific molecular markers with them. In some cases the host is essential to the survival of the bacteria it contains because they have given up on carrying out tasks essential to their continued existence, for example in the supply of essential nutrients. This is true on many scales: animals from termites to cows have digestive systems designed to accommodate a particular bacterial support team to enable them to digest what would otherwise be food of low nutritional value. The early years of a human infant’s life are shaped by its acquisition of the right microbiome to prime the immune system and aid digestion.

The reason that bacteria are so effective in providing support services to their hosts is their high rate of evolution. Not only do they replicate fast, they also have a promiscuous approach to DNA they come across in their environment. This means that if any bacterial species evolves a useful trait, such as the ability to digest seaweed, then its neighbours in the gut can pick up that ability via its DNA. These genes can, eventually, end up in the genome of their hosts.

Japanese people who eat nori seaweed, which contains carbohydrates that the human body can’t digest on its own, host bacteria which can. Moreover, the genes those bacteria use to carry out this digestion were acquired from marine bacteria.

Yong is not misty-eyed about his bacterial subjects: as he points out, their symbiosis with other organisms is not altogether harmonious – in the end the bacteria are in it for themselves.

The book finishes with some examples of how bacteria can be used to support human health, and speculates how this approach – currently only used in curing persistent C. difficile infections – could be extended to all manner of ailments including blood pressure and mental health problems.

I’ve been following Ed Yong on Twitter for quite a while, and where he found the time to write a book as well as everything else he seems to do is a mystery to me! His style, as a science journalist, can be seen in the book, both in the presentation of the story, with brief character sketches of the scientists involved and quotes from them, and in the titles of the chapters, which are entertaining but not necessarily informative. The book is thick with examples which build into larger themes; turn to the back of the book and you’ll find references to the primary literature.

Bacteria deserve our attention; this book is a great introduction to how they shape the lives of “higher” organisms.