Tag: statistics

Book review: The Seven Pillars of Statistical Wisdom by Stephen M. Stigler

The Seven Pillars of Statistical Wisdom by Stephen M. Stigler is a brief history of what the author describes as the key pillars of statistics. This is his own selection rather than some consensus of statistical opinion. That said, to my relatively untrained eye his chosen pillars are reasonable. They are as follows:

1 – Aggregation. The use of the arithmetic average, or mean, is not self-evidently a good thing. It was during the 17th century, when people were taking magnetic measurements in order to navigate, that ideas around the mean started to take hold. Before this time it was not obvious which value one should take when presented with a set of measurements purportedly of the same thing. One might take the mid-point of the range of values, or apply some subjective process based on one’s personal knowledge of the measurers. During the 17th century researchers came to the conclusion that the arithmetic mean was best.
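As a toy illustration of why the choice matters, here is a quick sketch of my own (not from the book) comparing the mean with the mid-range on simulated measurements of a known quantity:

```python
import numpy as np

rng = np.random.default_rng(3)

# 1,000 noisy measurements of a quantity whose true value is 10
readings = 10 + rng.normal(0, 1, size=1000)

mean = readings.mean()
midrange = (readings.min() + readings.max()) / 2  # mid-point of the range
print(f"mean = {mean:.3f}, mid-range = {midrange:.3f}")
# The mid-range depends only on the two most extreme readings, so it
# stays noisy however many measurements you take; the mean settles down.
```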

2 – Information. Once you’ve discovered the mean, how good is it as a measure of the underlying phenomenon as you increase the size of the aggregation? It seems obvious that the measure improves as the number of trials increases, but how quickly? The non-trivial answer is that the precision of the mean improves only as the square root of N, the number of measurements. Sadly this means that if you double the number of measurements you make, you only improve your confidence in the mean by a factor of a little over 1.4 (the square root of 2). Mixed in here are ideas about the standard deviation, now routinely quoted alongside the mean. The underlying result was introduced by de Moivre in 1738, for the binomial distribution, and generalised by Laplace in 1810 as the Central Limit Theorem.
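A small simulation (mine, not Stigler’s; the unit-variance Gaussian noise is an arbitrary assumption) makes the square-root law concrete:

```python
import numpy as np

rng = np.random.default_rng(42)

# The spread of the sample mean shrinks as 1/sqrt(N): doubling the
# number of measurements improves precision by only sqrt(2) ~ 1.41.
for n in (100, 200, 400, 800):
    # 10,000 repeated "experiments", each averaging n noisy measurements
    means = rng.normal(loc=0.0, scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"N={n:4d}  spread of the mean = {means.std():.4f}")
```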

3 – Likelihood. This relates to estimating the confidence that an observed difference is real, and not due to chance. The earliest work, by John Arbuthnot, concerned the sex ratios of births recorded in England and whether they could have arisen by chance rather than through a “real” difference in the numbers of boys and girls born.
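Arbuthnot’s argument amounts to what we would now call a sign test; by the standard account of his 1710 paper, the christening records showed more boys than girls in 82 successive years. A back-of-the-envelope version:

```python
from fractions import Fraction

# If boys and girls were equally likely, each year's excess of boys is a
# fair coin flip, so 82 boy-heavy years in a row has probability (1/2)^82.
p_chance = Fraction(1, 2) ** 82
print(f"P(82 boy-heavy years by chance) = {float(p_chance):.2e}")
# ~2.07e-25: Arbuthnot took this as evidence of divine providence; we
# would say the null hypothesis of an equal sex ratio is rejected.
```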

4 – Intercomparison. Frequently we wish to compare sets of measurements to see if one thing is significantly different from another. The Student t-test is an example of such a thing. It is named for William Gosset, who took a sabbatical from his job at Guinness to work in Karl Pearson’s lab at UCL. Guinness did not want its employee’s name to appear on a scientific paper (thus revealing its interest), so he wrote under the rather unimaginative pseudonym “Student”.
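For the curious, a minimal sketch of a modern two-sample t-test, using made-up data and scipy (not an example from the book):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two illustrative samples of brewing yields with slightly different means
batch_a = rng.normal(loc=5.0, scale=0.5, size=10)
batch_b = rng.normal(loc=5.4, scale=0.5, size=10)

# Student's t-test: are the two sample means significantly different?
t_stat, p_value = stats.ttest_ind(batch_a, batch_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```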

5 – Regression. The chapter starts with Charles Darwin and his disregard for higher mathematics. He professed a faith in measurement and “The Rule of Three”. This is the algebraic identity a/b = c/d, which states that if you know any three of a, b, c and d you can calculate the fourth. This is true in a perfect world, but in practice we would acquire multiple sets of our three selected values and use regression to obtain a “best fit” for the fourth value. Also in this chapter is Galton’s work on regression to the mean, in particular how parents of extreme heights had children who were, on average, closer to the mean height. This is highly relevant to the study of evolution and the inheritance of characteristics.
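A rough sketch of the contrast (my own illustration, not the book’s): the Rule of Three solved exactly, versus a least-squares estimate of the same ratio from noisy repeated measurements:

```python
import numpy as np

rng = np.random.default_rng(1)

# Rule of Three: if a/b = c/d and we know a, b and c, then d = b*c/a
a, b, c = 2.0, 4.0, 3.0
print("exact rule of three:", b * c / a)  # 6.0, in a perfect world

# In practice the ratio is estimated from many noisy measurements by
# least-squares regression through the origin
x = np.linspace(1, 10, 20)
y = 2.0 * x + rng.normal(0, 0.5, size=20)  # true ratio y/x is 2
slope = np.sum(x * y) / np.sum(x * x)      # least-squares estimate
print("regression estimate of the ratio:", round(slope, 3))
```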

6 – Design. The penultimate pillar is design. In the statistical sense this means the design of an experiment in terms of the number of trials and how they are organised. The chapter starts with a discussion of calculating odds for the French lottery, founded in 1757 and providing up to 4% of the French budget by 1811. It then moves on to R.A. Fisher’s work at the Rothamsted Experimental Station on randomisation in agricultural trials. My experience of experimental design is that statisticians always want you to do more trials than you can afford, or have time for!
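By the standard account the Loterie drew 5 numbers from 90, and the simplest bet (an “extrait”, on a single number) reportedly paid 15-to-1 against fair odds of 18-to-1. A quick check of those odds (my calculation, not the book’s):

```python
from math import comb

# The Loterie drew 5 numbers from 90; an "extrait" bet wins if your one
# chosen number is among the 5 drawn.
p_extrait = comb(89, 4) / comb(90, 5)  # = 5/90
print(f"P(extrait) = {p_extrait:.4f}, fair odds 1 in {1 / p_extrait:.0f}")
# Fair odds are 18-to-1 but the payout was reportedly 15-to-1, leaving
# a healthy margin for the French treasury.
```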

7 – Residual. Plotting the residual left when you have made your best model and subtracted it from your data is a time-honoured technique. Systematic patterns in the residuals can indicate that your model is wrong, and that there are new, as yet undiscovered, phenomena waiting to be found. I was impressed to discover in this chapter that Frank Weldon cast a set of 12 dice some 26,306 times (315,672 individual die rolls) to try to determine whether they were biased. Data collection can be an obsessive activity; this sort of story, from around the turn of the 20th century, is not uncommon.
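To see how a pattern in the residuals exposes a bad model, here is a small sketch (mine, not from the book) that fits a straight line to data generated from a quadratic:

```python
import numpy as np

rng = np.random.default_rng(7)

# Fit a straight line to data that is secretly quadratic, then inspect
# the residuals: a systematic pattern signals the model is wrong.
x = np.linspace(0, 10, 50)
y = 0.3 * x**2 + rng.normal(0, 1, size=50)  # true relationship is quadratic
coeffs = np.polyfit(x, y, 1)                # best straight-line fit
residuals = y - np.polyval(coeffs, x)
# The residuals swing positive, negative, then positive again rather
# than scattering randomly around zero:
print(np.round(residuals[::10], 2))
```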

Seven Pillars is oddly pitched: it is rather technical for a general science audience, yet it is an entertainment rather than a technical text. The individual chapters would have fitted quite neatly into The Values of Precision, which I have reviewed previously.

Book review: Risk assessment and Decision Analysis with Bayesian Networks by N. Fenton and M. Neil

As a new member of the Royal Statistical Society, I felt I should learn some more statistics. Risk Assessment and Decision Analysis with Bayesian Networks by Norman Fenton and Martin Neil is certainly a mouthful, but despite its dry title it is a remarkably readable book on Bayes’ Theorem and how it can be used in risk assessment and decision analysis via Bayesian networks.

This is the “book of the software”: the reader gets access to the “lite” version of the authors’ AgenaRisk software. The book makes heavy use of the software, both in presenting Bayesian networks and in the features discussed. This is no bad thing: the book is about helping people who analyse risk or build models to do their job, rather than providing a deeply technical presentation for those who might be building tools or doing research in the area of Bayesian networks. With access to AgenaRisk the reader can play with the examples provided and make a rapid start on their own models.

The book is divided into three large sections. The first six chapters provide an introduction to probability and the assessment of risk (essentially working out the probability of a particular outcome). The writing is pretty clear; I think it’s the best explanation of the null hypothesis and p-values that I’ve read. The notorious “Monty Hall” problem is introduced, and the section then goes into Bayes’ Theorem in more depth.
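The Monty Hall problem rewards simulation when intuition fails; a quick sketch of my own (not from the book):

```python
import random

# Monty Hall: switching wins 2/3 of the time, sticking only 1/3.
def play(switch: bool, rng: random.Random) -> bool:
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the pick nor the prize
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

rng = random.Random(0)
trials = 100_000
print("stick :", sum(play(False, rng) for _ in range(trials)) / trials)
print("switch:", sum(play(True, rng) for _ in range(trials)) / trials)
```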

Bayes’ Theorem originates in the writings of Reverend Thomas Bayes, published posthumously in 1763. It concerns conditional probability, that is to say the likelihood that a hypothesis H is true given evidence E, written P(H|E). The core point is that we often have the inverse of what we want: an understanding of the likelihood of the evidence given the hypothesis, P(E|H). Bayes’ Theorem gives us a route to calculate P(H|E) given P(E|H), P(E) and P(H). The second benefit is that we can codify our prejudices (or not) using priors; other techniques deny the existence of such priors.
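A worked example with illustrative numbers of my own choosing (not the book’s): a diagnostic test for a rare condition, showing how the prior P(H) drags down the posterior even after a positive result:

```python
# Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E),
# with P(E) expanded by total probability over H and not-H.
p_h = 0.01              # prior: 1% of the population has the condition
p_e_given_h = 0.95      # test sensitivity
p_e_given_not_h = 0.05  # false positive rate

p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
p_h_given_e = p_e_given_h * p_h / p_e
print(f"P(H|E) = {p_h_given_e:.3f}")
# ~0.161: the condition is still unlikely despite a positive test
```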

Bayesian statistics are often put in opposition to “frequentist” statistics. This division is sufficiently pervasive that if you start typing “frequentist”, Google autocompletes it with “vs Bayesian”! There is also an xkcd cartoon. Fenton and Neil are Bayesians and put the Bayesian viewpoint. As a casual observer of this argument, I get the impression that the Bayesian view is prevailing.

Bayesian networks are structures (graphs) in which we connect together multiple “nodes”, each an application of Bayes’ Theorem. That is to say, we have multiple hypotheses with supporting (or not) evidence which lead to a grand “outcome” or hypothesis. Such a grand outcome might be the probability that someone is guilty in a criminal trial, or that your home might flood. These outcomes are conditioned on multiple pieces of evidence, or events, that need to be combined. The neat thing about Bayesian networks is that we can plug in what data we have to make estimates of the things we don’t know – regardless of whether or not they are the “grand outcome”.
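A minimal sketch of such a network (my own illustrative numbers, not the book’s): two independent causes, rain and a sprinkler, feeding a single “wet grass” outcome, solved by brute-force enumeration:

```python
from itertools import product

# A tiny two-cause Bayesian network: Rain and Sprinkler both influence
# whether the grass is wet. All probabilities are made up for illustration.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: 0.1, False: 0.9}
p_wet = {  # P(wet | rain, sprinkler) -- this node's probability table
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.05,
}

def joint(rain, sprinkler, wet):
    p = p_rain[rain] * p_sprinkler[sprinkler]
    return p * (p_wet[(rain, sprinkler)] if wet else 1 - p_wet[(rain, sprinkler)])

# Plug in the evidence "grass is wet" and infer the chance it rained
p_wet_total = sum(joint(r, s, True) for r, s in product([True, False], repeat=2))
p_rain_given_wet = sum(joint(True, s, True) for s in [True, False]) / p_wet_total
print(f"P(rain | wet grass) = {p_rain_given_wet:.3f}")  # ~0.645
```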

The “Naive Bayesian Classifier” is a special case of a Bayesian network in which the evidence nodes are assumed independent of one another given the hypothesis, leading to a simple hub-and-spoke network.
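A compact sketch of the idea, with hypothetical spam-filter numbers of my own:

```python
# Naive Bayes: with conditionally independent evidence E1..En,
# P(H | E1..En) is proportional to P(H) * product of P(Ei | H).
p_spam = 0.4
likelihoods = {  # P(word | class), assumed independent given the class
    "offer": {"spam": 0.30, "ham": 0.02},
    "meeting": {"spam": 0.01, "ham": 0.10},
}

def posterior_spam(words):
    s, h = p_spam, 1 - p_spam
    for w in words:
        s *= likelihoods[w]["spam"]
        h *= likelihoods[w]["ham"]
    return s / (s + h)  # normalise over the two classes

print(f"P(spam | 'offer') = {posterior_spam(['offer']):.3f}")  # ~0.909
```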

Bayesian networks were relatively little used until computational developments in the 1980s meant that arbitrary networks could be “solved”. I was interested to see David Spiegelhalter’s name appear in this context; arguably he is one of the few publicly recognisable mathematicians in the UK.

The second section, covering four chapters, goes into some practical detail on how to construct Bayesian networks. This includes recurring idioms in Bayesian networks, which the authors name the cause–consequence, measurement, definitional/synthesis and induction idioms. The idea is that when one addresses a problem, rather than starting with a blank sheet of paper, one selects the appropriate idiom as a starting point. The typical problem is that the “node probability tables” can quickly become very large for a carelessly constructed Bayesian network; the book’s idioms help reduce this complexity.
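The exponential growth is easy to see (my own illustration, assuming every node has the same number of states):

```python
# A node probability table (NPT) must hold P(node | parents) for every
# combination of parent states: with s states per node and n parents,
# that is s**(n+1) entries.
s = 3
for n_parents in range(1, 6):
    print(f"{n_parents} parents -> {s ** (n_parents + 1):4d} NPT entries")
```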

Along with idioms, this section also covers how ranked and continuous scales are handled, in particular the use of dynamic discretization schemes for continuous scales. There is also a discussion of confidence levels which highlights the difference in thinking between Bayesians and frequentists: essentially the Bayesians seek the best answer given the circumstances, whilst the frequentists obsess about the reliability of the evidence.

The final section of three chapters gives some concrete examples in specific fields: operational risk, reliability and the law. Of these I found the legal examples the most pertinent. Bayesian analysis fits very comfortably with legal cases: in theory, a legal case is about assigning a probability to the guilt or otherwise of a defendant by evaluating the strength (or probability of truth) of the evidence. In practice one gets the impression that faulty “common sense” can prevail in emotive cases, and that experts in Bayesian analysis are only brought in at appeal.

I don’t find this surprising: you only have to look at the amount of discussion arising from the Monty Hall problem to see that even “trivial” problems in probability can be remarkably hard to reason clearly about. I struggle with this topic myself, despite substantial mathematical training.

Overall, a readable book on a complex topic. If you want to know about Bayesian networks and apply them then it is definitely worth getting, but it is not an entertaining book for the casual reader.