Spurious Correlations: Correlation Does Not Equal Causation, by Tyler Vigen
This is an entertaining book, but it carries a big point for those who are interested in aggregating data trends and taking advantage of the mass of data available in our contemporary age. The subtitle of the book gives a truism that we would do well to remember, that correlation does not equal causation, and the book sets out to prove it. The book is meant to be taken with a certain bit of humor, as the author demonstrates when he writes about how he should have been studying for his law school finals when he was writing this book, and many of us can certainly relate to writing when we should be doing something else. At times the author even gets personal, such as when he pokes at law schools for the way that their professors outsource outlines and the writing of textbooks based on the decisions of judges but still collect the royalties anyway, something that I find funny as a person who has had to pay large amounts of money for a great many textbooks that were not particularly original.
The roughly two hundred pages of this book, containing about half as many spurious correlations, are divided thematically into ingestibles (ones relating to food); science-related ones; cultural curiosities; movers, shakers, and moneymakers; and famous folks. The spurious correlations chosen demonstrate the power of data aggregation and the dangers of using only a few data points, which can lead to very high correlations for entirely unrelated phenomena. The various correlations chosen reveal some of the interests of the author, from UFO sightings to marriage and divorce rates to films and popular culture. Each of the spurious correlations is given a rather rudimentary graph with two different vertical scales, one for each of the two phenomena that correlate, plotted against the years over which the data are taken. The author makes some witty comment about the correlation and includes notes about where the data come from, some of which have highly entertaining comments of their own on the pages underneath the graphs. The graphs, rather tellingly, do not tend to be scaled to zero, which makes the correlations look stronger than they otherwise would, and this subtle distortion has a point.
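The danger of few data points is easy to see for oneself. What follows is only a minimal sketch in Python, an illustration of the general idea and not the method used in the book: it generates a pile of entirely unrelated random series, compares every pair, and reports the strongest correlation it can find. The function names `pearson` and `strongest_spurious_pair`, the seed, and the series counts are all hypothetical choices made for this example.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_spurious_pair(n_series, n_points, seed=0):
    """Dredge unrelated random series for the largest |r| among all pairs."""
    rng = random.Random(seed)
    # Every series is independent Gaussian noise, so no pair is truly related.
    series = [
        [rng.gauss(0, 1) for _ in range(n_points)]
        for _ in range(n_series)
    ]
    return max(abs(pearson(a, b)) for a, b in combinations(series, 2))


if __name__ == "__main__":
    # Same number of series each time; only the length of each series changes.
    for n_points in (10, 500):
        best = strongest_spurious_pair(n_series=100, n_points=n_points)
        print(f"{n_points} points per series: strongest |r| = {best:.2f}")
```

With only ten points per series, roughly the length of a decade of yearly figures, the best of the several thousand pairings should come out very strongly correlated despite being pure noise, while the same search over much longer series should turn up only weak correlations. Searching many pairings over short series is the combination that produces impressive-looking nonsense.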
Overall, this is a short book that manages the impressive feat of being both deeply entertaining and deeply informative. These are not qualities that are easy to combine, not least for someone who appears to be a bit of a novice when it comes to writing books, but the author manages this task with aplomb. By filling a book with dodgy data dredging in order to find some particularly wacky correlations, the author aims to disabuse readers of the intuition that phenomena that correlate with each other are necessarily connected in some sort of causal fashion, especially because the growing ease of making connections with data is likely only to increase the number of spurious connections drawn from few data points. Since this world is full of people who abuse data and make apparent connections using the same techniques that this author uses humorously, the book serves as a sort of inoculation against bad data techniques, something the author likely does with a high degree of intentionality. Even as a reader who recognizes the author’s agenda, I find a great deal of humor in the book, and the author is performing a valuable service to an audience that may not be all that aware of the issues of big data in our present evil age.
 See, for example: