Statistics: Concepts And Controversies, by David S. Moore
This book, almost alone among the books I have read on data and statistics, presents itself as a textbook (designed for liberal arts students in our nation’s colleges and universities), with useful exercises meant to foster data literacy without demanding a great deal of mathematical expertise from its intended audience. There is much to admire about this book’s aims: data illiteracy, together with the manipulation of data and of its representation, is a serious barrier to understanding the complicated nature of reality, and it leaves people prey to all kinds of misinterpretation about the universe in which we live. A careful and critical reader of this book will find much to appreciate in its material, even if it probably requires updating (the volume I read has most of its data coming from the late 1990s, so it is almost certainly far out of date). As this book does not demand mathematical expertise from its readers, it is not intended to help educate the quants on whom our contemporary advances in big data depend. Instead, this book is designed for those whose interests are in the social sciences, with the goal of making them wiser and more critical readers of statistical information in news reports and marketing, and also of making them friendlier toward, and more understanding of, the difficulties faced by those of us in the data sciences. These are noble and worthy aims.
In terms of its structure, the book is divided unequally into four parts. Each part contains more than one chapter (although the first two parts contain more than the last two), and each chapter, as well as each part, contains its own exercises designed to foster sound statistical reasoning. The first part looks at where data come from in terms of studies, naturalistic observations, and experiments, and discusses best practices for sampling and experiments, as well as ethics and checking to make sure the data presented by a given party make sense. The second part looks at how to organize data, providing advice on charts, histograms, stem-and-leaf plots, scatterplots, and the relationship (or lack thereof) between correlation and causation; in many ways this advice is a less technical and less beautiful, but otherwise similar, counterpart to the far more elegant and demanding writings of Tufte. Part three deals with probability and the larger concern about chance, and the fourth and final part of the book deals with inference and significance and discusses chi-square tests and other related mathematical methods, for those students who are more interested in research methods, probably on the graduate level.
Despite the high quality of the work, and its tone of heavy irony and skepticism about the shoddy and downright dishonest statistical practices of many sellers and marketers of data, there are some worrisome aspects of the book. For one, it tends to be somewhat biased in the people it praises for statistical rigor, choosing people like Galton and Keynes, who were not quite as free from systemic bias as would be ideal, and whose use of data was often in service of dubious moral and political aims. Although the author is willing to discuss some of the causes of systemic bias in statistics, there are other issues that are ignored, including the fact that exit polling has often tended to undercount conservative voting patterns. It is also troubling how the author misrepresents belief as an irrational matter of personal opinion rather than sound and wise judgment based upon good authorities as well as observation and the acquisition of knowledge. As for political worldview, the author is quick to jump on race as an element of consistent bias in polling, but is generally far too favorable to liberal and even socialistic aims. Balance is perhaps too much to ask of the author of a textbook, whose work has to appeal to left-leaning professors of mathematics classes for non-STEM majors in order to sell well in the academic market, but a lot more work could have been done to make this work even remotely fair and balanced. A student who wishes to be data literate and who recognizes the subtle but powerful pressures of political bias on statistical representation, however, would be able to draw correct inferences about the deeply human nature of statistical presentation and reasoning. That is enough to make this book worthwhile, so long as its systemic bias is accounted for and corrected by the reader, which is in general a sound practice for reading.
 See, for example: