Often I ponder the relationship between different fields of study. For example, in economics there is a great divide between macroeconomics, which looks at large-scale aggregates, and microeconomics, which looks at individual firms and the decisions of individual people. And never the twain shall meet. A similar phenomenon occurs in history, where there are macrohistorians who look at the large-scale patterns and qualities of nations and regimes, and microhistorians who look at individuals and small communities that can be examined in a granular fashion. It is unsurprising that these approaches end up with vastly different results, with the big picture focusing on broad patterns of difference and the small picture focusing on nuance, complexity, and contingency. And so it is when we look at data analysis. I happen to have a good deal of work experience in that field, and so I would like to look today at some of the complexities that arise in data analysis and why it is often more frustrating and less glorious than it may first appear.
What sort of picture do we get when we look at data analysis from the large scale as opposed to the small scale, and what implications does this have for our insights into data, as well as for micro and macro perspectives in general? I have spent the vast majority of my time dealing with data on the small scale, which may be compared to moving manure from one place to another. Data on the micro scale, at least as far as I have seen it, is pretty messy and ugly. One can tell that there are some major areas where the data lacks validity, and one can think of immense amounts of information that are simply not there. Data from one source does not always agree with data from other sources, and so on it goes. Those who deal with data on the small scale have little if any confidence in the data itself, and often little confidence in the conclusions that can be drawn from it. Those who know the unreliability of the sources of the information have a great deal of well-earned skepticism toward those who draw grand and sweeping conclusions from what they know to be basically unreliable data.
Yet on the other hand, sometimes the intractable problems of the small scale of data fade into insignificance when one looks at the larger scope. For example, when one looks at the larger scale and scope of the data as a whole, it is often possible to make correlations and empirical connections between different types of data that one simply cannot gather from the raw data at hand. For the purposes of sweeping narratives and large-scale strategies, close enough may be good enough, because one is seeking to understand patterns and develop a course of action that one knows will require modification based on circumstances that cannot be known in advance. One is not looking for a detailed picture, and indeed a detailed picture is almost sure to be wrong at some level, but rather for a lay of the land that gives us something to point ourselves toward or away from. To the extent that those who hold to macro views wish to trumpet just how much they know, those with a better understanding of the granular data may point out that much of the data is unreliable and that, however specific it may appear, it is only worthwhile as an estimate.
Yet it is from this essentially unreliable data that analysis must proceed. The fact that our data is unreliable does not mean that there is no ultimate reality, but rather that we are ill-equipped to see the truth, because so much of the data depends on the imperfect actions of highly unreliable and imperfect people, people much like ourselves. Likewise, the fact that our data is not perfectly reliable does not mean that it is not useful. Its precision may be bounded, but sometimes that very imprecision is itself notable. To pick an example not at random, we may be able to get phone data from one source and lead data from another and be able to tell that a great many calls never become leads because of a lack of work ethic among those receiving the calls; we could also tie together those who received calls and those who input leads and come up, with tolerable accuracy, with a list of those people who did the best and worst job of inputting data. The possibility exists, therefore, for the data itself to help point out issues with the data so that such issues may be improved. If such efforts will always fall a bit short of perfection, the data can at least be made better.
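The call-and-lead example above can be sketched in a few lines of code. This is a minimal illustration with made-up records and hypothetical field names (`agent`, `call_id`), assuming only that each lead can be tied back to the call and the person who handled it; real data from two systems would, as noted, be far messier to join.

```python
from collections import Counter

# Hypothetical call logs from one source...
calls = [
    {"agent": "alice", "call_id": 1},
    {"agent": "alice", "call_id": 2},
    {"agent": "alice", "call_id": 3},
    {"agent": "bob", "call_id": 4},
    {"agent": "bob", "call_id": 5},
]
# ...and lead entries from another. Many calls never become leads.
leads = [
    {"agent": "alice", "call_id": 1},
    {"agent": "alice", "call_id": 2},
    {"agent": "bob", "call_id": 4},
]

def lead_entry_rates(calls, leads):
    """Estimate, per agent, what fraction of their calls were entered as leads."""
    calls_per_agent = Counter(c["agent"] for c in calls)
    leads_per_agent = Counter(l["agent"] for l in leads)
    return {
        agent: leads_per_agent[agent] / total
        for agent, total in calls_per_agent.items()
    }

rates = lead_entry_rates(calls, leads)
# Rank agents from best to worst at entering leads. Because the underlying
# data is itself imperfect, this ranking is an estimate, not a verdict.
ranking = sorted(rates, key=rates.get, reverse=True)
```

The point is not the arithmetic, which is trivial, but that two imperfect sources, joined together, can surface problems in the data-entry process that neither source reveals alone.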
This leads to further implications that are worthwhile to consider. The glory of an analyst is to analyze. In order to find any sort of satisfaction in working with data, an analyst needs to be able to draw some conclusions and make some use of the data. If all one's time is spent shoveling manure from one pile to another, one feels as if there is not a great deal of purpose in one's efforts. On the other hand, being able to step back and notice trends that would tell what proportion of the manure came from which species or even which individual animal, or get an idea of the chemical composition of that manure for use as fertilizer, that is something one can get a great deal of insight from. So long as the skepticism about the exactitude of data that one gains from the micro perspective can moderate the sweeping and broad generalizations of the macro, and so long as the broader perspective and bigger picture that make the macro perspective so enjoyable can encourage those who find their existence more than a little bit melancholy for being so close to crappy data, each perspective can be enriched by the insights and strengths of the other. If micro and macro may never meet, at least they may wave to each other in a friendly way as they pass along their respective journeys. Perhaps that is close enough.