Abbott And Costello Must Have Done A Skit About This

Occasionally in life I find that a problem exists because something appears different to me than it does to someone else. Rarely has that problem been more literal than it was today for me and someone at another company with whom I work regularly in the course of my duties. One of the data questions this other person had related to the amount of time it took for leads that came from his company to convert into policies. He was having some trouble with the mathematics, so he sent his report to me, I tweaked it a little and sent it back, and I figured the problem would be fairly straightforward to solve. It wasn’t. First, there was a question about which columns I was using for the difference in dates. Then things got more surreal. He kept questioning the data in one particular cell, saying that there was no way that there were 253 days between November 24, 2014 and April 8, 2015. I agreed, but the spreadsheet I was looking at showed August 4, 2015 as the date, and the number of days in between was correct. So we went back and forth until he showed me what he saw in his spreadsheet and I showed him what I saw in mine. I have no idea how they got to be different, but they were, and so we had been debating each other while looking at data that was supposed to be the same but somehow changed in the transfer. Towards the end of the discussion, I made an offhand comment that I’m sure Abbott and Costello must have done a skit about this.
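
The arithmetic in that exchange is easy to check, and one possible explanation can be sketched in a few lines of Python. This is only a guess, not a diagnosis: it assumes the cell traveled as the text 8/4/2015 and was read month-first on one machine and day-first on the other. The raw string and the variable names are hypothetical and do not come from the actual report.

```python
from datetime import date, datetime

# Date the lead came in, as given in the discussion above.
lead_date = date(2014, 11, 24)

# Hypothetical raw cell text; the real cell contents are not known.
raw = "8/4/2015"

# The same text read under two different regional conventions.
month_first = datetime.strptime(raw, "%m/%d/%Y").date()  # month/day/year
day_first = datetime.strptime(raw, "%d/%m/%Y").date()    # day/month/year

# Days elapsed from the lead date under each reading.
days_month_first = (month_first - lead_date).days
days_day_first = (day_first - lead_date).days

print(month_first, days_month_first)
print(day_first, days_day_first)
```

Under that assumption, the month-first reading gives August 4, 2015 and 253 days, while the day-first reading gives April 8, 2015, a date only 135 days after the lead.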

Of course, Abbott and Costello are best known for “Who’s on First,” a baseball skit that regularly appears in church variety shows. I have seen it done at least twice by different pairs of siblings of my acquaintance over the course of my life. Yet while Abbott and Costello probably did not do a skit about seeing different data, their skit about baseball turns on a matter of perspective. Sometimes people literally see different data and judge accordingly. As much as we would like to believe that two people looking at the same data would have the same version of the truth, this is not always the case. As someone whose job deals with data continually, I am often amused and intrigued by the distinctions that result from people looking at different numbers, even when those numbers are supposed to be the same and come from the same place. These problems are highly relevant to our lives and experiences outside the admittedly somewhat arcane world of data, and so I would like to share a few insights that I have gathered from dealing with debates over data on a regular basis. This is not going to be an exhaustive list, nor a lengthy examination, though one could be written, but it should at least suggest the possibilities for insight that one can gain by staring long enough at, and talking enough about, numbers in their raw form.

Among the most elementary lessons that can be drawn from such discussions is that sometimes people are actually looking at different data. How data becomes corrupted and degraded in passing from one person to another, even when one person is the source of the data for the other, is rather curious. How was it that my conversation partner and I were looking at different data? I don’t know. Once we saw that the data was different, and that neither of us was seeing things wrong, the conversation took a different turn, as we were left to ask other, deeper questions about how we were acquiring the information and how it may have been changed along the way. In life, we see different things than other people do, even when we happen to be looking at the same phenomena. It is little wonder, therefore, that since our observations are different, and even the underlying raw data by which we make decisions is different, we would come to different conclusions. This is not to say that there is not one underlying truth, but recognizing and understanding that truth is a difficult matter even when it exists and even when different people believe in it and act on it. Not only must the data that everyone is working with be the same, but there must also be some sort of relationship between the data that is being used and an underlying reality.

This itself is problematic. Any system of data that depends purely on user input is going to cause problems. One may have a clean record, without degradation, from a given source, and yet find that source entirely untrustworthy because the incentives to cheat are so high, and the ability to monitor such behavior so low or nonexistent, that one cannot believe in the basic integrity of the data. Here the problem is not that the data has been corrupted or degraded in some fashion, whether by errant keystrokes or because Excel decided to act differently in Montreal, Quebec than it did in a suburb of Portland, Oregon, but rather that the people inputting the data cannot be trusted. I find in the course of my work that there is a lot of untrustworthy data, often changed to make certain aspects of work easier. For example, putting information in a certain format to make it easier for a policy to sell and be successfully submitted, and to avoid confusing a customer about subsidies on insurance premiums, may simultaneously make it more difficult to gain an accurate picture of the actual policy premium submitted for a given campaign, carrier, or insurance type, or of the amount of revenue to be collected from that policy. All too often data is manipulated for a specific purpose within a given process in ways that lead to inaccuracies in reporting on that data. This happens a lot in life as well, as we may act a certain way to achieve a short-term tactical goal that makes longer-term strategy and logistics more difficult to carry out. All too often what we do, what we are really about, and what we really want are at cross-purposes.

Additionally, there may be wide variances in what people can see based on where they stand. Once, not too long ago, I was sent a rather airily worded discussion of best practices for a given type of report, only to find that I could not view the data under the standard that the person sending the e-mail proposed. The date option available to that person in their custom reports was simply not available to me, which was particularly unfortunate given the amount of reporting I do and the ubiquity of the information I generate through my own queries and data pulls. The standards and criteria we look at make a large difference in what we see. Whether intentionally or not, such criteria often serve as a filter, changing what we see based on which records contain the particular quality we are looking for. Here we are back to the beginning, almost, in that two people looking at the same large collection of data can see wildly different and inconsistent results because they are filtering and structuring what they see by different criteria. It is fascinating, and sometimes frustrating, to see just how differently people can see the same things because of filters in their minds and in their data of which they are not aware at all. If there’s not an Abbott and Costello skit about this sort of thing, there should be.