One of the bedrock assumptions behind the rise of data in contemporary life is the thesis that data doesn’t lie, but this assumption is badly mistaken. Part of what makes it mistaken is that data requires interpretation, and these interpretations are not always straightforward; often the sort of data that people want is not easy to access for one reason or another. On top of that, in a great many data presentations there are active agendas that shape how data is portrayed and what data is included or excluded. None of these choices is free of bias or of questions about legitimacy. As a result, the use of data can always be called into question, and quantitative data, far from being a solution to the problems and disputes that we have over truth and factual reality, ends up deeply entangled with those same questions of legitimacy and is often unable to serve as the sort of referee that we want for our disputes.
One of the more obvious places this occurs is in music charts. As someone who spends probably too much time reading, interpreting, and commenting on music charts in the United States (and around the world as well), I find that this example resonates with me more than it may with most people. Still, what I have to say about music charts is also true of data collection and presentation in general, so those whose interest lies in other types of data should note that the sort of problems the Billboard Hot 100 has are broadly shared by other music charts as well as by other collections of data in the United States and around the world. This is not intended as an exhaustive account of such problems, but is meant to illustrate some trends and patterns that are well worth considering, especially to the extent that we want to use data to help us in decision-making or as an arbiter of various competitions.
The Billboard Hot 100 has been in existence since August 4, 1958. Before that, going back to 1940, there were other Billboard charts tracking best-selling retail records, and throughout the history of the Hot 100 there have been a variety of other charts, including the Billboard 200 for album sales (and streaming) as well as charts like the Artist 100 and Social 50 that are more concerned with social media influence. All of these charts attempt to answer the question of which song, artist, album, etc. is the biggest in the United States, often based on a blend of sales, radio airplay, and streaming activity. Beyond these generalities there are many more specific rules, including differences in how a paid stream is counted as opposed to a free stream. Likewise, audience impressions for radio airplay are tied to listener data. In addition to YouTube streaming counts, points for songs are also generated by User Generated Content, where someone has created a video using a given song as a soundtrack, which can make a dramatic difference for songs that catch on with homemade lyric videos or other related content. There are also various recurrency rules that clear the chart of old songs that are declining in performance in order to open up space for newer songs.
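To make the idea of a "blend" concrete, here is a minimal sketch in Python of how a weighted chart formula might work. Everything in it is an assumption for the sake of illustration: the song names, the numbers, the weights, and the function names are hypothetical and are not Billboard's actual formula, which is not spelled out here. The only point is that the same raw data can crown a different #1 depending on how each kind of activity is weighted.

```python
from dataclasses import dataclass


@dataclass
class Song:
    """Hypothetical weekly activity for one song (all figures invented)."""
    title: str
    sales: int          # paid downloads or physical copies
    airplay: int        # radio audience impressions
    paid_streams: int   # streams from paying subscribers
    free_streams: int   # ad-supported streams


def chart_points(song: Song, weights: dict) -> float:
    """Blend the four activity counts into one score using the given weights."""
    return (
        weights["sales"] * song.sales
        + weights["airplay"] * song.airplay
        + weights["paid_streams"] * song.paid_streams
        + weights["free_streams"] * song.free_streams
    )


def rank(songs: list, weights: dict) -> list:
    """Return song titles ordered from highest to lowest blended score."""
    ordered = sorted(songs, key=lambda s: chart_points(s, weights), reverse=True)
    return [s.title for s in ordered]


# Two invented songs: one strong on radio and sales, one strong on streaming.
SONGS = [
    Song("Song A", sales=50_000, airplay=20_000_000,
         paid_streams=5_000_000, free_streams=10_000_000),
    Song("Song B", sales=10_000, airplay=5_000_000,
         paid_streams=20_000_000, free_streams=40_000_000),
]

# Two invented weighting schemes. Neither is "the" correct one.
RADIO_FRIENDLY = {"sales": 1.0, "airplay": 0.001,
                  "paid_streams": 0.001, "free_streams": 0.0005}
STREAM_FRIENDLY = {"sales": 1.0, "airplay": 0.0005,
                   "paid_streams": 0.005, "free_streams": 0.002}

if __name__ == "__main__":
    print("Radio-friendly weights: ", rank(SONGS, RADIO_FRIENDLY))
    print("Stream-friendly weights:", rank(SONGS, STREAM_FRIENDLY))
```

With these made-up numbers, the first set of weights puts Song A on top and the second puts Song B on top, even though nothing about what people actually bought or listened to has changed.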
None of these rules is entirely straightforward, and nearly every one of them can be (and has been) subjected to a great deal of debate. Depending on what one wishes to use the data for, a variety of different answers to these questions can be found. For example, if one wants to know which songs are the most popular with a given audience, is it right to include songs that are pushed into one’s stream by YouTube or Pandora channels, or that are played ad nauseam on radio stations thanks to label payola? If one is a label looking to see how one’s songs are performing relative to new songs, how long a period does one want for recurrency, and what sort of rules does one want for songs that periodically or seasonally gain in popularity, like Michael Jackson’s “Thriller” or Mariah Carey’s current #1 hit from 1994, “All I Want For Christmas Is You”? If one is looking for music charts that are as pure as possible, this would mean accepting the fact that some songs hang on in popularity forever and would still be on the charts many weeks after their debuts, in the same way that some albums are perennially popular enough to remain among the best-selling albums seemingly forever. And that is not even getting into the question of whether streams should count towards sales. I personally believe that they should be counted as personal radio spins and be ineligible for sales, but this would mean that very few albums would register as selling, because pure sales and fractions of digital sales would indicate a climate where buying music is very unpopular compared to the past, which in my mind is an accurate impression.
So why do I call this a matter of data lying? I say that because every list of the songs on the Hot 100 has to come with an asterisk, or several asterisks. Among them is the question of how much of the airplay counted for a given chart, on radio or streaming, is “organic,” or driven by what a customer has sought out, and how much is based on what a label is pushing by buying time on a curated streaming channel or radio station playlist. Another is how long we want older songs that are no longer being actively promoted to fill up the charts and thus deny spots to new songs that artists and labels want to see on the charts but which do not yet have the popularity to rise above the residual sales, airplay, and streaming of songs that have been out for a while. Sometimes this can be a very long while. There are songs on the Disney charts, for example, that have remained in the top 20 for three or four years. Do we want to acknowledge this staleness in what we really want to listen to, or do we want to push out the old and bring in the hip and new? I say this as a person who regularly streams songs from the 1980s on demand and who has a Haydn channel on Pandora that I listen to sometimes. Clearly I have no personal animus towards music that has stood the test of time, but there are some people who find the enduring popularity of material from the past an active threat to the establishment of hype and achievements for artists of the now who have not yet proven themselves.
Different people want to use data for different reasons, and different people have varying agendas that they want to push. In some countries, like the UK, the charts are even more dubious in terms of legitimacy than those of the United States, thanks to rules such as the limit of only three songs by an artist at a given time. Some charts, like radio charts, have recurrency rules that are extremely restrictive so as to give many more songs those coveted top ten spots that they might otherwise never get because they are blocked by songs that simply refuse to go away. To the extent that we want data to reflect reality, it may reflect a far more conservative reality than many people want to recognize. And the fact that data is sought by companies to shape their own strategies for promoting artists and content means that data will often reflect the desires of those who are willing to pay a lot of money to promote the intellectual property under their control, as opposed to those of us who want to pay nothing at all for data that we can use to better understand popular culture, without any recurrency rules distorting the picture of how music is actually consumed by the listening public. The choices about what data to count, how much to count it for and in what categories, and what to include and exclude are not unbiased; rather, they are shaped by our agendas, worldviews, and purposes. As such, they are as dishonest as every other form of communication that we have and use. The charts cannot settle arguments because, like everything else, they are part of the argument itself.