[Note: This blog is part of a series on SQL learning. ]
There are at least two obvious takeaways about this month’s SQL lunch that have nothing to do with SQL. First, I’m glad I don’t go for the food. The food from The Meeting Place was disappointing—not only did they forget two of the orders for our group while charging for food that they did not provide for those people, but for some mysterious reason they thought that when I ordered a sandwich with only meat, lettuce, and cheese, mayo and mustard would be acceptable as well. Apparently they don’t realize that some of us are plain enough not to want any sauces on a sandwich. Additionally, I’m glad that I didn’t sign out for lunch, given that I’m the only hourly person of the bunch, because the meeting went two and a half hours when it was only scheduled to go for an hour, and that would have cost me money I can ill afford to lose. Why am I not on salary, again?
Concerning the meeting itself, today our focus was on applying the relational model that we discussed last time. After all, you can’t query a theory, and the real problem is not theoretical in nature, but rather whether one has the good, timely data one needs to make smart business decisions. How does theory help in this very practical aim? Relational data provides an organized way to store and retrieve data. The ways it does so are through data structures, keys and constraints, entity relationships, and normalization. The world of SQL is like an iceberg, going all the way down to the level of machine code and storage engines, which are matters that database administrators care a great deal about. Reporting analysts, though, only care about the database engine itself, which is what we interact with when trying to pull data from the database. We spent a bit of time discussing logical database structures, where every datum (piece of data) is accessible given the table name, primary key, and column name, and where, given any two of those three, the third can be discovered through triangulation. We also looked at the hierarchical order between server, instance, database(s), schema(s), and tables, and the advantages and disadvantages of adding a schema layer of organization to improve security. After this we discussed relations, looking at customers and demographic information, exploring the nomenclature of attributes (columns) and tuples/records (rows), and the way that, even where a given dataset has no obvious key, a composite key can be made by concatenating columns together.
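To make the composite-key idea concrete, here is a small sketch using Python’s built-in sqlite3 module. The table and column names here are my own invention for illustration, not anything from our actual database: two columns that are only unique in combination are concatenated into a composite primary key, and the table name, key, and column name together address exactly one datum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical demographics table: no single column is unique on its own,
# but (customer_id, survey_year) together form a composite primary key.
conn.execute("""
    CREATE TABLE customer_demographics (
        customer_id  INTEGER,
        survey_year  INTEGER,
        segment      TEXT,
        PRIMARY KEY (customer_id, survey_year)
    )
""")
conn.execute("INSERT INTO customer_demographics VALUES (1, 2023, 'retail')")
conn.execute("INSERT INTO customer_demographics VALUES (1, 2024, 'wholesale')")

# Table name + primary key value + column name together pin down one datum:
row = conn.execute(
    "SELECT segment FROM customer_demographics "
    "WHERE customer_id = 1 AND survey_year = 2024"
).fetchone()
print(row[0])  # wholesale
```

Note that inserting a third row with the same (1, 2024) pair would be rejected outright, which is the whole point of constraining the key.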
After this we moved into three vitally important ways that good database design helps with reporting: accuracy, efficiency, and logical relationships. In data, as in life, there are many illogical relationships, after all. To ensure good accuracy, it is important that there be logical structures (tables, columns, and keys) holding atomic values, and primary key constraints that allow for the creation of a clustered index, which gives the data a logical order and makes it easier to retrieve information from a particular table. Related to this, we focused on the importance of establishing constraints to keep bad data from being input into the database in the first place, and also on the need for ACID transactions (atomic, consistent, isolated, durable), which allow convenient use, consistent commits on success, and rollback if there is an error, all without hindering the use of the database by others. We also focused on the importance of efficiency, achieved through a skillful knowledge of data types, the optimization of indices, and the first three normal forms (the levels of normalization most used in business), which reduce data redundancy. After this we discussed logical relationships, including primary keys, foreign keys (which reference the primary keys of related tables), and entity relationships: one-to-one, one-to-many, and many-to-many. We disparaged many-to-many relationships and their difficulties, and the need to decompose such relationships into one-to-many relationships for ease of use. We also discussed the difference between merely having a key and constraining that key so that it is guaranteed to be unique. Life, like good data, is lived under a multitude of constraints.
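Two of those accuracy ideas, constraints that keep bad data out at the door and transactions that roll back cleanly on error, can be sketched in a few lines. Again this uses Python’s sqlite3 with an invented orders table, a rough sketch rather than anything from our systems:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A CHECK constraint and a primary key guard the table's accuracy.
conn.execute("""
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        amount    REAL NOT NULL CHECK (amount > 0)
    )
""")
conn.execute("INSERT INTO orders VALUES (1, 19.99)")
conn.commit()

# Bad data is rejected before it ever pollutes the table.
try:
    conn.execute("INSERT INTO orders VALUES (2, -5.00)")
except sqlite3.IntegrityError:
    pass  # CHECK constraint violated; nothing was inserted

# An atomic transaction: commit on success, roll back everything on error.
try:
    with conn:  # the connection context manager commits or rolls back
        conn.execute("INSERT INTO orders VALUES (3, 42.00)")
        conn.execute("INSERT INTO orders VALUES (1, 10.00)")  # duplicate key!
except sqlite3.IntegrityError:
    pass  # the whole transaction, including order 3, was rolled back

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 1: only the original committed row survives
```

The rollback is the part worth noticing: the perfectly valid insert of order 3 disappears along with the bad one, because the transaction is atomic, all or nothing.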
After this we moved from theory to a demo of our CRM data, to actually start to look at how to apply the relational model and the complexity that can come from looking at the way data is organized. We noted that there were two servers: one production server, where it is efficient and quick to input information, and a mirrored reporting server, used by reporting types like myself, which is structured to be easy to query. We also discussed heaps, which are tables with no primary key constraint, mere lists really, which makes it easy to ruin the integrity of such data, making them the digital equivalents of what is left of a city after the invasion of the Assyrians or some other such brutally efficient army. We spent a fair amount of time talking about the three levels of normalization, commenting that as reporting analysts we generally prefer the second normal form, which retains some redundant descriptive columns rather than showing only the keys, as is common in production databases. We spent a lot of time talking about the hidden layers of databases, and how one uses clues to discover what hidden levels of data exist when one gets different results than one is expecting. Even data is not always transparent, and sometimes one has to investigate the very structure of data to see what layers are beneath the surface, because what is shown is very seldom all that is, even when one is looking at columns and rows of data. Before the meeting was adjourned, we were given the task of thinking, before next month, of questions that we have for the database that we want answers to, in the expectation that we will be learning how to write the queries as well. The tougher the question, the harder the work required to get the right answer, in data as in life.
 See, for example: