Welcome to the Database Programmer. This is a regular blog for anybody who wants practical information about how databases work and how your database and application code work together.
There is a new entry every Monday morning, and the complete table of contents is here. We are currently looking at Table Design Patterns and how they lead to tight and efficient code.
The Example: A School Class Schedule
In the past two weeks we have looked at the table of courses that teachers are teaching and students are enrolled in for a year. The five rules are:
- A teacher may not teach a class he/she is not qualified to teach.
- No two classes can be given in the same classroom in the same period.
- No teacher can be in two places at once, a teacher can only teach one class in one period.
- No student can be in two places at once, so no student can be in more than one class in the same period.
- A student cannot take the same class again after passing the class once.
Two weeks ago we looked at rule 1 and recognized a Cross Reference Validation Pattern, and last week we looked at rules 2 and 3 and saw a Limited Transactions pattern. This week we will look at rule 5 and see the Primary Key In Disguise, and next week we will wrap up this example by looking at how to do rule 4.
First Attempt: Restating Rule 5 In Terms of Tables
There are plenty of times when a customer will give you a requirement that sounds something like, "If X happens, Y may not happen." Rule 5 sounds like this kind of rule:
"If X Happens..." | "...then Y may not happen"
|
If a Student passes | the student may not
a course | take the course again
The first thing we want to do is translate this into rules about entries in tables. Our first attempt might sound something like "An entry in the completed courses table prohibits an entry in this year's schedule." It might look like this:
HISTORY TABLE THIS YEAR'S CLASSES
Student | Course Student | Course
--------+--------- ----------+----------
Nirgal | History <----------- Nirgal | Calculus
Jackie | History A "reverse" foreign Nirgal | Geometry
Nirgal | Physics key says entries in
Jackie | Physics history prevent entries
in current enrollment
The problem with this statement is that there is no way to enforce such an idea. It is not a foreign key, it is more like a reverse foreign key. A foreign key says a a parent row must exist but we are saying a parent row must not exist. If there were such a thing as a reverse foreign key we would be ready to move on to the next task. But since there is no such thing as a reverse foreign key, we must keep looking.
The Customer Does Not Design Tables
So far we have a formulation "An entry in the completed courses table prohibits an entry in this year's schedule." But here I have deliberately made a common mistake to demonstrate how programmers sometimes deal with the requirements given to us by end users. I have taken the rule and turned it into a statement about tables without first doing any real thinking. This is like translating a sentence word-for-word from one language to another. When you transliterate the user's requirements this way instead of translating, the result is exactly the same as for human languages: you get nonsense.
So we must now remember that the customer is not in the business of table design. The customer will explain their needs as best they can, but we should not expect the customers' statements to translate directly into table definitions.
So if we look at this rule again we realize that when one thing excludes another thing we may be looking at a primary key or unique constraint. If the student's completed courses and the student's current courses are stored in the same table then the problem is solved. We make a primary key or unique constraint on student + course and that is the end of that.
Relational Does Not Meet My Needs!
Nowadays (February 2008) it is pretty common to hear programmers who are relatively new to the database world say: "Relational just cannot do what I need, my customers have complex needs that don't fit into relational concepts." If that programmer is handling text, like books, or media files then they may be right. But if not, that programmer probably:
- ...has been taking the customer's requests at face value instead of translating them into solid concepts.
- ...is not aware of dead-end patterns like the "Reverse Foreign Key" and how to recognize the valid patterns that are hiding behind them.
Now that we know we probably have a primary key, and not some kind of weird reverse foreign key, it is time to design the table.
A Primary Key In Disguise
Putting them into the same table is not that big a deal. It is nothing more than a list of courses the student has taken or is currently taking. The table would look something like this:
The students_x_courses Table
Primary Key (or perhaps an int primary
| key and this is a
| unique constraint)
|
+---------+
| |
Student | Course | Year | Flag_hist | Grade
--------+----------+-------+-----------+--------
23 | HIST-302 | 2005 | N |
23 | PHYS-101 | 2003 | Y | 92
23 | CHEM-211 | 2004 | Y | 96
Deletions Required
We should note that if a student fails a course, the approach we are taking requires that the row be deleted outright, otherwise they cannot take it again. All database programmers share a deep uneasiness about deleting data, it makes us nervous. There is good reason for this. As soon as you delete the failed classes somebody will ask for some statistics on how often students fail courses, which you cannot tell them because you deleted the data! Nobody wants to be having that conversation!
Therefore we should assume that we will have a separate table just to store a record of course failures. We will copy failure records to that table before we delete them.
Putting Current Data and Historical Data Together
There are plenty of strange ideas out there, and sometimes an idea from ages past will persist long after anybody remembers where the idea even came from. One such idea is that you should not mix together historical and current data.
This idea came from long ago when hard drives where extremely expensive, when many data files were stored on tapes or cartridges and never got near a hard drive. The idea was to separate your live and historical data because the bulk of your operations required one or the other but rarely both, and you could not afford to keep them both available at the same time. In really old fashioned batch operations something we take for granted like a simple query would be done by loading tape after tape onto a refrigerator-sized tape machine that executed some query program. If you kept history and live data together you might have to load 30 tapes, but if you kept them separate you would only have to load 3 tapes to pass the live data. That basic practice has stayed with us ever since in the form of prejudices and vague advice about "not mixing current and history."
But nowadays we do not need to worry too much about hard drives, there is no validity to most of the reasons for keeping this data separate.
Conclusion: Translation, Not Transliteration
This week we saw what happens when we take user requirements at face value. They lead us into dead-end design patterns, that is, patterns that are not valid and cannot be implemented. We saw how important it is to take the users' statements and seek out the "disguised" primary keys, foreign keys and so forth. This is the sad result of transliteration, the direct conversion of user statements into table designs.
The correct process is translation, looking at the user statements with a critical eye and seeking out the primary keys, foreign keys and table definitions that are lurking there. Those programmers who do not learn this skill will be led into the false belief that they are somehow dealing with problems that no human being has ever seen before, and that they need some post-relational or extra-relational or razmataz-relational system that will provide a unique, strange and clever solution to their troubles. The truth is usually much less interesting and not nearly so ego-inflating. The truth is that most data does fit into tables pretty neatly, and that most rules (but not all) can be expressed as unique constraints and foreign keys.
Next week we will wrap up this example by looking at rule 4, that no student can be in two places at once.