Showing posts with label unique constraints. Show all posts
Showing posts with label unique constraints. Show all posts

Sunday, February 3, 2008

False Patterns Such as The Reverse Foreign Key

Welcome to the Database Programmer. This is a regular blog for anybody who wants practical information about how databases work and how your database and application code work together.

There is a new entry every Monday morning, and the complete table of contents is here. We are currently looking at Table Design Patterns and how they lead to tight and efficient code.

The Example: A School Class Schedule

In the past two weeks we have looked at the table of courses that teachers are teaching and students are enrolled in for a year. The five rules are:

  1. A teacher may not teach a class he/she is not qualified to teach.
  2. No two classes can be given in the same classroom in the same period.
  3. No teacher can be in two places at once, a teacher can only teach one class in one period.
  4. No student can be in two places at once, so no student can be in more than one class in the same period.
  5. A student cannot take the same class again after passing the class once.

Two weeks ago we looked at rule 1 and recognized a Cross Reference Validation Pattern, and last week we looked at rules 2 and 3 and saw a Limited Transactions pattern. This week we will look at rule 5 and see the Primary Key In Disguise, and next week we will wrap up this example by looking at how to do rule 4.

First Attempt: Restating Rule 5 In Terms of Tables

There are plenty of times when a customer will give you a requirement that sounds something like, "If X happens, Y may not happen." Rule 5 sounds like this kind of rule:

"If X Happens..."    | "...then Y may not happen"
                     |
If a Student passes  | the student may not
a course             | take the course again

The first thing we want to do is translate this into rules about entries in tables. Our first attempt might sound something like "An entry in the completed courses table prohibits an entry in this year's schedule." It might look like this:

HISTORY TABLE                                    THIS YEAR'S CLASSES

Student | Course                                 Student  |  Course        
--------+---------                              ----------+----------
Nirgal  | History       <-----------             Nirgal   |  Calculus
Jackie  | History     A "reverse" foreign        Nirgal   |  Geometry  
Nirgal  | Physics     key says entries in
Jackie  | Physics     history prevent entries
                      in current enrollment

The problem with this statement is that there is no way to enforce such an idea. It is not a foreign key, it is more like a reverse foreign key. A foreign key says a a parent row must exist but we are saying a parent row must not exist. If there were such a thing as a reverse foreign key we would be ready to move on to the next task. But since there is no such thing as a reverse foreign key, we must keep looking.

The Customer Does Not Design Tables

So far we have a formulation "An entry in the completed courses table prohibits an entry in this year's schedule." But here I have deliberately made a common mistake to demonstrate how programmers sometimes deal with the requirements given to us by end users. I have taken the rule and turned it into a statement about tables without first doing any real thinking. This is like translating a sentence word-for-word from one language to another. When you transliterate the user's requirements this way instead of translating, the result is exactly the same as for human languages: you get nonsense.

So we must now remember that the customer is not in the business of table design. The customer will explain their needs as best they can, but we should not expect the customers' statements to translate directly into table definitions.

So if we look at this rule again we realize that when one thing excludes another thing we may be looking at a primary key or unique constraint. If the student's completed courses and the student's current courses are stored in the same table then the problem is solved. We make a primary key or unique constraint on student + course and that is the end of that.

Relational Does Not Meet My Needs!

Nowadays (February 2008) it is pretty common to hear programmers who are relatively new to the database world say: "Relational just cannot do what I need, my customers have complex needs that don't fit into relational concepts." If that programmer is handling text, like books, or media files then they may be right. But if not, that programmer probably:

  • ...has been taking the customer's requests at face value instead of translating them into solid concepts.
  • ...is not aware of dead-end patterns like the "Reverse Foreign Key" and how to recognize the valid patterns that are hiding behind them.

Now that we know we probably have a primary key, and not some kind of weird reverse foreign key, it is time to design the table.

A Primary Key In Disguise

Putting them into the same table is not that big a deal. It is nothing more than a list of courses the student has taken or is currently taking. The table would look something like this:

The students_x_courses Table

  Primary Key (or perhaps an int primary
       |       key and this is a 
       |       unique constraint)
       |
  +---------+
  |         |
Student | Course   | Year  | Flag_hist |  Grade
--------+----------+-------+-----------+--------
 23     | HIST-302 | 2005  |  N        | 
 23     | PHYS-101 | 2003  |  Y        |  92     
 23     | CHEM-211 | 2004  |  Y        |  96       

Deletions Required

We should note that if a student fails a course, the approach we are taking requires that the row be deleted outright, otherwise they cannot take it again. All database programmers share a deep uneasiness about deleting data, it makes us nervous. There is good reason for this. As soon as you delete the failed classes somebody will ask for some statistics on how often students fail courses, which you cannot tell them because you deleted the data! Nobody wants to be having that conversation!

Therefore we should assume that we will have a separate table just to store a record of course failures. We will copy failure records to that table before we delete them.

Putting Current Data and Historical Data Together

There are plenty of strange ideas out there, and sometimes an idea from ages past will persist long after anybody remembers where the idea even came from. One such idea is that you should not mix together historical and current data.

This idea came from long ago when hard drives where extremely expensive, when many data files were stored on tapes or cartridges and never got near a hard drive. The idea was to separate your live and historical data because the bulk of your operations required one or the other but rarely both, and you could not afford to keep them both available at the same time. In really old fashioned batch operations something we take for granted like a simple query would be done by loading tape after tape onto a refrigerator-sized tape machine that executed some query program. If you kept history and live data together you might have to load 30 tapes, but if you kept them separate you would only have to load 3 tapes to pass the live data. That basic practice has stayed with us ever since in the form of prejudices and vague advice about "not mixing current and history."

But nowadays we do not need to worry too much about hard drives, there is no validity to most of the reasons for keeping this data separate.

Conclusion: Translation, Not Transliteration

This week we saw what happens when we take user requirements at face value. They lead us into dead-end design patterns, that is, patterns that are not valid and cannot be implemented. We saw how important it is to take the users' statements and seek out the "disguised" primary keys, foreign keys and so forth. This is the sad result of transliteration, the direct conversion of user statements into table designs.

The correct process is translation, looking at the user statements with a critical eye and seeking out the primary keys, foreign keys and table definitions that are lurking there. Those programmers who do not learn this skill will be led into the false belief that they are somehow dealing with problems that no human being has ever seen before, and that they need some post-relational or extra-relational or razmataz-relational system that will provide a unique, strange and clever solution to their troubles. The truth is usually much less interesting and not nearly so ego-inflating. The truth is that most data does fit into tables pretty neatly, and that most rules (but not all) can be expressed as unique constraints and foreign keys.

Next week we will wrap up this example by looking at rule 4, that no student can be in two places at once.

Next Week: The Framework and The Database

Sunday, January 27, 2008

Table Design Pattern: Limited Transaction

Welcome to the Database Programmer. This is a blog for anybody who wants to learn about databases. The entries are meant to be simple and easy to read but definitely not dumbed down. Because most of us these days work on web sites of one sort or another, and since all non-trivial websites require a database, there is very good reason to learn how databases really work.

There is a new entry every Monday morning, and the complete table of contents is here. We are currently looking at Table Design Patterns and how they lead to tight and efficient code.

The Example: A School Class Schedule

In this series we are using the example of an application that manages a school of a few dozen faculty and few hundred or perhaps a couple thousand students. Each year the school administration must make up the actual class assignments for each teacher, including what classroom and period each class will be taught in. Then students must be assigned into the classes.

There are five rules that must be followed:

  1. A teacher may not teach a class he/she is not qualified to teach.
  2. No two classes can be given in the same classroom in the same period.
  3. No teacher can be in two places at once, a teacher can only teach one class in one period.
  4. No student can be in two places at once, so no student can be in more than one class in the same period.
  5. A student cannot take the same class again after passing the class once.

Last week's entry showed that rule 1 was an example of the Cross Reference Validation Pattern, and this week we are going to see that rules 2 and 3 are an example of the Limited Transaction pattern. Next week we will look at rules 4 and 5, which deal with the student.

Sidebar: Discovered Requirements

This week the rules are interesting because they are the kind that nobody would ever actually tell you. I call such rules discovered requirements because they are usually discovered by a programmer or database designer while the table design or programming is under way.

These rules will not be in the specification because they are so obvious that the customer would not think to write them down. It is not that the customer considers putting them in and decides not to, the rules simply never come into the customer's mind. They do not tell you these requirements for the same reason they do not tell you that fish live in water and people breathe air.

Nevertheless, if you do not seek out these rules and put them into the application, then you will get a call or an email that something is wrong because a teacher has been put into two rooms at the same time. Make no mistake, the blame always falls on the programmer, because, after all, what idiot would write a program that would let a teacher be in two places at once? It is very very hard to look a customer in the eye and say, "Well somebody really should have told me about that requirement."

Later in this series we will see more about this in an entry I am planning called "The Requirements Will Never Be Correct", but for now we will move on to the actual database design.

Looking At The Table

The table is fairly easy to work out:

     Rule 2                                   Primary Key
         |                                         |
   +-----+-----+                                   |
   |           |                                   |
CLASSROOM |  PERIOD    | COURSE   | TEACHER  | SCHEDULE_ID
----------+------------+----------+----------+------------- 
  XXX     |   XXXX     |   XX     |  XXX     |   XXX    
  XXX     |   XXXX     |   XX     |  XXX     |   XXX      
  XXX     |   XXXX     |   XX     |  XXX     |   XXX      
  XXX     |   XXXX     |   XX     |  XXX     |   XXX      
----------+------------+----------+----------+-------------
                  |                   |                       
                  +--+----------------+            
                     |                                 
                   Rule 3: No teacher may be in    
                           two places at once      
                

The example above is a simple case of multiple unique constraints on a single table. The term "unique constraint" means that one or more columns must have unique values, just like a primary key. These are sometimes called "candidate keys" as well.

Identifying The Pattern: Limited Transactions

Th Limited Transaction Pattern occurs when there are limitations on what transactions are allowed. To see what I mean by a limitation, we will look at a counter-example, a transaction table that has no limitations. A shopping cart is a good example. We would never tell a customer that they may only have one order per day, or that a salepersons may enter only one order per day, or anything else along those lines. But the school example is the opposite, there are several limitations on what kind of transactions are allowed. Right now we are looking at the limitations that can be addressed with unique constraints.

The SQL

Here is the SQL that will create the table as it is depicted above:

CREATE TABLE schedule (
   classroom char(5)
  ,period    char(5)
  ,course    char(10)
  ,teacher   char(10)
  ,assign_id int IDENTITY
  -- First define the primary key 
  ,primary key (assign_id)
  -- Rules 2 and 3 use additional unique constraints:
  ,constraint unique rooms_xp    (period,classroom)
  ,constraint unique teachers_xp (period,teacher)
  -- Every column in this table is actually a foreign key!
  ,foreign key  (classroom) references classrooms (classroom)
  ,foreign key  (period)    references periods    (period)
  ,foreign key  (course)    references courses    (course)
  ,foreign key  (teacher)   references teachers   (teacher)
)

Final Recap Of The Pattern

So let's review one final time how we ended up with this table.

  1. We have a transaction table, because we have an interaction between master elements (teachers, courses, classrooms and periods), so we have several foreign keys.
  2. We used an integer primary key as per Rule of Thumb 4 for Transaction Tables.
  3. A classroom is limited to only one course per period, so we add a unique constraint to enforce that.
  4. A teacher is limited to only one course per period, so we add a unique constraint to enforce that.

The pattern we have is the result of a limited transaction table, a table listing transactions in which not every conceivable combination is actually allowed. In this case the limits come from the physical reality that you cannot be in two places at once.

Conclusion: Patterns Reduce Application Code

When I first began working with databases, I had no idea how many problems could be resolved into simple unique constraints and foreign keys. A big step that any code grinder takes towards becoming a real database programmer is realizing how many seemingly complicated and difficult tasks actually resolve down to unique constraints and foreign keys.

Every time you can put a constraint into a database then you reduce the complexity of your application code. If your framework can trap server errors and report them then you have a very simple way to enforce a lot of the business rules of your application.

Next week we are going to look at rules 4 and 5, which relate to rules about the student's enrollment in certain courses.

Next Essay: False Patterns and The Reverse Foreign Key