Normalization I

I’ve wanted to write a somewhat seminal treatise on my views on normalization for quite a while now, and have so far failed to start one because, well, it seemed that there was too much to say on the subject for just one post.

Faugh on that. It’s been bubbling in my mind for far too long, and frankly, with me grokking WCF and WPF aside from fixing bugs for 9 hours a day, I need the room in my head. So here goes.

Ground rules:

  • If you feel the need to name an entity or table anything along the lines of “Entity”, or “Object”, or “Property”, you are going too far. Take a deep breath. You are building a data model, modeling a reality. You are not modeling a model that can model the world. Such a model is called an RDBMS, and it is the system you are using to do this modeling.
  • If you feel the need to append a number to any table, database or field, you are not going far enough. A number signals duplication, duplication signals that you are modeling a list within a single table, and that signals that you need to create another entity.
  • Unless forced to, every entity should be an actual “thing”. If you cannot describe to the layperson what is stored in a table in under two minutes, you might be going off the deep end.
  • The model makes the rules. One way or the other (be it through non-nullable fields, check constraints, unique constraints, triggers or flying Oreo monitoring cookies), the data model makes sure that there is no way to fart on reality.

For the record, let me throw in my rules for database nomenclature for completeness:

  • No type prefixes, ever.  No tblWidget, Widget.
  • Table names that describe an entity are the generic, shortest term for that entity. Singular. No SalesPeople, no People, Person.
  • Many-to-many relationship tables are named for their component parts.
  • Every table has one primary key. A single field, named for the table with ID appended. Compound keys do not exist. If you think they do, please move along.

Now, these are neat rules, you say. How would you actually go about modeling something real-world?

Well, let’s tackle the rules on a real-world problem, shall we? I will update the above rules with new things that strike me as we go along.

Let us tackle what WinFS proposed: generic contact management.

Very well, let’s begin. First question: our base tables. Do we start with a Contact table? Sounds good, nice and generic, does it not? Sure, but… what is a Contact? I don’t know about you, but I don’t know any Contacts. What I know is businesses and people. Which, by the rules I stated above, gets reduced to Company and Person. So, here we go.

Company
    CompanyID
    Name
    …

Person
    PersonID
    Name
    …

Yikes. Road bump #1. I have written extensively before on how bad just modeling a name can get. Well, let’s punt that puppy for now. It is a sub-problem that we can tackle later. For now, allow me to say “Name” where I mean “the 10 fields that comprise a full name” for now. Incidentally, the temptation for “Contact” should be curbed once you consider the non-communal properties between the two. Call me pedantic, but it’ll be a cold day in hell before I put a BirthDate field in a Company table. Or, for that matter, in a Contact table and have it be blank half the time. It forces enforcement out to the business layer, and that’s a no-no. Hence the above “the model makes the rules” rule. The business layer enforces business rules, the model enforces reality. The differences can be subtle at times, and only experience gives a good feel. Still, here’s the rule of thumb: the model should enforce that anything retrieved is something that can actually happen.

Here are some considerations:

  • People do not have a “business address”. They work for a Company, which has an address. It seems we might want to model a relationship there.
  • There are different addresses for both types. Physical, mailing, booty call — there are simply differing types for differing purposes for
  • How do you model “work phone”? It doesn’t really belong to the person. It doesn’t belong to the company. It belongs to the relationship between a person and a company. You could call such a relationship “CompanyPerson”, “PersonCompany”, or if you want to be a bit more a propos on the matter, “Employment”. Still, no matter what the actual name, that’s where it rightfully goes.
  • Things can get seriously silly, especially if you want to keep history. For example, let me pick the most simple, the most basic, the most immutable of all properties a person can have: gender. Now consider sex-change surgery. About the most immutable property, it turns out, is date of birth.

Feedback is welcome. I am serious about modeling this, and modeling it for keeps.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: