Automatic Relational Lookup Fields (SQL Database Engine)

Some time back I posted the following on a discussion boardI frequented at the time, wishing for the describedfeature to bless the SQL Server database engine:

Automatic relational lookup fields. e.g. Someone has a tablewhere they store, for instance, a form name. They store thisnvarchar(64) (say averaging 30 characters, averaging 60 bytes arecord) over and over again, redundantly. In a table with millionsof records there are only a dozen variations of the form name, andthe end result is tremendous bloat of the data, indexes, and ofcourse lookups are that much slower as well because of the I/Orequirements. To some database designers this is a reasonabledesign because it is using a natural key (rather than anartificially generated autonumber), but it is extremely inefficientwith space.

I would love the ability to toggle a boolean switch on the columnand SQL Server will automatically setup a hidden lookup table;which it will automatically maintain based upon values inserted.Concievably it could also scale the relational value (e.g. tinyint,smallint, int, bigint) based upon the number of values in thelookup. Of course I normally do this myself, but when you walk intoa large system that exhibits this sort of issue pervasively, it’sdifficult to fix – you can hide the table behind a view, and usesome INSTEAD OF triggers to do the auto-mungification, however thatis a leaky abstraction, as Joel would say (e.g. INSERT FROM fails,Enterprise Manager doesn’t use it properly, and so on). It’s such abrainless, rudimentary task, it is one of those simple buteffective features that would be worthwhile and would improve SQLServer.

Back to your regularly scheduled program.

Most of the replies completely missed the point, going offon tangents about how the database should be normalized better (notalways an option when you have largescale, widely deployedenterprise systems. Not to mention that using large natural keyscan be completely normalized, but still benefitting from this sortof improvement). Others suggested that the database engine shouldbe as dumb as possible, doing nothing beyond storing tuples in thesimplest and most obvious manner.

This came back to mind seeing a whitepaper about IBM’s justreleased DB2v9 (“Viper”), offering optional lempel-ziv row-levelcompression, yielding many of the benefits that I mentioned above.This sort of compression, similar to the “compression”described above, would do wonders for a Sugar CRM database,for instance. Of course other implementations have somethingsimilar, for instance MySQL’s enum field type.