Transactional Filesystem in Vista

A classic, forever repeating quandary when designingapplications that store large numbers of data files (images, userfiles, supporting documents, etc) is whether to store the files inthe database, or to store them in the filesystem with pointers inthe database.

Consider a web application that tracks support tickets,where users can add supporting documents to their tickets. Inmany implementations the files are uploaded and stored in afilesystem location with a unique name or directory (often aGUID), and that unique filename is stored in the databasecorrelated with the record. This means that record accessnecessitates both file system access along with the databaseaccess.


There are significant disadvantages to this, including the lackof transactional integrity of the filesystem objects, thedifficulty of management (trying to coordinate file system anddatabase backups to be able to restore to a consistent state), thesecurity issues, the lack of relational integrity (files could bedeleted, records could be deleted without cleaning up the relatedfiles, and so on), among others.

On the flip side, the advantage of this technique is reducedload on the database server (e.g. you could offload file storage toa very large scale NASdevice), as well as immediate file system access whereappropriate (e.g. an administrator needs no special tools to browsethe files, although this could be considered a detriment as well).Many developers find it an easier model to implement using the filesystem for supporting files.

For those who prefer the file system in such scenarios, thetransactional integrity deficiency of this technique will befixed in Windows Vista (formerly Longhorn) and related technologies- It is introducing a transaction-capable variant of NTFS (TxF).NTFS is already a journaled and reliable filesystem, however TxF will add theability of the filesystem to participate in distributedtransactions – both intra-machine, and inter-machine – withstandard two-phase commit functionality. This means, for instance, thatwhen the user is adding the record that includes a supportingdocument, the file and record could be created under a sharedtransaction in the middle tier (or even in the database if you useit as a conduit, storing and retrieving filesystem objects in thedatabase logic), and if either fail they both fail(avoiding dirty data). Add this with the easy ability to addcomplex database logic to probe and validate the correlating filesystem (now that SQL Server 2005 can host .NET functionality,meaning that your trigger can more robustly check for fileexistance when records are created, and delete them when they areremoved), and it becomes a much more credible option.

(As an aside – Distributed transactions – transactionsacross heterogenous resource managers – have traditionally beenvery, very slow. This new file-system transactionfunctionality most certainly isn’t free, but where thereliability is critical – which is almost always given thecost and uselessness of dirty data – it can represent a greatimprovement. Registry changes will also be boundable in distributedtransactions)

Supporting Links
Channel 9Video on TxF (Given that the long form name is thecorrectly descriptive Transactional NTFS, shouldn’t theabbreviation be TxNTFS Longer to say, but it seems superior tome)

Tagged: [], [],[], []