SAN, NAS, and iSCSI (& SQL Server)

A bit of an odd entry today, spurred by the general lack of awareness regarding these acronyms and their meaning: While they’re undoubtedly old-hat for a database admin in a large enterprise (where they certainly play a critical role), they’re less likely to be found amidst the parlance of smaller shops, or among professionals who function as hybrid developer/database architect/system architect. As many of the people who visit fall in those latter two categories, I thought it worth a quick overview. Even if you’re a developer and these are all deployment concerns, you should know what the network guys are talking about when they discuss these concepts.

All three acronyms exist in the world of segregated storage systems, which in a nutshell is the requisitioning of storage capacity separate from computational requirements: Instead of calculating each server’s needs as an island, as a composite of computational and storage needs, you instead pool the storage requirements and facilitate that via one of these storage technologies. Pooling brings some advantages of scale, some technological advantages, not to mention that your capacity utilization and peak performance will likely improve.

Segregated storage often allows for much greater scalability, allowing you to add disks and upgrades to the storage systems, transparently improving capacity and performance throughout the entire infrastructure (versus each system being an isolated performance unit).

So a brief overview of what each of the acronyms means, and how it applies.

NAS – Network Attached Storage

[Figure: NAS layout]

Network attached storage is generally used to describe servers (or “appliances”) that are requisitioned for the sole purpose of being file servers, often running a lightweight or specialized NAS-specific operating system (for instance Windows Storage Server 2003, or a specialized version of Linux). NAS systems with massive capacities can be purchased for incredibly low prices these days, often with redundancy such as RAID (Redundant Array of Inexpensive Disks: the system has enough redundancy that one or more of the hard drives can completely fail with no loss of data and no downtime, frequently exhibiting just a decrease in performance. Usually you can plug a replacement drive in – while the system is running – and it’ll automatically bring the new drive online and populate it, restoring performance. I’m ignoring the misnomer “RAID 0”, which is actually a performance technique that offers no redundancy). Many of them – including those built on Windows Storage Server 2003 – carry no additional licensing fees (e.g. you can add a huge-capacity NAS device to facilitate your entire enterprise with only the cost of the box itself – no additional per-user licensing issues).

NAS systems generally support common file sharing protocols like CIFS/SMB (Windows), NFS (Unix/Linux), and so on, and usually integrate into Windows domains and Active Directory infrastructures for security purposes, so they seamlessly interoperate with your existing infrastructure. NAS is even making inroads in the home, with many alpha-geeks installing a very high capacity, high-performance NAS box for media files and centralized storage, supporting various other computing devices throughout the house.

Some NAS resources:
Windows Storage Server vendors
An inexpensive, high performance NAS starting point (the same company makes a highly lauded solution for the home. ~$1000 for 1TB. Here’s a good entry about that product)
Iomega NAS servers
Dell PowerVault 754N
SQL Server 2000 I/O Configuration in a SAN/NAS Environment
Wikipedia NAS entry

Apart from being a destination for backups, NAS can also host SQL Server databases themselves (e.g. your database server is running on server A, but the actual data is on server B, managed by server A over your high speed network), and with certified hardware (WHQL) this configuration is supported by Microsoft. To do so you just need to create or restore the database to a UNC location.

e.g.

CREATE DATABASE SampleUNCDatabase ON
( NAME = Sample_dat,
FILENAME = '\\mynas\db\sample.mdf',
SIZE = 10MB,
MAXSIZE = 2000MB,
FILEGROWTH = 10MB)
LOG ON
( NAME = Sample_log,
FILENAME = '\\mynas\db\sample.ldf',
SIZE = 5MB,
MAXSIZE = 25MB,
FILEGROWTH = 5MB)

Those of you playing along at home will have been surprised by the following error.

Msg 5110, Level 16, State 2, Line 1
The file \\mynas\db\sample.mdf is on a network path that is not supported for database files.
Msg 1802, Level 16, State 1, Line 1
CREATE DATABASE failed. Some file names listed could not be created. Check related errors.

By default SQL Server doesn’t support hosting databases on network locations, as there are some caveats that need to be considered (namely the throughput – NAS is accessed via a generalized file sharing protocol on top of a generalized transport protocol, often over a lower speed transport, and can kill database performance). You can enable UNC hosting by enabling trace flag 1807. Just make sure your NAS is accessed over a dedicated or low-usage Gbps or better network connection.

e.g.

DBCC TRACEON(1807)

CREATE DATABASE…

Success

You can read more about this at http://support.microsoft.com/default.aspx?scid=304261. This configuration is supported with appropriate hardware (which generally means “running against a NAS that runs Windows Storage Server 2003”).
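
If you want to confirm the flag is active, or have it survive a service restart, something along these lines should work (a sketch only – DBCC TRACESTATUS, DBCC TRACEON’s global option, and the -T startup parameter are standard, but check the documentation for your SQL Server version):

-- Check whether trace flag 1807 is currently enabled
DBCC TRACESTATUS(1807)

-- Enable it for all connections, not just the current session
DBCC TRACEON(1807, -1)

-- To make it stick across restarts, add -T1807 to the SQL Server
-- service's startup parameters and restart the service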

NAS cannot be used in SQL Server clustering scenarios. For that you need to look at a traditional or iSCSI SAN.

As NAS operates at a higher level, hiding the details of the underlying storage, to defragment a NAS device you would have to do it on the device itself, using whatever tools are specific to that NAS. You could do backups on the NAS itself, though in-use files like SQL Server’s data files would need agents to be backed up online.
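
As a concrete example of the backup-destination role mentioned at the top of this section, backing up to a NAS share needs no trace flags at all – the main requirement is that the SQL Server service account has write access to the share. A minimal sketch (the share and file names are made up):

-- Back the sample database up to a UNC path on the NAS
BACKUP DATABASE SampleUNCDatabase
TO DISK = '\\mynas\backups\SampleUNCDatabase.bak'
WITH INIT  -- overwrite any existing backup sets in the file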

SAN – Storage Area Network

[Figure: SAN network]

While a NAS operates at a higher, more abstract level (the file share level, agnostic to the underlying file technologies and hiding the actual storage topology), a SAN functions at a much lower level.

SANs operate at the virtual-disk access level, using block and “physical location” addressing to define what to read and write, with the client devices taking a more direct role in the “layout” (at least as far as the client is concerned) of the data: Client systems are allocated blocks of SAN storage – which usually appear as a bona fide drive on the client system (with appropriate drivers) – and are connected via a dedicated 1 to 4Gbps fibre network. Generally only one client can access a logical device on a SAN at a time, however with SQL Server clustering you can point several database servers at the same logical device, and if one fails the other one takes over the device (though it is still only one at a time). The protocol on the SAN fibre network is usually SCSI.

SANs are generally very expensive, and are usually the domain of very large enterprises. As SANs operate at a much lower level, basically operating as a dumb bank of bits and blocks, these devices can become fragmented, though defragmentation would have to operate at the logical disk level, and generally needs to be performed by the PC that “owns” that logical disk. As SANs appear to the operating system as a disk – just as if it were an internal drive directly connected to the client – there are no limitations on its use beyond those that exist for a local drive.
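
To make the “appears as a local drive” point concrete: creating a database on a SAN volume involves no special syntax or trace flags. A sketch, assuming the logical disk has been presented to the server as a hypothetical E: drive and the folders already exist:

-- E: is the SAN logical disk presented to this server (hypothetical)
CREATE DATABASE SampleSANDatabase ON
( NAME = SampleSAN_dat,
FILENAME = 'E:\SQLData\SampleSAN.mdf' )
LOG ON
( NAME = SampleSAN_log,
FILENAME = 'E:\SQLLogs\SampleSAN.ldf' )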

Many SANs have a value-add in the form of snapshot functionality, where they can take an image of a logical drive and store it somewhere else (perhaps as an online whole-volume backup). While this seems trivial, they can usually do it while the volume is online and being written to, via a transaction log sort of architecture. This can be very valuable in many scenarios.

Some SAN resources:
Wikipedia entry on SANs
Windows SAN Integration Technologies

iSCSI – internet SCSI (Small Computer System Interface)

[Figure: iSCSI network]

iSCSI is basically the SCSI disk control protocol over IP (internet protocol). The benefit being that you can access a storage device over anything that can relay IP, including ethernet, wireless, or even the public internet. Much like a SAN, iSCSI is a dumb-bag-of-bits, and the client that owns a block of data is responsible for its management.

iSCSI has two real roles of interest: the target (the dumb-bag-of-bits that’s listening and responding to iSCSI requests), and the initiator (the client computer, on which the virtual drive has been mounted). Initiators exist for virtually all modern operating systems, and there are even targets for many operating systems to allow them to operate as bags-of-bits (if you had a general purpose server with a huge array of under-utilized hard drives, and adequate network bandwidth, you could carve out some of that space to act as a storage drive for another server). Alternately there are dedicated network appliances that act as iSCSI targets.

iSCSI is appearing in some inexpensive forms, and most iSCSI solutions fall pricewise somewhere between NAS and SANs. Like SANs, many iSCSI solutions have snapshot functionality. Also like SANs, iSCSI storage networks can be used for Windows clustering solutions (as of a service pack to come in early 2006) – for instance in SQL Server clustering.

Some iSCSI resources:
Wikipedia entry on iSCSI
Microsoft iSCSI support (including initiator)
Windows iSCSI target
Free Linux iSCSI target

Summary

While this wasn’t intended as a complete guide to these technologies, hopefully it has given enough of an overview that there is an appreciation of what they are, and how they might fit in most enterprises.