Thin Provisioning (Dynamic Provisioning, Virtual Provisioning, or whatever you prefer to call it) is being heavily touted as a method of reducing storage costs. Whilst at the outset it seems to provide some significant storage savings, it isn’t the answer for all our storage ills.
What is it?
Thin Provisioning (TP) is a way of reducing storage allocations by virtualising the storage LUN. Only the sectors of the LUN which have been written to are actually placed on physical disk. This has the benefit of reducing wastage in instances where more storage is provisioned to a host than is actually needed. Look at the following figure. It shows five typical 10GB LUNs, allocated from an array. In a “normal” storage configuration, those LUNs would be allocated to a host and configured with a file system. Invariably, the file systems will never be run at 100% utilisation (just try it!) as this doesn’t work operationally and also because users typically order more storage than they actually require, for many reasons. Typically, host volumes can be anywhere from 30-50% utilised and in an environment where the entire LUN is reserved out for the host, this results in 50-70% wastage.
Now, contrast this to a Thin Provisioned model. Instead of dedicating the physical LUNs to a host, they now form a storage pool; only the data which has actually been written is stored onto disk. This has two benefits; either the storage pool can be allocated smaller than the theoretical capacity of the now virtual LUNs, or more LUNs can be created from the same size storage pool. Either way, the physical storage can be used much more efficiently and with much less waste.
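The arithmetic behind those wastage figures is worth making concrete. A minimal sketch in Python, taking the five 10GB LUNs from the figure and a 40% utilisation rate (simply the mid-point of the 30-50% range quoted above):

```python
# Hypothetical figures: five 10GB LUNs at a typical 40% utilisation.
LUN_SIZE_GB = 10
NUM_LUNS = 5
UTILISATION = 0.40  # mid-point of the 30-50% range above

fat_allocated_gb = LUN_SIZE_GB * NUM_LUNS            # full reservation
actually_used_gb = fat_allocated_gb * UTILISATION    # data really written
wasted_gb = fat_allocated_gb - actually_used_gb

# Fat provisioning reserves 50GB to hold 20GB of data: 30GB (60%) wasted.
# A thin pool only needs to back the 20GB actually written.
```

With thin provisioning, that 30GB difference either stays unbought or backs additional virtual LUNs from the same pool.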
There are some obvious negatives to the TP model. It is possible to over-provision LUNs and as data is written to them, exhaust the shared storage pool. This is Not A Good Thing and clearly requires additional management techniques to ensure this scenario doesn’t happen and sensible standards for layout and design to ensure a rogue host writing lots of data can’t impact other storage users.
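As a minimal sketch of the kind of management technique that's needed, a pool-utilisation alert might look like the following. The pool size and the two thresholds are invented purely for illustration; real arrays and monitoring tools will have their own figures and mechanisms:

```python
# Toy pool-exhaustion check for an over-provisioned thin pool.
POOL_GB = 30             # physical pool backing (say) 50GB of thin LUNs
WARN, CRIT = 0.70, 0.90  # hypothetical alert thresholds

def pool_status(used_gb: float) -> str:
    """Classify pool health from the physical capacity consumed so far."""
    ratio = used_gb / POOL_GB
    if ratio >= CRIT:
        return "CRITICAL: stop provisioning, add physical disk now"
    if ratio >= WARN:
        return "WARNING: plan pool expansion"
    return "OK"
```

The point is simply that thin pools demand proactive capacity monitoring; a fat-provisioned LUN can never surprise you in this way.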
The next problem with TP in this representation is the apparent concentration of risk and performance of many virtual LUNs to a smaller number of physical devices. In my example, the five LUNs have been stored on only three physical LUNs. This may represent a potential performance bottleneck and consequently vendors have catered for this in their implementations of TP. Rather than there being large chunks of storage provided from fixed volumes, TP is implemented using smaller blocks (or chunks) which are distributed across all disks in the pool. The third image visualises this method of allocation.
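The chunked, wide-striped layout can be sketched as a simple round-robin placement. The 42MB chunk size is HDS's USP figure and the eight-disk pool width is an assumption; real allocators are considerably smarter than this:

```python
# Sketch of wide-striped chunk allocation: virtual LUN chunks are
# assigned round-robin across every disk in the pool, rather than
# being dedicated to a handful of physical LUNs.
CHUNK_MB = 42   # HDS USP-style chunk size (illustrative)
POOL_DISKS = 8  # hypothetical pool width

def place_chunks(num_chunks: int) -> list[int]:
    """Return the pool disk index each chunk lands on (round-robin)."""
    return [chunk % POOL_DISKS for chunk in range(num_chunks)]

layout = place_chunks(10)
# ten chunks spread over eight disks: no disk holds more than two,
# so no single spindle becomes a hot spot for one virtual LUN
```

Because every virtual LUN draws small chunks from every disk in the pool, the I/O load of all the LUNs is spread evenly rather than concentrated on whichever physical LUNs happened to back them.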
So each vendor’s implementation of TP uses a different block size. HDS use 42MB on the USP, EMC use 768KB on DMX, IBM allow a variable size from 32KB to 256KB on the SVC and 3Par use blocks of just 16KB. The reasons for this are many and varied and for legacy hardware are a reflection of the underlying hardware architecture.
Unfortunately, the file systems that are created on thin provisioned LUNs typically don’t have a matching block size structure. Windows NTFS, for example, defaults to a cluster (block) size of just 4KB on most volumes unless explicitly overridden by the user. The mismatch between the TP block size and the file system block size causes a major problem as data is created, amended and deleted over time on these systems. To understand why, we need to examine how file systems are created on disk.
The fourth graphic shows a snapshot from one of the logical drives in my desktop PC. This volume hasn’t been defragmented for nearly 6 months and consequently many of the files are fragmented and not stored on disk in contiguous blocks. Fragmentation is seen as a problem for physical disks as the head needs to move about frequently to retrieve fragmented files and that adds a delay to the read and write times to and from the device. In a SAN environment, fragmentation is less of an issue as the data is typically read and written through cache, negating most of the physical issues of moving disk heads. However fragmentation and thin provisioning don’t get along very well and here’s why.
The Problem of Fragmentation and TP
When files are first created on disk, they will occupy contiguous sections of space. If this data resides on TP LUNs, then a new block will be assigned to a virtual TP LUN as soon as a single filesystem block is created. For a Windows system using 4KB blocks on USP storage, this means a 42MB allocation each time. This isn’t a problem while the file continues to be expanded, however it is unlikely this file will end neatly on a 42MB boundary. As more files are created and deleted, each 42MB block will become partially populated with 4KB filesystem blocks, leaving “holes” in the filesystem which represent unused storage. Over time, a TP LUN will experience storage utilisation “creep” as new blocks are “touched” and therefore written onto physical disk. Even if data is deleted from an entire 42MB chunk, it won’t be released by the array, as data is usually only “logically deleted” by the operating system. Defragmenting a volume makes the utilisation creep issue worse; it writes to unused space in order to consolidate files. Once written, these new areas of physical disk space are never reclaimed.
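The creep described above can be modelled in a few lines. In this toy model (not any vendor's actual allocator), every scattered 4KB write that lands in a previously untouched 42MB chunk forces the array to back the whole chunk with physical storage:

```python
# Toy model of utilisation creep on a thin LUN with 42MB chunks.
CHUNK_KB = 42 * 1024  # 42MB TP chunk (HDS USP figure)
FS_BLOCK_KB = 4       # NTFS default cluster size

touched_chunks = set()

def write_block(offset_kb: int) -> None:
    """Record which TP chunk a 4KB filesystem write lands in."""
    touched_chunks.add(offset_kb // CHUNK_KB)

# A fragmented write pattern: one hundred 4KB blocks scattered
# roughly 50MB apart across the volume.
for i in range(100):
    write_block(i * 50 * 1024)

logical_kb = 100 * FS_BLOCK_KB                # 400KB of real data
physical_kb = len(touched_chunks) * CHUNK_KB  # chunks now backed on disk
# 400KB of file data has pinned 100 chunks, i.e. roughly 4.1GB
```

The ratio is deliberately extreme, but it shows the mechanism: the array's accounting works in 42MB units while the filesystem scatters 4KB units, and once a chunk is touched it stays allocated.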
So what’s the solution?
Fixing the TP Problem
Making TP useful requires a feature that is already available on USP arrays as Zero Page Reclaim and on 3Par arrays as Thin Built In. When an entirely “empty” TP chunk is detected, it is automatically released by the system (in HDS’s case at the touch of a button). So, for example, as fat LUNs are migrated to thin LUNs, unused space can be released.
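The underlying scan can be sketched as follows; the chunk size is shrunk to eight bytes purely to keep the example readable, and this is a conceptual model of the idea rather than any vendor's implementation:

```python
# Zero-page-reclaim-style scan: a chunk containing only zero bytes
# carries no live data and can be returned to the shared pool.
CHUNK = 8  # bytes per chunk in this toy model (real arrays use MBs)

def reclaimable_chunks(volume: bytes):
    """Yield indices of chunks whose every byte is zero."""
    for i in range(0, len(volume), CHUNK):
        if not any(volume[i:i + CHUNK]):
            yield i // CHUNK

vol = bytes(8) + b"data\x00\x00\x00\x00" + bytes(8)
freed = list(reclaimable_chunks(vol))
# chunks 0 and 2 are all-zero and could be released back to the pool
```

The crucial dependency is visible in the code: a chunk is only reclaimable if it actually contains zeros, which is exactly what conventional file systems fail to guarantee.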
This feature doesn’t help however with traditional file systems that don’t overwrite deleted data with binary zeros. I’d suggest two possibilities to cure this problem:
- Secure Defrag. As defragmentation products re-allocate blocks, they should write binary zeros to the released space. Although this is time consuming, it would ensure deleted space could be reclaimed by the array.
- Freespace Consolidation. File system free space is usually tracked by maintaining a chain of freespace blocks. Some defragmentation tools can consolidate this chain. It would be an easy fix to simply write binary zeros over each block as it is consolidated.
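Either approach boils down to the same primitive: zero-fill a block as it is vacated, so the array's zero detection can later reclaim the chunk. A toy sketch of the idea, with the disk modelled as a bytearray and a token block size (no real defragmenter works this way, but the principle is the same):

```python
# "Secure defrag" primitive: relocate a block, then zero its old home.
BLOCK = 4  # bytes per filesystem block in this toy model

def move_and_zero(disk: bytearray, src_block: int, dst_block: int) -> None:
    """Relocate one block, then overwrite the vacated space with zeros."""
    src, dst = src_block * BLOCK, dst_block * BLOCK
    disk[dst:dst + BLOCK] = disk[src:src + BLOCK]
    disk[src:src + BLOCK] = bytes(BLOCK)  # zero-fill so reclaim can see it

disk = bytearray(b"AAAA....BBBB")  # fragmented: file data at blocks 0 and 2
move_and_zero(disk, 2, 1)          # consolidate BBBB next to AAAA
# the trailing block is now all zeros and hence reclaimable by the array
```

Without that final zero-fill, the moved-from blocks keep their stale contents and the array has no way to know the space is free.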
One alternative solution from Symantec is to use their Volume Manager software, which is now “Thin Aware”. I’m slightly sceptical about this as a solution, as it places requirements on the operating system to deploy software or patches just to make storage operate efficiently. It takes me back to Iceberg and IXFP….
So in summary, Thin Provisioning can be a Good Thing, however over time, it will lose its shine. We need fixes that allow deleted blocks of data to be consolidated and returned to the storage array for re-use. Then TP will deliver on what it promises.
Incidentally, I’m surprised HDS haven’t made more noise about Zero Page Reclaim. It’s a TP feature that to my knowledge EMC haven’t got on DMX or V-Max.