This is one of a series of posts reviewing the Compellent Storage Center SAN array.
In this post we’ll discuss the logical configuration, connectivity and protocols available on the Compellent Storage Center array, including the way disks are grouped and LUNs are created from the underlying storage.
Where’s My LUN?
The first thing to note as we dive into how the Compellent array stores data is that it does not operate like a traditional storage array, with disks in fixed RAID groups and LUN configurations, but uses the previously mentioned Dynamic Block Architecture. RAID-10, RAID-10 DM (Dual Mirror), RAID-5 and RAID-6 configurations are supported (RAID-5 with 5 or 9 drives in a RAID set and RAID-6 with 6 or 10 drives); however, that’s where the similarity with traditional arrays ends. The underlying physical disks are simply grouped together to provide raw disk capacity, and LUNs are configured from that storage. RAID is applied to each individual block of a LUN, and this can change over the lifetime of that block of data. At the outset this may seem like a complicated design, but in reality it isn’t. By breaking a LUN down to the block level and then applying protection and performance criteria, Compellent can achieve higher performance from a system using less cache and, crucially, fewer high-performance drives. Let’s start at the basic disk level and work up to define how the Compellent system works.
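As an aside on the RAID set sizes listed above, the capacity cost of each protection level can be worked out with simple arithmetic. The sketch below is not Compellent code; it assumes the textbook layouts (one parity drive per RAID-5 set, two per RAID-6 set, two copies for RAID-10 and three copies for RAID-10 DM), and the names `RAID_LAYOUTS`, `usable_fraction` and `raw_mb_for` are made up for illustration.

```python
from fractions import Fraction

# Assumed textbook layouts: (drives holding data, total drives in the set).
# These ratios are an assumption, not figures taken from the array itself.
RAID_LAYOUTS = {
    "RAID-10": (1, 2),             # every block stored twice
    "RAID-10 DM": (1, 3),          # dual mirror: every block stored three times
    "RAID-5 (5 drives)": (4, 5),   # 4 data + 1 parity
    "RAID-5 (9 drives)": (8, 9),   # 8 data + 1 parity
    "RAID-6 (6 drives)": (4, 6),   # 4 data + 2 parity
    "RAID-6 (10 drives)": (8, 10), # 8 data + 2 parity
}


def usable_fraction(layout: str) -> Fraction:
    """Fraction of raw capacity left for data under the given layout."""
    data, total = RAID_LAYOUTS[layout]
    return Fraction(data, total)


def raw_mb_for(layout: str, data_mb: int) -> Fraction:
    """Raw capacity consumed when data_mb of data is stored with this layout."""
    return data_mb / usable_fraction(layout)


if __name__ == "__main__":
    for name in RAID_LAYOUTS:
        print(f"{name}: {float(usable_fraction(name)):.0%} usable")
```

Because protection is applied per block, the same block could cost twice its size in raw capacity while it sits on RAID-10 and considerably less once it has been rewritten as RAID-5.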
Physical disks are classified by their rotational speeds, which effectively groups them by performance. In the review hardware, drives were classified as SLCSSD (for STEC SSD drives), 15K (the fibre channel drives) and 7K (the SAS drives). By default all disks are added to a single group (or folder) called “Assigned”. Disks that are not in use are assigned to a dummy folder called “Unassigned”, from where they can be added to a new or existing disk folder. Compellent recommend keeping a single disk folder, as spares must be assigned within each folder. Having multiple folders would both waste capacity on extra spares and reduce performance, as I/O would be spread over fewer spindles. Of course it is possible to create separate folders if you wish.
As part of the disk folder definition, either single or dual parity must be specified for each tier. Screenshots 2 & 3 show the setup of the default “Assigned” group and a second “New Disk Folder 1”. There are two other things to say about disk folders. First, disks can be removed from a folder. This requires “evacuating the disk”, which can be achieved by moving it to another dummy folder. If the disk contains no data, it can simply be removed from the folder. Second, as disks are added to a folder, there’s the risk of RAID imbalance, with all of the data residing on the disks that were already in the folder. Therefore, as disks are added, the RAID configuration can be rebalanced to obtain optimum use of all spindles.
The concept of Storage Profiles is where the Compellent Storage Center “secret sauce” is to be found. These determine how the system writes data to disk for each LUN (or volume as they are known in Compellent terms) and how data ages over time – a feature called Data Progression. Let’s look first at the profiles.
For each volume/LUN in an array, the Storage Profile determines how data is written to disk. Storage Profiles have two components: a specification of where writable data should be placed and a specification of where Replay data should be located. It’s worth taking a moment to understand what Replays are, as I’ve yet to mention them. Replays are essentially snapshots, used to return a volume to a previous point in time. By their nature, Replay data blocks are only ever read; all writes made after a snapshot/replay is taken are written to a new location in order to preserve the replay for a potential future restore. Replay blocks are therefore not part of the active write set of a LUN and don’t always need to reside on high performance storage; if they are being read frequently then they will reside in cache. Storage Profiles allow the administrator to indicate what should happen to both writable blocks and replay blocks for a volume. A high performance LUN could, for example, have its writable data on tier 1 storage with RAID-10 configuration and have replays on RAID-5 SAS. A medium performance volume could have writable data on tier 2 15K Fibre Channel and replay data on SATA.
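One way to picture a Storage Profile is as a pair of placement rules, one for writable blocks and one for Replay blocks. The following is a minimal sketch of that idea only; the class and field names (`Placement`, `StorageProfile`, `placement_for`) are hypothetical and do not correspond to anything in the Storage Center software. Where the examples above do not state a RAID level, the field is left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    """Where a class of blocks should live (hypothetical model)."""
    media: str                  # e.g. "tier 1", "SAS", "SATA"
    raid: Optional[str] = None  # None = not specified in the text


@dataclass(frozen=True)
class StorageProfile:
    """Two rules per volume: one for writable data, one for Replay data."""
    name: str
    writable: Placement
    replay: Placement

    def placement_for(self, is_replay: bool) -> Placement:
        # Replay blocks are read-only, so they can follow a cheaper rule.
        return self.replay if is_replay else self.writable


# The two examples described in the text, expressed in this model.
HIGH_PERFORMANCE = StorageProfile(
    name="high performance",
    writable=Placement(media="tier 1", raid="RAID-10"),
    replay=Placement(media="SAS", raid="RAID-5"),
)
MEDIUM_PERFORMANCE = StorageProfile(
    name="medium performance",
    writable=Placement(media="tier 2 15K Fibre Channel"),
    replay=Placement(media="SATA"),
)
```

Calling `HIGH_PERFORMANCE.placement_for(is_replay=True)` returns the RAID-5 SAS rule, which is the point of the design: the decision is made per block class, not per LUN.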
The use of Storage Profiles provides some very important benefits in optimising the performance of the disks in a Storage Center array. They allow the exact performance criteria to be specified for each LUN. In addition, only active data is retained on the highest performance storage, with inactive data moved off to lower performing (and lower cost) devices. As RAID is established at the block level, writes have a granularity of 2MB in a standard configuration. If required, volumes can be created using 512KB blocks where writes are small.
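To put the two block sizes in perspective, here is a small calculation of how many blocks the array would have to track for a volume of a given size. It is only an illustration: it assumes binary units (2MB = 2048 KiB, 512KB = 512 KiB), and the function name `blocks_for_volume` is invented for this example.

```python
# Block sizes mentioned in the text, in KiB (binary units are an assumption).
BLOCK_SIZES_KIB = {"standard": 2048, "small": 512}


def blocks_for_volume(volume_gib: int, block_kib: int) -> int:
    """Number of blocks needed to cover a volume, rounding up."""
    total_kib = volume_gib * 1024 * 1024
    return -(-total_kib // block_kib)  # ceiling division on integers


if __name__ == "__main__":
    for label, size in BLOCK_SIZES_KIB.items():
        print(label, blocks_for_volume(100, size))
```

The smaller block size means four times as many blocks to manage for the same volume, which is presumably why it is an option rather than the default.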
Replays and Data Progression
Although I’ve touched on the subject of classifying data into active writes and replays, I haven’t explained the actual mechanism by which data moves between these groups. There are two methods by which data is migrated between tiers of storage: Data Instant Replay and Data Progression. Replays, as we have discussed, are point-in-time snapshots of volumes. When a replay is taken, all of the pages comprising a volume are frozen. Subsequent writes to the volume are made to new blocks on storage. This preserves the data as it was when the Replay was taken and also, quite helpfully, allows the blocks that are being actively written to be distinguished from those which are inactive. The Replay blocks can then be moved to a lower tier of storage. Compellent recommend that every volume has a Replay taken on a daily basis.
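The freeze-then-write-elsewhere behaviour described above can be sketched in a few lines. This is a toy model of the general technique, not Compellent's implementation: pages are dictionary entries, a Replay is simply the set of pages written since the previous Replay, and the class and method names are hypothetical.

```python
class Volume:
    """Toy model: writes after a Replay go to new pages, never over frozen ones."""

    def __init__(self):
        self.active = {}   # page number -> data written since the last Replay
        self.replays = []  # frozen page maps, oldest first

    def write(self, page: int, data: str) -> None:
        # Only the active map is ever modified; frozen maps are left alone.
        self.active[page] = data

    def take_replay(self) -> int:
        """Freeze the current active pages and return the Replay's index."""
        self.replays.append(dict(self.active))
        self.active = {}
        return len(self.replays) - 1

    def read(self, page: int):
        """Current view: newest copy of the page, active pages first."""
        if page in self.active:
            return self.active[page]
        return self.read_at_replay(len(self.replays) - 1, page)

    def read_at_replay(self, index: int, page: int):
        """View of the volume as it was when Replay `index` was taken."""
        for frozen in reversed(self.replays[: index + 1]):
            if page in frozen:
                return frozen[page]
        return None


if __name__ == "__main__":
    v = Volume()
    v.write(0, "monday")
    r = v.take_replay()
    v.write(0, "tuesday")
    print(v.read(0), v.read_at_replay(r, 0))
```

In this model everything in `replays` is read-only by construction, which is what makes those pages safe candidates for relocation to cheaper storage.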
Data Progression uses a similar technique to move less frequently used blocks of data down to lower tiers of storage over time. Initially all writes are made to the highest tier of storage; over time, the data is migrated to lower tiers based on frequency of access. This occurs at the block level and means Storage Center arrays can be configured with the optimal mix of different drive types. For instance, if more performance is required, SSD can be added; if more capacity is required, SATA can be added.
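As a rough illustration of the idea, the sketch below demotes any block that has not been accessed often enough since the last pass. The actual Data Progression algorithm is not described here, so the access threshold, the three-tier limit and the names `Block` and `progress` are all assumptions made for the example.

```python
from dataclasses import dataclass

LOWEST_TIER = 3  # assumed: three tiers, 1 = fastest


@dataclass
class Block:
    tier: int = 1      # new writes land on the highest tier
    accesses: int = 0  # accesses counted since the last progression pass


def progress(blocks, min_accesses: int = 1) -> int:
    """Demote rarely used blocks by one tier; return how many were moved."""
    moved = 0
    for block in blocks:
        if block.accesses < min_accesses and block.tier < LOWEST_TIER:
            block.tier += 1
            moved += 1
        block.accesses = 0  # start counting afresh for the next pass
    return moved


if __name__ == "__main__":
    blocks = [Block(accesses=5), Block()]
    print(progress(blocks), [b.tier for b in blocks])
```

A block that stays busy remains on tier 1 indefinitely, while an idle one drifts down a tier per pass until it reaches the lowest tier.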
In summary, the Compellent design enables data placement to be optimised at the block level, with less frequently accessed data moved to lower tiers of storage. In normal circumstances the default settings can be used, but if specific high performance volumes are needed, this can be accommodated too.
One final topic, as this is becoming a long post. Storage Center supports both Fibre Channel and iSCSI connectivity. Unusually for a storage array, it allows both protocols to be used against a single volume at the same time, so I/O can be actively using both Fibre Channel and iSCSI. If you have the right version of switch firmware, Fibre Channel also supports NPIV, which enables the creation of virtual ports on physical ports. I hope to go over this in a future post.
In the next post I’ll discuss performance and some of the other miscellaneous features such as replication and clustering.