Article by George Crump, Storage Analyst
Traditional storage systems (SAN, NAS, DASD) leverage traditional RAID architectures to
maintain data availability, and those architectures are unfortunately not particularly well
suited to PB-scale environments. This is especially true when systems are configured with
the highest-capacity hard drives available to keep cost per GB low. And even
with these high-capacity drives, there is still the requirement to support
hundreds (or thousands) of spindles.
Unfortunately, hundreds of high-capacity hard drives create
the ‘perfect storm’ for RAID. With this many drives, the cloud storage provider
has to work under the assumption that there will always be a hard drive failure
somewhere (the laws of probability). When a hard disk in a RAID
group fails, the RAID controller logic must rebuild the entire disk, sector by
sector, even if the drive is mostly empty. The larger the drive, the more
time the rebuild takes; with disk capacities of up to 4TB this
can stretch into days.
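As a rough illustration (the 50 MB/s sustained rebuild rate below is an assumption for this sketch, not a figure from any particular array), rebuild time grows linearly with drive capacity:

```python
# Back-of-the-envelope rebuild-time estimate. The 50 MB/s sustained rebuild
# rate is an assumption for illustration, not a figure from any specific array.
def rebuild_hours(capacity_tb, rebuild_mb_per_s=50):
    capacity_mb = capacity_tb * 1_000_000   # decimal TB -> MB
    return capacity_mb / rebuild_mb_per_s / 3600

for tb in (1, 2, 4):
    print(f"{tb} TB drive: ~{rebuild_hours(tb):.0f} hours at full rebuild speed")

# A 4 TB drive already needs roughly 22 hours of uninterrupted rebuild I/O;
# throttling the rebuild to protect foreground workloads stretches that into days.
```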
Long rebuild times plus the likelihood of near-constant rebuild
activity create additional challenges for these large-scale data centers.
First, the longer the rebuild, the longer the period of time that the
applications or users connected to that array have to endure the degraded
performance caused by I/O-intensive rebuild traffic. Second, and more
importantly, the longer the rebuild time, the longer the data center is
vulnerable to complete data loss caused by a second or third drive failure
(depending on whether RAID 5 or RAID 6 is deployed).
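One crude way to quantify that exposure, using an illustrative 4% annualized failure rate and a 12-drive RAID 5 group rather than any measured figures, is to estimate the chance that another drive fails before the rebuild finishes:

```python
# Rough model of the exposure window: the chance that at least one of the
# surviving drives in the RAID group also fails before a multi-day rebuild
# completes. The 4% annualized failure rate (AFR) and 12-drive group are
# illustrative assumptions, not measured figures.
def p_additional_failure(surviving_drives, rebuild_days, afr=0.04):
    daily_failure_rate = afr / 365            # crude constant-rate approximation
    p_drive_survives = (1 - daily_failure_rate) ** rebuild_days
    return 1 - p_drive_survives ** surviving_drives

# 12-drive RAID 5 group (11 survivors) with a 3-day rebuild:
print(f"{p_additional_failure(11, 3):.2%} chance of a second failure during rebuild")
```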
These factors encourage storage designers to create highly
redundant RAID systems with powerful controllers. The problem is that doing so
leads to reduced hard drive efficiency, increased costs and poor space
utilization. Despite these extra steps, the systems are still not completely
trusted, and storage managers will often augment them with disk backup systems,
replication and tape. In the highly competitive cloud and service provider
market, the combination of all of these problems has driven storage managers to
find new alternatives.
The Object Storage Solution
The first attraction of object storage is its ability to
deal with millions upon millions of objects or files. Object-based storage
allows providers to overcome a key weakness of traditional file systems, which
can have file count restrictions due to their limitations in handling
metadata. These metadata problems often force large infrastructures to add
storage systems before they have used all the capacity available in their current systems.
Object storage solves this problem with its ability to support a nearly
unlimited number of objects.
When it comes to data protection, object storage systems
also have the advantage of being granular to the object or file level. This
means that a user can control the number of copies of data that are made on a
per-object (or per group of objects) basis. This is typically done via a
replication policy that simply copies objects to other storage nodes in the
environment.
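A minimal sketch of what such a per-object policy might look like, with hypothetical names and placement logic rather than any specific product's API, is shown below:

```python
# A hypothetical per-object replication policy, sketched to show the idea of
# object-level granularity; the names and placement logic here are illustrative,
# not any particular vendor's API.
from dataclasses import dataclass

@dataclass
class ReplicationPolicy:
    copies: int   # desired number of full replicas for objects under this policy

def place_replicas(object_id, policy, nodes):
    """Pick `policy.copies` distinct nodes to hold full copies of the object."""
    if policy.copies > len(nodes):
        raise ValueError("not enough nodes for the requested copy count")
    # Simple deterministic spread; real systems add consistent hashing,
    # failure-domain and site awareness, rebalancing, etc.
    start = hash(object_id) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(policy.copies)]

print(place_replicas("invoice-0042", ReplicationPolicy(copies=3),
                     ["node-a", "node-b", "node-c", "node-d", "node-e"]))
```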
With a dispersed storage system, if a disk fails, the system
uses another copy of the object stored on a separate node or disk. It then
replicates this data to another device in the system to bring the object back to
full redundancy. Performance is also improved because, unlike RAID, no complex
XOR function is needed to identify and restore lost files.
The Object Storage Challenge
While a replication-based strategy enables simple protection
with rapid recovery, it does present several challenges to the storage designer.
First, this type of “1:X” protection scheme, with X being the desired number
of redundant copies, is very capacity inefficient. Assuming that a data center
wants to keep three copies so it can still access data even if two nodes
have failed, its raw storage capacity requirement triples. Further levels
of protection only exacerbate the problem.
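The arithmetic is simple, and a quick sketch (with a purely illustrative 1 PB of usable data) makes the overhead explicit:

```python
# Capacity math for a "1:X" replication scheme: raw capacity scales linearly
# with the number of copies. The 1 PB figure is purely illustrative.
usable_pb = 1.0
for copies in (2, 3, 4):
    raw_pb = usable_pb * copies
    print(f"{copies} copies -> {raw_pb:.0f} PB raw capacity, "
          f"{(copies - 1) * 100:.0f}% overhead")
```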
The second challenge is performance. Since each object copy
is contained entirely on a single storage node, performance is limited to the
capabilities of that node. In other words, per-object performance will not scale
as more nodes are added to the environment.
The Dispersed Storage Solution
To solve these problems, companies like Cleversafe have
leveraged erasure coding and data dispersal algorithms that parse data into
multiple segments and then distribute those segments across multiple nodes in
the storage cluster. In the case of Cleversafe, these nodes can be
self-contained, all in one building or data center, or they can be geographically
distributed.
At a high level, erasure coding is similar to RAID except
that the parity calculations are applied at the file (or object) level instead
of at the disk level. When a rebuild is necessary, the segments that compose a
file can be reproduced far more quickly than rebuilding an entire disk
drive.
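To make the idea concrete, here is a minimal sketch of object-level erasure coding using a single XOR parity segment (k data segments plus one parity). Production dispersal systems such as Cleversafe's use stronger codes that tolerate many simultaneous losses; this toy version tolerates only one, but it shows that recovery works on the missing segment of an object, not on a whole disk:

```python
# Minimal sketch of object-level erasure coding with a single XOR parity
# segment (k data segments + 1 parity). Real dispersal systems use stronger
# codes; this only survives one lost segment, but it illustrates rebuilding
# a missing *segment* of an object rather than an entire disk.
from functools import reduce

def encode(data: bytes, k: int):
    seg_len = -(-len(data) // k)                      # ceiling division
    segs = [data[i * seg_len:(i + 1) * seg_len].ljust(seg_len, b"\0")
            for i in range(k)]
    parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), segs)
    return segs + [parity]                            # k + 1 segments to disperse

def rebuild_missing(segments, missing_index):
    present = [s for i, s in enumerate(segments)
               if i != missing_index and s is not None]
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), present)

segs = encode(b"object payload spread across the cluster", k=4)
segs[2] = None                                        # simulate a failed node/disk
print(rebuild_missing(segs, 2))                       # recovers only the lost segment
```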
This level of protection can be dialed up or down by
adjusting the delta between the number of segments generated and the minimum
number of segments required to reconstruct the data, and the setting can be driven by
attribute policies such as age, data type, popularity, etc. The result is a
level of protection, or effective redundancy, much higher than even the
two-drive-failure protection provided by RAID 6, with far less capacity
overhead.
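As an illustration of the capacity math (the 10-of-16 width and threshold below are an example configuration, not a stated vendor default), a wide dispersal can survive far more simultaneous losses than RAID 6 while carrying much less overhead than three-way replication:

```python
# Illustrative comparison of capacity overhead versus loss tolerance. The
# 10-of-16 dispersal configuration is an example width/threshold, not a
# specific vendor default.
def erasure_profile(total_segments, threshold):
    overhead = total_segments / threshold - 1      # extra raw capacity required
    tolerable_losses = total_segments - threshold
    return overhead, tolerable_losses

schemes = {"3-way replication":  (3, 1),
           "RAID 6 (8+2)":       (10, 8),
           "10-of-16 dispersal": (16, 10)}
for label, (n, k) in schemes.items():
    overhead, losses = erasure_profile(n, k)
    print(f"{label:20s} {overhead:4.0%} overhead, "
          f"survives {losses} simultaneous losses")
```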
Also, multiple storage nodes can deliver their data segments
in parallel, which helps with read performance. The system also has the
intelligence to serve data from the closest or fastest set of nodes.
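A simplified sketch of that read path, with hypothetical node names and latencies, might select the segment sources like this:

```python
# Hypothetical read path: request segments from all nodes that hold them, but
# only the k fastest responses are needed to reconstruct the object. The node
# names and latencies are made-up figures for illustration.
def pick_read_set(node_latencies_ms, threshold):
    ranked = sorted(node_latencies_ms.items(), key=lambda item: item[1])
    return [node for node, _ in ranked[:threshold]]

latencies = {"chicago-1": 2, "chicago-2": 3, "dallas-1": 18, "london-1": 85}
print(pick_read_set(latencies, threshold=3))   # segments come from the 3 closest nodes
```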
The Dispersed Storage Advantage
Another significant advantage of dispersed storage is that
the storage manager is now managing just one copy of data that is fully
protected. The algorithm automatically makes sure that enough segments are
dispersed, both locally and remotely, to provide the level of protection it’s
configured for.
Compare this to legacy storage, where data has to be stored
with especially inefficient RAID schemes, like RAID 10, copied to a
secondary local disk backup system, then replicated via a separate process to a
remote facility, and finally copied to tape for the "restore of last
resort" copy.
Dispersed storage simplifies management and data protection,
all while reducing cost, space, power and cooling requirements. In the
cost-competitive provider market, this is exactly what’s required to meet
demand while enabling providers to continue onboarding new customers without
escalating costs.