Things can, and do, go wrong. It’s a fact of life, and businesses spend time and money preparing for unexpected hiccups. The data storage industry has spent decades improving the reliability of storage architectures by assuming that their components will fail at some point. Any element in the chain (cables, power supplies, cooling, drives, software and even the sys admins) can fail without warning and disrupt access to users’ data.
Reliability has often been equated with data accessibility, i.e. ensuring data access at a given SLA. A more contemporary perspective breaks it into two key, separate measurements: availability and durability.
Let’s explore some of the differences and how they matter for your business.
Data Availability vs. Durability – They’re Not The Same Thing
Availability and durability are two very different aspects of data accessibility. Availability refers to system uptime, i.e. the storage system is operational and can deliver data upon request. Historically, this has been achieved through hardware redundancy, so that if a component fails, access to data continues. Durability, on the other hand, refers to long-term data protection, i.e. the stored data does not suffer from bit rot, degradation or other corruption. Rather than hardware redundancy, it relies on data redundancy so that data is not lost or compromised.
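To make the uptime side concrete, here is a minimal sketch that converts an availability target into the downtime it allows per year. The targets shown are illustrative and are not taken from any particular SLA.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum downtime per year permitted by an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


# Illustrative targets only.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Note that this number says nothing about whether the data is intact once the system is back up; that is the durability question.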
Availability and durability serve different objectives. For data centers, availability/uptime is a key metric for operations as any minute of downtime is costly. The measurement focuses on storage system availability. But what happens when a component, system or even the data center goes down? Will your data be intact when the fault is corrected?
This illustrates the equal importance of data durability. When an availability fault is corrected, it is essential that access to uncorrupted data is restored. With the explosion of data being created, the potential value of mining it, and the growing need for longer retention periods (for everything), you can imagine how paramount this is for business success.
Consider the potential competitive, financial or even legal impact of not being able to retrieve the archived master/reference copy of data. Hence, both data availability and data durability are essential for short- and long-term business success.
Ensuring Data Availability – RAID or Rateless Erasure Coding?
A common approach to ensuring data availability has been RAID-based architectures. Striping data with parity across multiple drives can protect against the failure of one or two drives, but performance can fall dramatically during rebuild operations, which can hurt business operations. Years of data center experience show that drive failures are usually not isolated incidents: when one drive in a RAID group fails, the likelihood of other group members failing increases. An Unrecoverable Read Error during a rebuild operation, when no redundancy is left to correct it, means data is permanently lost, which places your business at risk.
As drive capacities have greatly increased, so too have rebuild times. What formerly took minutes can now take hours, or even days. In addition, the failed drive has to be replaced ASAP to restore redundancy, be it on a weekend, a holiday or in the middle of the night.
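A back-of-the-envelope sketch shows why capacity growth matters here. The rebuild rate of 100 MB/s and the error rate of one unrecoverable read error per 10^14 bits are assumptions chosen for illustration (the latter is a figure often quoted on drive data sheets); real drives and arrays vary, and rebuilds on a busy array typically run slower.

```python
import math


def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rewrite a whole drive at a sustained rebuild rate."""
    capacity_mb = capacity_tb * 1_000_000  # decimal terabytes to megabytes
    return capacity_mb / rebuild_mb_per_s / 3600


def ure_probability(data_read_tb: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error while reading
    data_read_tb terabytes, assuming independent errors per bit."""
    bits_read = data_read_tb * 1e12 * 8
    # 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Assumed values: 100 MB/s rebuild rate, drives of 1 TB and 10 TB.
for capacity in (1, 10):
    print(f"{capacity} TB drive: about {rebuild_hours(capacity, 100):.1f} hours to rebuild")

# Assumed: the rebuild has to read 10 TB from the surviving drives.
print(f"Chance of a read error while reading 10 TB: {ure_probability(10):.0%}")
```

Under these assumptions the rebuild window grows linearly with capacity, and so does the amount of data that must be read back without a single error.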
Object storage achieves data availability through advanced erasure coding, whereby data is combined with parity information, then sharded and distributed across the storage pool. Since only a subset of the shards is needed to rehydrate the data, there is no rebuild time or degraded performance, and failed storage components can be replaced when convenient.
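The "only a subset of the shards" property can be put into numbers with a simple probability model. This is a sketch under stated assumptions, not a description of any product: it assumes each shard is reachable independently with the same probability, and the values 18 shards, 8 tolerated losses and 99% per-shard reachability are illustrative.

```python
from math import comb


def object_availability(total_shards: int, tolerated_losses: int, shard_up: float) -> float:
    """Probability that enough shards are reachable to read an object,
    assuming each shard is independently reachable with probability shard_up."""
    needed = total_shards - tolerated_losses
    return sum(
        comb(total_shards, up) * shard_up**up * (1 - shard_up) ** (total_shards - up)
        for up in range(needed, total_shards + 1)
    )


# Illustrative: 18 shards, any 10 of which are enough to rebuild the object.
print(object_availability(18, 8, 0.99))

# For comparison: a single copy on one device with the same reachability.
print(object_availability(1, 0, 0.99))
```

In practice failures are often correlated (shared racks, power, firmware), so a real system spreads shards across failure domains to get closer to the independence this model assumes.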
Data Durability – RAID Alone Doesn’t Deliver
As you have probably surmised, achieving data availability is not quite the same as having access to the data that was originally stored. A media failure such as bit rot, where a portion of the drive surface or other media becomes unreadable, corrupts data, making it impossible to retrieve the data in its original, unaltered form. Simply protecting against a complete hard drive failure, as RAID does, does not protect against the gradual failure of the bits stored on magnetic media.
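Silent corruption of this kind is commonly detected by comparing stored data against a checksum recorded at write time. The sketch below illustrates the idea, assuming SHA-256 as the checksum; the shard names and the `scrub` helper are hypothetical and do not represent any vendor's implementation.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 digest of a shard's content."""
    return hashlib.sha256(data).hexdigest()


def scrub(shards: dict[str, bytes], recorded: dict[str, str]) -> list[str]:
    """Return the names of shards whose content no longer matches
    the checksum recorded when they were written."""
    return [name for name, data in shards.items() if checksum(data) != recorded[name]]


# Hypothetical shards, with checksums recorded at write time.
shards = {"shard-0": b"quarterly report", "shard-1": b"master copy"}
recorded = {name: checksum(data) for name, data in shards.items()}

# Simulate bit rot: flip a single bit in shard-1.
rotted = bytearray(shards["shard-1"])
rotted[0] ^= 0x01
shards["shard-1"] = bytes(rotted)

print(scrub(shards, recorded))  # ['shard-1']
```

Detection is only half of the job: a durable system also needs enough redundant data elsewhere to rewrite the damaged shard.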
The combination of widely distributed erasure-coded data (say with an 18/8 coding policy) and data scrubbing technology that continuously validates the data written on the media can enable you to achieve 15 nines of data durability. In simpler terms: for every 1,000 trillion objects, only one would not be readable. How’s that for data durability? It’s not surprising that hyperscale data centers and cloud service providers use object-based storage to meet the need for the highest data availability and data durability.
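The arithmetic behind the "15 nines" figure is easy to check: 15 nines means 99.9999999999999% of objects remain readable, so the expected loss is one object in 10^15, and 10^15 is 1,000 trillion. The sketch below treats the figure as a per-object loss probability, which is an assumption; the article does not state the time period the figure covers.

```python
from fractions import Fraction


def loss_fraction(nines: int) -> Fraction:
    """Fraction of objects expected to be unreadable at a given number of nines."""
    return Fraction(1, 10**nines)


def expected_unreadable(objects: int, nines: int) -> Fraction:
    """Expected number of unreadable objects out of a given population."""
    return objects * loss_fraction(nines)


# 1,000 trillion objects at 15 nines of durability.
print(expected_unreadable(10**15, 15))  # 1

# A smaller store of one billion objects at the same durability.
print(expected_unreadable(10**9, 15))  # 1/1000000
```

Exact fractions are used instead of floats because 1 - 10^-15 sits at the edge of double-precision resolution.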
Data Availability vs. Durability – A False Choice
Every organization needs its data to be accessible when requested and without corruption or other loss. It’s not a choice between high availability and high durability; both are essential. When operating at scale, this becomes even more apparent.
Just remember, it’s your data that needs protection, not your disk drives.
Robin Harris, president and chief analyst of StorageMojo, recently published a white paper discussing data availability vs. durability in archive solutions. If you are a data center planner, architect or IT professional looking to understand the importance of both, I encourage you to read his perspective on availability vs. durability in archive solutions.