JBOD vs. RAID vs. Erasure Coding: Which is Best for the Data Center?

Storage professionals face tough decisions when it comes to optimizing the storage architecture to meet fault tolerance, performance and scalability requirements as cost-effectively as possible. At a basic level, these decisions largely come down to using one of two configurations: “just a bunch of disks” (JBOD, which is increasingly becoming JBOF – “just a bunch of flash”) or “redundant array of inexpensive (or “independent”) disks” (RAID). Newer to this equation is erasure coding, the data protection parity technique associated with object storage. This blog will level set on the differences between these configurations, and the value JBOD, RAID and erasure coding each stand to bring to the modern workload set.

“Just a Bunch/Box of Disks” – JBOD Explained

In a JBOD configuration, the host CPU accesses storage drives individually, or in logical volumes of drives that have been combined in a linear fashion through a process called “spanning” (with the latter enabling additional drives to be added without the system being reformatted). Because each storage drive or logical volume is treated as an independent resource, JBOD configurations are typically easy to scale. This nature also adds flexibility by allowing various drive types and capacities to be mixed. Because drives are not reserved for redundancy, all usable disk capacity may be utilized by the host CPU for application processing. Furthermore, because data is confined to a specific drive rather than being striped, or spread out across multiple drives, the impact of a restore in terms of performance and potential data loss is confined to that specific drive.
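To make the “spanning” idea concrete, here is a minimal sketch (with hypothetical names, not any real volume manager’s API) of how a spanned volume maps a logical offset to the drive that actually holds it – mixed-capacity drives are simply concatenated end to end:

```python
# Hypothetical sketch: locating a logical byte offset in a spanned (linear)
# JBOD volume. Drives of different sizes are concatenated in order, so the
# volume can grow by appending another drive without reformatting.

def locate(offset, drive_sizes):
    """Return (drive_index, offset_within_drive) for a spanned volume."""
    start = 0
    for i, size in enumerate(drive_sizes):
        if offset < start + size:
            return i, offset - start
        start += size
    raise ValueError("offset beyond end of volume")

# Three mixed-capacity drives (sizes in GB for brevity): 4 TB, 8 TB, 14 TB
drives = [4000, 8000, 14000]
print(locate(0, drives))      # (0, 0)    -> start of the first drive
print(locate(5000, drives))   # (1, 1000) -> 1000 GB into the second drive
print(locate(13000, drives))  # (2, 1000) -> 1000 GB into the third drive
```

Note how each drive remains an independent resource: a failure of drive 2 affects only the offsets it holds, which is exactly the containment property described above.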

These value propositions being acknowledged, JBOD configurations intrinsically carry some risks and limitations. They are not fault tolerant; if data is erased from a drive or if a drive fails, the data is lost forever if it is not backed up or if a duplicate copy of the data does not exist. Additionally, read and write performance is limited because the host processor must access each disk or logical volume successively.

What happens to my data when a disk fails?

Enter RAID = “Redundant Array of Independent Disks”

History[1]: The term “RAID” was coined by David Patterson, et al. at the University of California, Berkeley in 1987. In their June 1988 paper “A Case for Redundant Arrays of Inexpensive Disks (RAID)”, presented at the SIGMOD conference, they argued that the top-performing mainframe disk drives of the time could be beaten on performance by an array of the inexpensive drives that had been developed for the growing personal computer market. Although failures would rise in proportion to the number of drives, by configuring for redundancy, the reliability of an array could far exceed that of any large single drive.

RAID configurations emerged to address the fault tolerance and performance limitations inherent in JBOD configurations. RAID is a data storage virtualization technology that combines multiple physical disk drives into one or more logical units for data redundancy. RAID distributes data across the drives in different ways, called RAID levels, chosen according to the performance and redundancy required. RAID configurations apply data striping (a technique of logically segmenting sequential data so that consecutive segments are stored on different physical storage devices) and disk mirroring (the process of exactly duplicating data between two drives). A controller then presents the RAID array as a single logical unit to the host processor. This approach facilitates data redundancy and accelerates application performance by allowing input/output (I/O) requests from the host processor to access multiple drives concurrently.
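The striping technique just described can be sketched in a few lines. This is an illustrative model, not any controller’s implementation: consecutive logical blocks are assigned round-robin across the drives, which is why concurrent I/O requests can be serviced by different drives at once.

```python
# Illustrative sketch of RAID 0 striping: logical block b on an array of
# num_drives disks lands on drive (b mod num_drives), at row (b div num_drives).

def stripe(block, num_drives):
    """Return (drive_index, block_within_drive) for a RAID 0 layout."""
    return block % num_drives, block // num_drives

# With 4 drives, logical blocks 0..7 round-robin across drives 0,1,2,3
layout = [stripe(b, 4) for b in range(8)]
print(layout)  # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```

A sequential read of blocks 0–3 can therefore be serviced by four drives in parallel – the performance benefit RAID striping is designed to deliver.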

RAID Configuration Tradeoffs

There are multiple RAID configurations, spanning from RAID 0 (which offers no redundancy but the fastest performance) through RAID 6 and above, for varying levels of data redundancy and performance acceleration. You can see a more in-depth explanation here, but to summarize:

  • Generally, the higher the RAID level, the more disk failures that can be tolerated while the data remains recoverable. For example, RAID 6 allows two disks in a 4-disk group to fail with the data still recoverable.
  • The higher the RAID level, the less disk capacity is available for storing data versus storing the parity information used to recover it. In RAID 0, essentially 100% of the formatted capacity of the disks is available for data storage. In RAID 6, two drives’ worth of capacity in each group is reserved for parity, so a 4-disk group yields only 50% usable capacity (larger groups are more efficient, typically in the ~60-75% range).
  • Rebuild Times: When RAID was invented, typical disk capacities were ~100MB (remember this is 1987…). A failed disk easily could rebuild in under an hour. In today’s environment, a failed 14TB disk could take as long as several days to rebuild – impacting system performance and potentially putting data at risk during that window.
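The capacity tradeoff in the second bullet can be made concrete with some simplified arithmetic (ignoring hot spares and formatting overhead, and covering only the most common levels):

```python
# Simplified usable-capacity fractions for common RAID levels,
# where n is the number of drives in the group.

def usable_fraction(level, n):
    if level == 0:      # striping only, no redundancy
        return 1.0
    if level == 1:      # mirroring: half the capacity holds duplicate copies
        return 0.5
    if level == 5:      # one drive's worth of parity per group
        return (n - 1) / n
    if level == 6:      # two drives' worth of parity per group
        return (n - 2) / n
    raise ValueError("unsupported level")

print(usable_fraction(6, 4))   # 0.5  -> 50% usable in a 4-disk RAID 6 group
print(usable_fraction(6, 8))   # 0.75 -> efficiency improves with group size
```

The same formula shows why larger RAID 6 groups are more capacity-efficient – but larger groups also mean longer, riskier rebuilds, which is the tension the third bullet describes.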

The fault tolerance and faster performance facilitated by RAID come at the cost of greater management complexity. Additionally, in the event of a drive failure, many RAID configurations run in a degraded state while data is rebuilt onto a “hot spare” standby drive, reducing application performance until the rebuild completes.

JBODs in a RAID?

JBODs – with the addition of specialized hardware (RAID controller cards) or software (RAID implemented in software) – can be used in a RAID architecture. There are no technical limitations preventing a group of disks from being configured, managed and formatted to support the various RAID levels.

Erasure Coding – the Next Generation RAID in the Data Center?

Another approach often used in multi-petabyte, object storage cloud environments is erasure coding. As my colleague Mike McWhorter recently explained: Erasure coding is a parity protection technique. In an erasure coded volume, files are divided into shards, with each shard placed on a different disk. To protect against disk failures, additional shards containing error-correction information are added. Only a subset of the shards is needed to retrieve a file, which means the volume can survive multiple disk failures without the risk of data loss.

The main advantage of this method is that it requires much less disk space to protect data than replication techniques, and it avoids lengthy RAID-style rebuilds. The downside of erasure coding is that parity calculation can be quite CPU-intensive, so it can impact performance and latency, and it is not well suited to small-block I/O or performance-sensitive workloads.
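A toy sketch can show the parity idea behind erasure coding. This uses single XOR parity – the simplest possible erasure code, protecting against one lost shard – whereas production object stores use Reed-Solomon codes that tolerate several simultaneous failures:

```python
# Toy erasure coding sketch: split data into k shards plus one XOR parity
# shard, so that any ONE missing shard can be rebuilt from the others.
# Real systems use Reed-Solomon coding (e.g. k data + m parity shards).

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data, k):
    """Split data into k equal-size shards and append one XOR parity shard."""
    size = -(-len(data) // k)                              # ceiling division
    shards = [data[i*size:(i+1)*size].ljust(size, b"\0") for i in range(k)]
    parity = shards[0]
    for s in shards[1:]:
        parity = xor_bytes(parity, s)
    return shards + [parity]

def rebuild(shards, lost):
    """Recover the shard at index `lost` by XOR-ing all surviving shards."""
    survivors = [s for i, s in enumerate(shards) if i != lost]
    out = survivors[0]
    for s in survivors[1:]:
        out = xor_bytes(out, s)
    return out

shards = encode(b"hello erasure coding", k=4)
recovered = rebuild(shards, lost=2)     # pretend shard 2's disk failed
print(recovered == shards[2])           # True
```

The `rebuild` loop also hints at the CPU cost mentioned above: every recovery (and, in real codes, every write) involves arithmetic across all surviving shards, which is why erasure coding suits large-object workloads better than latency-sensitive small-block I/O.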

Object storage software is available as open source code that can be deployed on a JBOD/JBOF as a software-defined storage solution, or as an integrated object storage system.

JBOD vs. RAID vs. Erasure Coding – Conclusion

RAID configurations are a highly mature, well understood technology for ensuring high availability of data with predictable performance. Yet as data centers scale to petabytes, exabytes and beyond, RAID is being challenged to scale to big data demands.

For this reason, JBOD configurations and object storage are becoming key staples of data centers as data grows in volume and becomes more distributed globally. JBODs are also becoming more popular as modern file systems and software-defined storage solutions mature and become able to provide data protection capabilities. Supporting technologies that ensure data availability in the event of an individual disk failure, such as erasure coding and other backup technologies, are becoming more sophisticated and alleviating some of the pain points that have typically been associated with JBOD configurations.

Whether it’s JBOD, RAID and/or Erasure Coding, the right solution for your data center will depend on your workload demands and your IT management capabilities.


[1] Source: Wikipedia.org

Dave is Director of Storage Platform Marketing with 20+ years of experience in the enterprise storage, computing, and software business.