Low-Cost Hybrid Tiering with InfiniFlash System for Big Data Flash
The new economics of all-flash arrays such as the InfiniFlash™ System from SanDisk® are fundamentally changing data tiering models, making them considerably more attractive and compelling for a range of applications with demanding I/O and storage requirements. As object storage models are becoming more popular, applications such as video surveillance and near real-time analytics are pushing the limitations of traditional spinning media. These applications need to perform fast ingest of new data, while also providing fast reads on recent data for analysis—all while supporting very large capacities for infrequent access to older data. Storage cost models for acceptable solutions must fit business needs for high capacity and low cost per gigabyte.
Tiering and Hybrid Storage Models
Object stores such as Ceph are well suited for data tiering—but tiering generally has been underutilized in the past. Most organizations have not taken advantage of data tiering capabilities due to the relatively small performance differences between different classes of spinning hard disk drives (HDDs) in terms of I/O operations per second (IOPS):
- High-capacity 7.2K RPM drives provide approximately 120 IOPS
- Medium-capacity 10K RPM drives provide approximately 250 IOPS
- Low-capacity 15K RPM drives provide approximately 300 IOPS
Given these small differences, and the high cost and low capacity of 10K and 15K RPM HDDs, it has been commonplace to simply add sufficient 7.2K RPM drives to achieve the desired IOPS. For example, two 7.2K RPM drives combine to provide almost as many IOPS as a single 10K RPM drive. Regrettably, this strategy often deploys significantly more capacity than is strictly required, and it raises power, cooling, and maintenance costs: there are more drives to power and cool, and more drives to fail, given the relatively high failure rates of HDDs.
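To make the over-provisioning effect concrete, here is a minimal sketch that sizes a drive pool by IOPS alone and then reports how much capacity comes along with it. The per-drive IOPS values are the approximate figures listed above; the IOPS target and the per-drive capacity in the usage lines are hypothetical inputs chosen only for illustration.

```python
import math

# Approximate per-drive IOPS figures quoted in the list above.
IOPS_PER_DRIVE = {"7.2K": 120, "10K": 250, "15K": 300}


def drives_needed(target_iops, drive_class):
    """Smallest number of drives of one class whose combined IOPS meets the target."""
    return math.ceil(target_iops / IOPS_PER_DRIVE[drive_class])


def deployed_capacity_tb(target_iops, drive_class, capacity_tb_per_drive):
    """Capacity that ends up deployed when the pool is sized by IOPS alone."""
    return drives_needed(target_iops, drive_class) * capacity_tb_per_drive


# Hypothetical workload: the target and the drive capacity are assumptions.
target_iops = 6000
assumed_capacity_tb = 4.0
count = drives_needed(target_iops, "7.2K")
capacity = deployed_capacity_tb(target_iops, "7.2K", assumed_capacity_tb)
print(f"{count} x 7.2K RPM drives, {capacity} TB deployed to reach {target_iops} IOPS")
```

If the application only needs a fraction of that capacity, the remainder is paid for, powered, and cooled purely to reach the IOPS target.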
Now, however, the dramatic IOPS performance improvements available with flash are changing this landscape. Specifically, tiered hybrid flash and HDD solutions can combine the high performance and very low latency of flash technology with the capacity and economic advantages of slower HDDs. For example, Ceph tiering provided on the SanDisk InfiniFlash IF500 can result in high-performance storage with the low cost per gigabyte that the business environment demands.
Tiering is most effective when there are relatively large differences in performance between the technologies used for the separate tiers. For example, each of the sixty-four 8TB InfiniFlash cards in the InfiniFlash IF500 all-flash array provides over 12K IOPS, roughly forty times that of the fastest available 15K RPM 600GB HDD, with substantially higher capacity. Combined in a SanDisk InfiniFlash IF500 array, these cards provide aggregate throughput of 7GB/s and 780K IOPS. Flash technology also provides sub-millisecond latency, compared with multiple milliseconds for HDDs.
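The arithmetic behind these figures can be checked with a short sketch. It uses only numbers quoted in this post; because each card is described as delivering "over 12K IOPS", the 12,000 value is treated as a lower bound, which is why the computed aggregate lands slightly below the quoted 780K.

```python
CARDS_PER_ARRAY = 64    # InfiniFlash cards in one array
CARD_CAPACITY_TB = 8    # capacity per card
CARD_IOPS_FLOOR = 12_000  # "over 12K IOPS" per card, taken as a lower bound
HDD_15K_IOPS = 300      # approximate IOPS of a 15K RPM HDD

# Raw capacity and a lower bound on aggregate IOPS for one array.
raw_capacity_tb = CARDS_PER_ARRAY * CARD_CAPACITY_TB
aggregate_iops_floor = CARDS_PER_ARRAY * CARD_IOPS_FLOOR

# Per-device performance gap between one flash card and one 15K RPM HDD.
card_vs_hdd_ratio = CARD_IOPS_FLOOR / HDD_15K_IOPS

print(f"raw capacity: {raw_capacity_tb} TB")
print(f"aggregate IOPS (lower bound): {aggregate_iops_floor}")
print(f"one card vs one 15K RPM HDD: {card_vs_hdd_ratio:.0f}x")
```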
Object Store Tiering – Keeping Costs Down
This large performance differential is ideal for Ceph object tiering capabilities, allowing the creation of truly interesting hybrid object stores for near real-time analytics and other applications.
Combining the ultra-dense SanDisk InfiniFlash (delivering 512TB in just 3U) with low-cost HDDs provides both high performance and high capacity in a scale-out model. Because the flash tier absorbs the performance-sensitive I/O, the back-end tier can use even lower-performance, higher-capacity, and lower-cost 5.9K RPM disk drives, which brings the overall cost per gigabyte down further. Importantly, one can easily take advantage of enterprise-class data management features such as snapshots and replication that result in zero data loss, which is highly attractive to many applications.
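One way to reason about the blended economics is a capacity-weighted average of the two tiers' costs. The sketch below is only an illustration: the function name is made up for this example, and the per-gigabyte costs and the HDD tier size are placeholder values in arbitrary cost units, not SanDisk or market pricing. Only the 512TB flash capacity comes from this post.

```python
def blended_cost_per_gb(flash_gb, flash_cost_per_gb, hdd_gb, hdd_cost_per_gb):
    """Capacity-weighted average cost per gigabyte across a flash tier and an HDD tier."""
    total_gb = flash_gb + hdd_gb
    if total_gb <= 0:
        raise ValueError("total capacity must be positive")
    total_cost = flash_gb * flash_cost_per_gb + hdd_gb * hdd_cost_per_gb
    return total_cost / total_gb


# 512 TB of flash, expressed in decimal gigabytes.
flash_gb = 512 * 1000

# Placeholder assumptions: HDD tier four times the flash capacity,
# and made-up relative costs per gigabyte for each tier.
hdd_gb = 4 * flash_gb
assumed_flash_cost = 1.00
assumed_hdd_cost = 0.05

blended = blended_cost_per_gb(flash_gb, assumed_flash_cost, hdd_gb, assumed_hdd_cost)
print(f"blended cost per GB (arbitrary units): {blended:.2f}")
```

The larger the low-cost back-end tier is relative to the flash tier, the closer the blended figure moves toward the HDD cost, while the hot data still sees flash performance.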
Optimized Object Store
Object storage in public clouds is often very slow, because of both the network links to and from the cloud and the storage model itself. On-premises object storage on flash can therefore deliver huge benefits, since flash can stream data faster than even three or four copies of the same data placed on HDDs.
In this illustration, the primary data resides in an object store served from flash. That means data activity takes place solely on the InfiniFlash system, which lets customers use extremely low-cost SMR drives for replication, both to protect the data should any failure occur on the flash side and to keep costs very low.
Infrastructure Design Based On Access Patterns
Hybrid tiered storage architectures allow infrastructure design based on access patterns rather than time. Operationally, new data and most-frequently-accessed data remain on the InfiniFlash arrays. Less frequently accessed data migrates to the lower-performance, higher-capacity tier. Occasional references to older data on the backend tier are easily served by the infrequently-accessed HDDs. In other words, this approach allows technology to be deployed according to the needs of the application, and where it can provide the most value for the business.
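The placement behavior described here can be sketched as a toy model. This is not Ceph's tiering algorithm or any InfiniFlash interface; the class name, tier labels, and the `idle_limit` threshold are all hypothetical. It only shows the idea that placement follows access activity: new and recently accessed objects stay on flash, idle objects move to the HDD tier, and occasional reads of older objects are served from HDD.

```python
from dataclasses import dataclass, field

FLASH, HDD = "flash", "hdd"


@dataclass
class TieredStore:
    """Toy two-tier placement model driven by access activity (illustrative only)."""
    idle_limit: float  # hypothetical: seconds without access before demotion
    tier: dict = field(default_factory=dict)
    last_access: dict = field(default_factory=dict)

    def write(self, key, now):
        # New data always lands on the flash tier.
        self.tier[key] = FLASH
        self.last_access[key] = now

    def read(self, key, now):
        # Serve the read from whichever tier currently holds the object
        # and record the access time.
        self.last_access[key] = now
        return self.tier[key]

    def demote_idle(self, now):
        # Move flash-resident objects that have been idle too long to HDD.
        idle = [k for k, t in self.tier.items()
                if t == FLASH and now - self.last_access[k] > self.idle_limit]
        for k in idle:
            self.tier[k] = HDD
        return idle


store = TieredStore(idle_limit=10)
store.write("clip-a", now=0)
store.write("clip-b", now=0)
store.read("clip-a", now=8)         # clip-a stays active
print(store.demote_idle(now=15))    # objects demoted in this pass
print(store.read("clip-b", now=16)) # tier that serves the older object
```

Note that the demotion decision depends on the time since the last access, not on the age of the object, so an old object that is still read regularly would stay on flash in this model.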
Big Data Flash is helping organizations take advantage of large-scale, object storage-based applications, delivering capacity and performance at unprecedented scale and with new economics. In a few weeks I will be speaking in a free webinar about how to identify common challenges in Hadoop deployments, and about the flash-based solution architectures that help avoid or alleviate them:
Flash-based Solution Architectures for Hadoop Deployments
Aug 13 2015 8:00 a.m. PDT
Feel free to reach out to me with any questions in the comments below, or at Roark.Hilomen@sandiskoneblog.wpengine.com.