Many enterprises are adopting a flash to cloud architecture to achieve a fast and cost-efficient IT environment. Here’s what this architecture looks like and which applications benefit most from this approach.

Data center infrastructure has become a poly-universe that serves multiple users, applications, and workloads, while also serving as the home to a variety of datasets and services. With those new demands come new cost and deployment challenges that inevitably drive adoption of faster and more scalable storage solutions.


With the building blocks of how we consume storage now more diverse, the challenge lies in how we implement and unlock those capabilities for current and future workloads. As with every new media technology cycle, the speed of adoption is heavily influenced by cost, but even more by the adoption of the application ecosystem that goes with it. Enterprises are witnessing, first-hand, the combined value of NVMe™ and cloud technologies; they are adopting both into their workflows, and some are focusing their environment solely on a flash to cloud approach.

On One Side, NVMe. On the Other, Cloud.

Over the last couple of years, flash has proven its value in delivering higher throughput, lower latency, and more efficient power usage. Moreover, the cost of flash technology has dropped while density and durability have increased through multiple generations of SLC, MLC, TLC and, now, QLC. As a result, organizations have begun to widely adopt flash across the data center and to evaluate how it is consumed in the infrastructure.

One important result of the evolution of flash technology is that a protocol interface called NVMe (Non-Volatile Memory Express) emerged as a more efficient way to architect and deliver higher performance characteristics than the traditional SATA and SAS interfaces (check out our NVMe guide of resources if you are not familiar with this technology or its adoption).

Cloud technologies, on the other hand, crossed the adoption chasm a few years ago and revolutionized how IT architecture is approached. While most organizations today use public cloud in some capacity, there is also a trend of repatriation[1] for some workloads, and most companies opt for a hybrid architecture that takes advantage of both public and private cloud models. (You can read this overview of what workloads fit in the public cloud vs. private cloud configuration.)

Applications Are Leading the Way – From Flash to Cloud

In the data center, two specific workflows stand out in taking maximum advantage of the NVMe and cloud storage combo:

1. Data Protection

Snapshot technologies have long been a standard way to deliver a faster recovery time objective on file and block arrays. With more applications running in virtual environments, backup software vendors now have direct API integrations with flash arrays to leverage these fast snaps. NVMe-enabled arrays increase the speed at which these snaps are taken and reduce the impact of an application stun.

Furthermore, NVMe solutions, such as the IntelliFlash™ NVMe series (N-series), deliver 2x more IOPS per CPU, reducing the software licensing cost of your database and virtualization stacks. This, combined with built-in deduplication and compression techniques, can reduce your overall data footprint by up to 80%.

Although snaps are good for short-term retention, they don’t scale for long-term and petabyte-scale environments. To drive down cost, backup vendors have widely adopted cloud targets as repositories for long-term retention.

We’ve teamed up with vendors like Veeam and Rubrik™ to support on-premises cloud storage and seamlessly store backup images and long-term data on our petascale object-based ActiveScale™ system. ActiveScale provides up to 17 nines of durability at 70% lower cost than traditional file servers and, with its 3GEO protection, removes the need to store multiple copies. If that’s not enough, customers can still choose to store a copy in the public cloud.
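To put a figure like "17 nines" in perspective, here is a rough back-of-the-envelope sketch. It assumes the durability figure is an annual, per-object probability and that object losses are independent; neither assumption is stated in this article, and the object count is purely illustrative.

```python
def expected_annual_loss(num_objects: int, nines: int) -> float:
    """Expected number of objects lost per year.

    Assumption: 'N nines' of durability means each object has an
    independent annual loss probability of 10**-N.
    """
    loss_probability = 10.0 ** -nines
    return num_objects * loss_probability


def years_per_lost_object(num_objects: int, nines: int) -> float:
    """Average number of years before one object is expected to be lost."""
    return 1.0 / expected_annual_loss(num_objects, nines)


if __name__ == "__main__":
    objects = 10_000_000_000  # hypothetical: 10 billion stored objects
    print("expected objects lost per year:", expected_annual_loss(objects, 17))
    print("years per lost object:", years_per_lost_object(objects, 17))
```

Under these assumptions, even a store holding billions of objects would expect to lose far less than one object per year, which is why a single highly durable tier can stand in for multiple separately managed copies.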

2. Data Analytics, ML & AI

Those in automotive, EDA, and research are no strangers to the real-time data analytics and machine learning workloads that power their work every day. At the core of these emerging workloads is increased adoption of in-memory databases and clustered databases that benefit more immediately from low-latency storage.

Enterprise Analytics

In enterprise analytics stacks, we see consolidation happening not just in the storage layer, but also in the design architecture of those analytics platforms. For example, Splunk’s latest SmartStore stack revamped its architecture from a three-tiered approach to a two-tier architecture, with one tier optimized for performance (flash) and one tier optimized for cost (cloud).


This provides the best of both worlds: a flash tier where smart caching techniques place data closer to the “indexers”, and cloud storage, such as ActiveScale™, where data is analyzed and stored for long-term retention. In analytics, more data over time typically results in better insights. (Here you can read more on Splunk® SmartStore and how Splunk is storing data from Flash to Cloud in their new architecture.) In its latest report, “The State of Dark Data”, Splunk showed that 76% of the respondents surveyed agree that “the organization that has the most data is going to win”.[2] Finding the most efficient way to store your data and keep it accessible at scale is the key to success.
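The two-tier idea can be illustrated with a toy read-through cache. This is a minimal sketch, not Splunk's SmartStore implementation: the class name `TwoTierStore`, the LRU eviction policy, and the in-memory dictionaries standing in for the flash and cloud tiers are all assumptions made for illustration.

```python
from collections import OrderedDict


class TwoTierStore:
    """Toy model: a small fast tier (cache) in front of a large remote tier."""

    def __init__(self, cache_capacity: int):
        self.cache_capacity = cache_capacity
        self.cache = OrderedDict()  # stands in for the local flash tier
        self.remote = {}            # stands in for the cloud/object tier
        self.remote_reads = 0       # counts reads that missed the cache

    def put(self, key, value):
        # The remote tier is the source of truth; the cache is warmed on write.
        self.remote[key] = value
        self._cache_insert(key, value)

    def get(self, key):
        if key in self.cache:
            # Cache hit: mark the entry as most recently used.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: fetch from the remote tier (raises KeyError if absent).
        value = self.remote[key]
        self.remote_reads += 1
        self._cache_insert(key, value)
        return value

    def _cache_insert(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        # Evict least recently used entries once the fast tier is full.
        while len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)


if __name__ == "__main__":
    store = TwoTierStore(cache_capacity=2)
    for name in ("bucket-a", "bucket-b", "bucket-c"):
        store.put(name, f"data for {name}")
    store.get("bucket-a")  # evicted earlier, so this goes to the remote tier
    store.get("bucket-a")  # now served from the fast tier
    print("remote reads:", store.remote_reads)
```

The point of the design is that all data lives in the cost-optimized tier, while only the recently used working set occupies the performance tier.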

Big Data Workloads

In big data workloads where batch analytics is primarily run through an Apache Hadoop® stack, we see adoption of an optimized S3 connector (S3A) that places data more efficiently between a fast cache and a scale-out cloud tier. Here’s an in-depth look at how this is achieved in Hadoop®.
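As a rough illustration of how a Hadoop cluster is pointed at an S3-compatible object tier, the sketch below generates a `core-site.xml` fragment with a few standard S3A properties. The endpoint URL and credentials are placeholders, the helper name `build_s3a_config` is hypothetical, and a real deployment would need tuning and a safer way to handle secrets than plain-text configuration.

```python
import xml.etree.ElementTree as ET


def build_s3a_config(endpoint: str, access_key: str, secret_key: str) -> str:
    """Return a Hadoop <configuration> fragment for an S3-compatible store."""
    properties = {
        "fs.s3a.endpoint": endpoint,
        # Path-style access is commonly needed for on-premises object stores.
        "fs.s3a.path.style.access": "true",
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
    }
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_s3a_config(
        endpoint="https://objectstore.example.internal",
        access_key="EXAMPLE_ACCESS_KEY",
        secret_key="EXAMPLE_SECRET_KEY",
    ))
```

With such a configuration in place, jobs can address the object tier through `s3a://` paths.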

Machine Learning

In the emerging data pipelines of machine learning applications, where hardware accelerators such as GPUs or FPGAs are prevalent, a combination of low latency and high throughput is required. These data-hungry GPUs can easily consume 5 GB of data per second. To get the ROI out of your expensive investment, you want to make sure you use those accelerators at full capacity.


The challenge is that in today’s environment very few solutions can deliver these performance numbers at scale in a single system. Considering that the data sets for training those ML models easily run into the 100s of TBs and even PBs of capacity, it’s not just a performance challenge, but also a matter of cost. It’s not surprising that we see an ecosystem of scale-out file and block stacks (such as our partners at Weka.IO and Excelero) that combine an extremely fast NVMe storage tier with a cost-efficient scale-out object tier behind it.
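The sizing problem can be sketched with simple arithmetic. The 5 GB per second per GPU figure and the 100 TB data set size come from the text above; the GPU count, the decimal units (1 TB = 1000 GB), and the assumption that every GPU reads at its full rate are illustrative.

```python
def aggregate_throughput_gb_per_s(num_gpus: int, per_gpu_gb_per_s: float = 5.0) -> float:
    """Storage throughput needed to keep every GPU fed at its full rate."""
    return num_gpus * per_gpu_gb_per_s


def hours_per_pass(dataset_tb: float, throughput_gb_per_s: float) -> float:
    """Hours needed to stream the whole data set once at the given throughput."""
    dataset_gb = dataset_tb * 1000  # decimal units: 1 TB = 1000 GB
    return dataset_gb / throughput_gb_per_s / 3600


if __name__ == "__main__":
    gpus = 8  # hypothetical accelerator count for one training system
    needed = aggregate_throughput_gb_per_s(gpus)
    print(f"{gpus} GPUs need about {needed:.0f} GB/s of sustained reads")
    print(f"one pass over 100 TB takes about {hours_per_pass(100, needed):.2f} hours")
```

The throughput requirement scales with the accelerator count while the capacity requirement scales with the data set, which is why a small, fast NVMe tier is paired with a large, inexpensive object tier.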

Round Table – Storage Optimization

If you want to learn more about how NVMe and cloud solutions can help you consolidate your workloads, join me at the upcoming MegaCast by ActualTech Media on Storage, Flash, NVMe, and Storage Optimization.

For this webinar, multiple vendors will come together to discuss Flash, NVMe and Storage Optimization, and we will dive deeper into how workloads are changing the data center and what to consider when looking at both NVMe and the cloud solutions. If you have any questions you would like me to address during this webinar, please feel free to comment below.

Save your seat! Storage, Flash, NVMe, and Storage Optimization.

 

[1] https://searchstorage.techtarget.com/opinion/Cloud-repatriation-and-the-trend-away-from-all-things-cloud

[2] https://www.splunk.com/en_us/newsroom/press-releases/2019/dark-data-research-reveals-widespread-complacency-in-driving-business-results-and-career-growth.html

Stefaan is sr. director of solutions marketing and strategic alliances at WDC, with 15+ yrs in the data storage and backup industry.