It’s true that flash storage is getting faster and more affordable, but hard drives (HDDs) continue to play a major role in the enterprise. Nowhere is this more apparent than in today’s data centers where the amount of data created continues to grow exponentially, fueled further by the AI Data Cycle.
While consumers tend to only use solid state drives (SSDs) and flash storage in their notebooks, tablets, and smartphones, hard drives are the massive storage devices busy at work behind the scenes. They offer a lower cost-per-terabyte (TB) and are highly efficient for large volumes of data.
“Most of the data we interact with daily is on a hard drive in a data center somewhere,” Brad Warbiany, director of HDD technical marketing at Western Digital, said. “Whether it’s a photo, a video, or a social media post, we don’t see the HDD but that’s where most of the data is.”
According to the IDC Global StorageSphere 2024 report, HDDs will continue to make up almost 80% of storage used in hyperscale and cloud data centers through 2028.
In such hyperscale data centers, frequently accessed or performance-critical “hot” data is stored on enterprise-class SSDs for fast retrieval while less frequently accessed “warm” or “cold” data resides on more cost-efficient enterprise-class HDD media. This tiered approach helps optimize the cost of storing and accessing data and can be fine-tuned based on the capacity, performance, availability, and recovery needs of the system or application data.
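The tiering decision described above can be sketched in a few lines of Python. This is only an illustration: the `Dataset` fields, the access-rate thresholds, and the function names are hypothetical and are not taken from any Western Digital product or from the IDC report.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only to illustrate the idea.
HOT_READS_PER_DAY = 100.0
WARM_READS_PER_DAY = 1.0


@dataclass
class Dataset:
    name: str
    reads_per_day: float
    latency_critical: bool = False


def classify(ds: Dataset) -> str:
    """Label a dataset hot, warm, or cold from its access pattern."""
    if ds.latency_critical or ds.reads_per_day >= HOT_READS_PER_DAY:
        return "hot"
    if ds.reads_per_day >= WARM_READS_PER_DAY:
        return "warm"
    return "cold"


def choose_media(ds: Dataset) -> str:
    """Hot data goes to SSD; warm and cold data go to HDD."""
    return "ssd" if classify(ds) == "hot" else "hdd"


if __name__ == "__main__":
    for ds in (
        Dataset("checkout-index", reads_per_day=5000),
        Dataset("last-month-logs", reads_per_day=3),
        Dataset("2019-archive", reads_per_day=0.01),
    ):
        print(ds.name, classify(ds), choose_media(ds))
```

A real placement policy would also weigh capacity, availability, and recovery needs, as noted above; the sketch only captures the access-frequency dimension.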
“It’s like delivering goods. For long-haul bulk transportation, you would use a powerful 18-wheeler, but for last-mile urban delivery, you want a compact electric vehicle,” Darragh O’Toole, product marketing manager for SSDs at Western Digital, said. “Both will get the goods there but with differing speeds, capacities, and costs-per-good based on the workload requirements.”
TCO at scale
Economics is a key factor at scale. While the cost of flash storage per terabyte has improved significantly, enterprise SSDs still cost up to eight times as much per terabyte as enterprise HDDs and are expected to remain more than five times as expensive for the next five years, according to analyst firm IDC in its Pivot Table: IDC Global StorageSphere Forecast, 2024–2028.
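To see why that price gap matters at scale, here is a rough back-of-the-envelope sketch. It expresses the cost per terabyte of a mixed fleet as a multiple of an all-HDD fleet. The eight-times and five-times price multiples come from the IDC figures cited above; the 20% SSD share is an assumption made for illustration, and the function name is hypothetical.

```python
def blended_cost_multiple(ssd_share: float, ssd_price_multiple: float) -> float:
    """Cost per TB of a mixed fleet, relative to an all-HDD fleet (= 1.0).

    ssd_share: fraction of total capacity held on SSD (0.0 to 1.0).
    ssd_price_multiple: SSD cost per TB divided by HDD cost per TB.
    """
    if not 0.0 <= ssd_share <= 1.0:
        raise ValueError("ssd_share must be between 0 and 1")
    return ssd_share * ssd_price_multiple + (1.0 - ssd_share)


if __name__ == "__main__":
    assumed_ssd_share = 0.20  # assumption for illustration only
    for multiple in (8.0, 5.0):
        tiered = blended_cost_multiple(assumed_ssd_share, multiple)
        all_flash = blended_cost_multiple(1.0, multiple)
        print(
            f"SSD at {multiple:.0f}x HDD price: "
            f"tiered fleet = {tiered:.1f}x, all-flash = {all_flash:.1f}x"
        )
```

Under these assumptions, an all-flash fleet costs several times more per terabyte than a tiered one, which is the gap the tiered approach is meant to close.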
The largest data center customers often calculate TCO by terabyte capacity per watt. One way to optimize capacity and power efficiency in a data center is to squeeze more data onto each individual hard drive through technologies like OptiNAND™ and UltraSMR.
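The terabytes-per-watt metric is simple arithmetic, sketched below. The drive capacities and power draw are made-up placeholder values, not specifications of any real product, and the function names are hypothetical.

```python
import math


def tb_per_watt(capacity_tb: float, power_watts: float) -> float:
    """Capacity efficiency of a single drive, in TB per watt."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return capacity_tb / power_watts


def fleet_power_watts(target_tb: float, capacity_tb: float, power_watts: float) -> float:
    """Total drive power needed to reach a target capacity."""
    drives = math.ceil(target_tb / capacity_tb)
    return drives * power_watts


if __name__ == "__main__":
    # Placeholder drives: same power draw, different capacity.
    for label, capacity in (("drive A", 20.0), ("drive B", 26.0)):
        watts = 8.0
        print(
            f"{label}: {tb_per_watt(capacity, watts):.2f} TB/W, "
            f"{fleet_power_watts(10_000, capacity, watts):.0f} W for 10 PB"
        )
```

The point of the sketch is that, at equal power per drive, packing more terabytes onto each drive lowers the power needed for the same total capacity.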
These technologies, alongside intelligent data architectures, are helping data centers to be more climate-conscious by optimizing power demand. Cloud hyperscalers have been transitioning to shingled magnetic recording (SMR) technology. SMR currently accounts for 50% of Western Digital’s shipped data center exabytes.
Use cases for HDDs include backup, archiving, and cold storage, and HDDs will continue to play a vital role in enterprise infrastructures for the foreseeable future.
“Outside of finance or infrastructure, customers can’t afford all flash,” Warbiany said. “It’s about having a balance and combination of both. SSDs shine where fast access is required, such as millisecond stock trading or airline bookings. But when it comes time to scale up, many data center architects turn to HDDs for their capacity and scale.”
The AI Data Cycle feeds HDDs and SSDs
New data-driven applications and use cases are fueling explosive data growth. AI and machine learning (ML) are particularly data intensive, relying on massive datasets that need to be collected and preprocessed before they are fed into algorithms. These big data initiatives collect and process data in widely different ways—necessitating various data storage solutions to meet these diverse needs.
Western Digital CEO David Goeckeler was asked for his take on the subject in a recent analyst call, as documented in an article in Blocks & Files. “Clearly, HDD plays a big role in the AI storage life cycle as well as the whole ingest phase, because of all the big data lakes and all of the raw datasets; those are all going to be stored on HDD,” he said. “It’s just the economics of where you store that data, and how you access that data.”
AI models operate in a self-perpetuating, continuous loop of data consumption and generation, processing text, images, audio, and video among other data types while simultaneously producing new unique data.
Western Digital has detailed the AI Data Cycle across six different stages. As AI models evolve, they create even more data, which has specific storage requirements at each stage of the process.
HDDs play a key role as vast amounts of raw data are collected and stored from various sources. Data is then processed, cleaned, and transformed with fast SSDs to support AI training and inference. High-capacity enterprise SSDs cull data from fast data lakes. Trained models analyze new data and generate new content. Large language models (LLMs), for example, ingest huge volumes of data and then extract what’s important for fast GPU processing. Finally, new content is created from the insights produced by the AI models, requiring enterprise HDDs to store the new data for future models.
This continuous loop of data generation and consumption drives the need for scalable data storage solutions that can optimize AI implementations.
“Since GPUs are an expensive investment, companies want to keep them occupied all the time,” O’Toole said. “Compute-focused PCIe Gen 5.0 enterprise SSDs are critical for that speed and bandwidth.”
HDDs and SSDs essential to the enterprise
Storage is essential to the data economy. The combination of HDDs and SSDs not only powers what’s possible but also enables companies to achieve more and right-fit a solution for their needs. With the rise of AI, both technologies will become even more vital to unlocking opportunities. Understanding how best to use them will allow companies to be more successful with their data-powered endeavors.
“Data centers will continue to use HDDs where they can, but SSDs where they must,” Warbiany said. “They each have a place in terms of economics and speed.”
With data growing at unprecedented rates, both HDDs and SSDs will continue to provide the storage backbone for today’s enterprises.
As Goeckeler said in a recent analyst call, “It’s about growth and not substitution. It’s literally like a rising tide lifting all boats.” AI is just one example where the more data is generated, the more data will need to be stored.