This is my second post in the “Speeds, Feeds and Needs” blog series, designed to explain the more technical elements of enterprise storage in terms that are understandable to everyone, whether you are an IT manager, end-user or chief information officer. My first post discussed how latency affects storage architectures and how reducing this can dramatically improve system performance. In this post I’ll discuss SSD endurance, and how this affects the life expectancy of your SSD and ensuing costs.
In all likelihood, if you are reading this blog you have looked at an SSD data sheet and perused the numbers. In many cases you noticed capacity and throughput, and then calculated the cost per GB. After all, cost is all that really matters, right? Well, that’s actually wrong. Carried over from the hard disk drive era, these are practical rules of thumb and simple ways to compare the costs of hard drives. But another attribute of hard drives needs to be taken into account: their lifespan. Hard drives are mechanical devices that do wear out, and you have likely experienced a hard drive failure. These failures occur unexpectedly and catastrophically, in the data center just as they do at home. According to this SNIA article, hard drives see a Mean Time Between Failure (MTBF) of 1 million hours, while SSDs show 2.1 million hours. SanDisk® further rates its Lightning SAS SSDs at 2.5 million hours MTBF. Thus, SSDs have more than twice the longevity of HDDs.
However, SSDs wear differently than hard drives do. Due to the characteristics of NAND flash, SSDs have a finite lifetime dictated by the number of write operations, known as program/erase (P/E) cycles, that the NAND flash can endure. The objective of SSD endurance numbers is to capture this consumable nature of flash storage in a quantifiable figure that gives end users guidance on the anticipated lifespan of the drive in operation. SSDs come in a variety of endurance points matched to their intended work pattern. Obviously, SSDs intended for a single user such as a consumer will differ greatly from data center-grade SSDs that are rated to withstand the demands of thousands or millions of users. It’s important to procure the right SSD for your workload and budget needs.
There’s Even More To Consider
The huge variation in flash endurance between different types of SSDs, and the challenge of measuring SSD performance in a repeatable way, make the picture more complex.
Single-Level Cell (SLC) NAND flash, which uses each cell to store one bit of data, provides high endurance to meet the needs of the most write-intensive applications. However, this endurance comes at a higher price – in many cases prohibitively expensive. On the other hand, Multi-Level Cell (MLC)-based SSDs, which store multiple bits per cell, cost less, but they also have far lower endurance. Without any special treatment, MLC SSDs are not able to bear the high number of writes needed for data center workloads. Furthermore, the new generations of NAND flash in the sub-20 nanometer geometries (referred to as 1Y and 1Z by SanDisk) shrink flash cell sizes to enable increased density at lower cost. However, these new geometries have a negative impact on SSD endurance: shrinking geometries reduce the size of the transistors/gates in the silicon, and the smaller size results in fewer program/erase cycles the NAND can endure.
NAND flash’s intrinsic need to erase whole ‘blocks’ before writing to a ‘page’ results in write amplification, where the amount of data written to the physical NAND is in fact several times larger than the amount of data the host system intended to write. This write amplification is correlated with the nature of application workloads and has a direct impact on SSD endurance.
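To make the effect concrete, here is a minimal sketch of how a write amplification factor (WAF) eats into endurance. The drive capacity, P/E rating, and WAF below are illustrative assumptions, not figures from any specific product.

```python
# Sketch: write amplification factor (WAF) and its effect on endurance.
# All numbers are illustrative assumptions, not from any real drive.

def write_amplification_factor(nand_bytes_written, host_bytes_written):
    """WAF = physical NAND writes / logical host writes (>= 1.0 in practice)."""
    return nand_bytes_written / host_bytes_written

def usable_host_writes_gb(pe_cycles, capacity_gb, waf):
    """Total host data (GB) the flash can absorb before wearing out."""
    return pe_cycles * capacity_gb / waf

# A hypothetical 400 GB MLC drive rated for 3,000 P/E cycles, where the
# workload causes 3 GB of NAND writes for every 1 GB the host writes:
waf = write_amplification_factor(nand_bytes_written=3.0, host_bytes_written=1.0)
print(waf)                                    # 3.0
print(usable_host_writes_gb(3000, 400, waf))  # 400000.0 GB of host writes
```

The same drive with a WAF of 1.0 could absorb three times as much host data, which is why controller techniques that reduce write amplification directly improve endurance.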
Techniques such as wear leveling and over-provisioning are commonly used to improve SSD endurance. Wear leveling ensures even wear of memory blocks across the flash device by evenly distributing write operations, resulting in increased endurance. Over-provisioning sets aside extra physical flash capacity for background operations, resulting in better write performance and higher endurance.
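Over-provisioning is typically quoted as a percentage of the user-visible capacity. A quick sketch of that arithmetic, using illustrative raw/usable capacities rather than the specification of any particular drive:

```python
# Sketch: over-provisioning (OP) as commonly quoted.
# OP% = (physical capacity - user capacity) / user capacity * 100
# The 512 GB / 480 GB split below is an illustrative assumption.

def over_provisioning_pct(physical_gb, user_gb):
    """Percentage of extra flash reserved beyond the user-visible capacity."""
    return (physical_gb - user_gb) / user_gb * 100

# e.g. a drive built from 512 GB of raw flash that exposes 480 GB to the host:
print(round(over_provisioning_pct(512, 480), 1))  # 6.7
# A more aggressively over-provisioned enterprise configuration:
print(round(over_provisioning_pct(512, 400), 1))  # 28.0
```

The reserved capacity gives the controller spare blocks for garbage collection and wear leveling, which is why higher over-provisioning generally translates into lower write amplification and longer drive life.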
Due to this ongoing trade-off between cost and endurance, SSD manufacturers have sought ways to improve the endurance of cost-effective MLC Flash in the hopes of making MLC-based SSDs better suited for enterprise workloads.
Not All Flash Is Created Equal
The fact of the matter is that not all flash is created equal, even when you are comparing MLC to MLC! In reality, each flash chip can vary greatly in its native endurance capabilities. You cannot simply compile a bunch of flash chips, put them on a printed circuit board, and begin the countdown to a theoretical end of life. The problem with this approach is that it often leaves a lot of life on the table and results in a conservative, one-size-fits-all endurance rating on the data sheet.
At SanDisk, we have been focused on solving the challenge of endurance since day one, resulting in our innovative Guardian Technology Platform. This technology suite treats flash as a system rather than individual blocks of media. Why do we do this? Evaluating each individual flash chip to determine its maximum endurance, while dynamically monitoring how the flash behaves over its lifespan, allows the system to choose which NAND chips should receive more writes than others. This enables us to tune our SSDs to meet the specific endurance levels required for a variety of enterprise applications, all while using the same MLC flash. Customers like you are then able to purchase SSDs with the endurance levels you require for your application. This allows you to avoid over-paying for write levels you don’t need and ensures you won’t have to rip and replace low-endurance SSDs as they burn out.
The Endurance Equation
SSD endurance is commonly described in terms of full Drive Writes Per Day (DWPD) over a certain warranty period (typically 3 or 5 years). In other words, if a 100GB SSD is specified for 1 DWPD, it can withstand 100GB of data written to it every day for the warranty period. Alternatively, if a 100GB SSD is specified for 10 DWPD, it can withstand 1TB of data written to it every day for the warranty period. Another metric used for SSD write endurance is Terabytes Written (TBW), which describes how much data can be written to the SSD over the life of the drive. Again, the higher the TBW value, the better the endurance of the SSD. What’s important to watch in these endurance specifications is the methodology used to determine the values. For example, the shorter the warranty period, the higher the DWPD will be for that specific SSD; likewise, the larger the SSD’s capacity, the higher its TBW. In fact, the JEDEC Solid State Technology Association’s standards subcommittee has now published two standards to help both SSD vendors and customers determine SSD endurance specifications for typical application usage.
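The relationship between the two metrics above can be sketched in a few lines. Using the 100GB examples from the text, TBW is just DWPD multiplied by capacity, days per year, and warranty years:

```python
# Converting between DWPD and TBW, per the definitions above.
# TBW = DWPD x capacity (GB) x 365 days x warranty years / 1000

def tbw(dwpd, capacity_gb, warranty_years):
    """Terabytes Written implied by a DWPD rating over the warranty period."""
    return dwpd * capacity_gb * 365 * warranty_years / 1000

def dwpd(tbw_value, capacity_gb, warranty_years):
    """DWPD implied by a TBW rating over the warranty period."""
    return tbw_value * 1000 / (capacity_gb * 365 * warranty_years)

# The 100 GB examples from the text, assuming a 5-year warranty:
print(tbw(1, 100, 5))    # 182.5 TBW at 1 DWPD
print(tbw(10, 100, 5))   # 1825.0 TBW at 10 DWPD

# The methodology caveat in action: halving the warranty period
# doubles the quoted DWPD for the exact same amount of flash wear.
print(dwpd(182.5, 100, 5))    # 1.0
print(dwpd(182.5, 100, 2.5))  # 2.0
```

This is why two data sheets quoting different DWPD figures may describe drives with identical underlying endurance; always check the warranty period and capacity behind the number.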
Obviously, the higher the endurance, the longer your drive will be able to operate and the more data can be written to it. Sounds simple enough, doesn’t it? The trick isn’t in searching out the biggest number, but rather in understanding what you need and finding the solution that meets those needs without over-buying.
The Cost Implications of SSD Endurance
Understanding the cost implications of SSD endurance is critical in choosing the device with the best ROI. In my next blog, I take a deeper look at the cost implications of endurance, how to calculate the most suitable and economical solution, and most importantly, how to avoid a costly investment. Read it here: The TCO Implications of Endurance.
Hemant has extensive experience in product management and marketing, software development and performance engineering. At Western Digital, he is instrumental in developing best practices and reference architectures for deploying SSDs in multi-tier enterprise applications, including Tier-1 business critical applications, virtualization, and big data technologies such as Hadoop and NoSQL databases. Prior to Western Digital and his role at SanDisk, he worked at leading high-technology companies such as VMware, EMC, Informix and Commerce One, and has presented at several industry conferences, such as VMworld, EMC World and Interop. Hemant is co-author of the book "Virtualizing Microsoft Tier 1 Applications with VMware vSphere 4" and numerous other technical collateral. He received a Bachelor of Science in Electrical Engineering from B.I.T.S., Pilani, India, and an M.B.A. from Santa Clara University.