There are many misconceptions and outright myths about flash storage. Everyone has held a flash-enabled device in their hand at some point, so flash carries many associations from our world of wearables, smartphones, and tablets, but few of those have much to do with enterprise-class flash technology.
The most common item people think of when they hear “flash storage” is the USB flash key (a SanDisk® invention, in fact). No company would trust its most valuable data to a consumer-grade device. But enterprise flash devices are a completely different beast from consumer-grade flash. Each is designed to meet particular needs and use cases with regard to capacity, performance, reliability, and cost.
Let me explain a few of the design decisions and implementations that set SanDisk enterprise flash storage devices apart and why SSDs in the data center are nothing like a USB key!
1. End-to-End Data Protection
A device designed for enterprise applications must protect customer data by every means the state of the art allows.
End-to-end data protection means that as the data flows to and through the device, it is protected from loss or undetected change due to “bit flipping”. This is a phenomenon that all electronic devices can encounter. Everything from alpha particles to atmospheric electrical noise to background radiation to device signal quality can cause individual bits to be flipped to an invalid state.
As the data passes through the flash drive, it is handed from chip to chip, which exposes the electrical signal as it travels outside of the silicon. An enterprise flash device from SanDisk performs an Error Correcting Code (ECC) check, for example a Low Density Parity Check (LDPC), on the data. This is accomplished by passing extra parity data along with the original data packet. The parity can be used to verify that the data that exited one chip hasn’t picked up a flipped bit along the way, which would of course result in incorrect data being received by the next chip in the data path, written to the drive, or sent to the requesting application.
This is done not only on the path to the media but also on the path to and from the DRAM/SRAM used on the device, hence the term “end-to-end” data protection.
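To make the parity idea concrete, here is a minimal sketch using a Hamming(7,4) code. This is a much simpler code than LDPC and is not how any particular drive implements its protection; I'm swapping it in only because it fits in a few lines. The function names are hypothetical. It shows three extra parity bits travelling with four data bits so the receiver can locate and repair a single flipped bit.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits (an int 0..15) as a 7-bit codeword.

    Codeword positions 1..7 hold: p1, p2, d0, p4, d1, d2, d3.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_decode(codeword):
    """Return (nibble, position), where position is the 1-based index of the
    bit that was repaired, or 0 if the codeword arrived clean."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 | (s2 << 1) | (s4 << 2)  # equals the flipped position
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bad bit back
    data_bits = [c[2], c[4], c[5], c[6]]
    nibble = sum(bit << i for i, bit in enumerate(data_bits))
    return nibble, syndrome


# Simulate a bit flip between two chips and recover from it.
codeword = hamming74_encode(0b1011)
codeword[4] ^= 1  # a stray particle flips position 5 in transit
data, fixed_at = hamming74_decode(codeword)
print(data == 0b1011, fixed_at)  # True 5
```

A code like this corrects one flipped bit per codeword. Real drives use far stronger codes over much larger blocks, but the principle of carrying redundancy alongside the data is the same.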
2. Power Fail Protection
Power Fail Protection is also sometimes referred to as “pfail”. The purpose of pfail is to protect the user in the event an unexpected power loss occurs while data is in transit. When a write to final media is underway, the device sends what is known as a “write commit” back to the host once the write is imminent.
In some cases SRAM or DRAM will be used as a front-end buffer to provide additional write performance for the drive. Without pfail, data that has been committed to the SRAM or DRAM, or that is in transit to final media, can be hit by a power failure that prevents it from ever reaching that final media. This of course either loses the data or leaves an old or “stale” copy of the data in place as opposed to what was expected. No one wants to make a million-dollar deposit only to have a power glitch cause the deposit data to fail to reach final media.
SanDisk enterprise devices either use capacitance to ensure that a committed write has enough power to complete, or do not issue a write commit to the host until the data is guaranteed to reach final storage media in the device, even in the event of a power failure.
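The difference between these approaches can be sketched with a toy model. This is an illustration only, not drive firmware: the class, the policy names, and the dictionary-based "buffer" and "media" are all hypothetical. It compares an unprotected drive against the two protected strategies described above and reports which acknowledged writes are missing after a power failure.

```python
class SimulatedDrive:
    """Toy model: a volatile write buffer in front of non-volatile media."""

    POLICIES = ("unprotected", "holdup_capacitor", "ack_after_persist")

    def __init__(self, policy):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.buffer = {}  # volatile DRAM/SRAM: contents vanish on power loss
        self.media = {}   # flash: contents survive power loss
        self.acked = {}   # writes for which the host received a write commit

    def write(self, lba, data):
        if self.policy == "ack_after_persist":
            self.media[lba] = data   # reach final media first...
        else:
            self.buffer[lba] = data  # ...or park in the fast buffer
        self.acked[lba] = data       # write commit goes back to the host

    def flush(self):
        """Move everything in the buffer to final media."""
        self.media.update(self.buffer)
        self.buffer.clear()

    def power_fail(self):
        if self.policy == "holdup_capacitor":
            self.flush()     # stored energy lets the flush complete
        self.buffer.clear()  # anything still in the buffer is gone

    def lost_writes(self):
        """Addresses the host believes are written but that media lacks."""
        return {lba for lba, data in self.acked.items()
                if self.media.get(lba) != data}


for policy in SimulatedDrive.POLICIES:
    drive = SimulatedDrive(policy)
    drive.write(7, b"million dollar deposit")
    drive.power_fail()
    print(policy, "lost:", sorted(drive.lost_writes()))
```

Only the unprotected policy reports a lost write. The trade-off in the sketch mirrors the text: hold-up capacitance keeps the fast buffered path, while acknowledging only after persistence gives up some write speed in exchange for needing no stored energy.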
3. Temperature Throttling to Ensure the Drive Won’t Overheat Itself
Storage devices produce heat when reading and writing, as does any device that’s doing actual work. Since flash can be used in standard drive bays in the front of the server, in PCIe bays, and in other internal slots such as M.2, it can be subjected to temperature fluctuations in the data center or other server location, not only from its own heat but also from heat exhausted by devices upstream of it. To ensure maximum server uptime, we never want a device to fail due to overheating caused by the drive’s own heat output.
SanDisk devices actually monitor themselves and throttle accordingly if the media is approaching a temperature at which customer data can no longer be safely protected, or at which the flash media is at risk of thermal damage.
Throttling performance in an intelligent way reduces the heat the device produces, which lets it run cooler and so keeps the data safe, even when part of the problem is heat from some other component, such as CPU exhaust upstream.
Throttling will not save a device from external heat if that heat exceeds what the on-board NAND can withstand. In other words, if you hold a blowtorch to the flash device no amount of internal throttling will stop it from melting!
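As a rough sketch of what "throttling accordingly" can look like, here is one simple policy: run at full speed below a start temperature, then ramp performance down linearly to a floor as the temperature approaches a limit. The function name, the 70 °C and 85 °C thresholds, and the 10% floor are made-up illustration values, not figures from any SanDisk product.

```python
def throttle_factor(temp_c, start_c=70.0, limit_c=85.0, floor=0.1):
    """Return the fraction of full performance to allow at temp_c.

    All thresholds are hypothetical:
      start_c  temperature at which throttling begins
      limit_c  temperature at which performance is held at the floor
      floor    minimum fraction of performance the drive keeps
    """
    if limit_c <= start_c:
        raise ValueError("limit_c must be above start_c")
    if temp_c <= start_c:
        return 1.0    # cool enough: no throttling
    if temp_c >= limit_c:
        return floor  # at or past the limit: clamp to the floor
    # In between: scale down linearly from 1.0 towards the floor.
    progress = (temp_c - start_c) / (limit_c - start_c)
    return 1.0 - progress * (1.0 - floor)


for temp in (50, 70, 77.5, 85, 95):
    print(f"{temp:>5} C -> {throttle_factor(temp):.0%} of full performance")
```

Note that the floor never reaches zero work by itself cooling the drive. That matches the blowtorch point above: throttling only removes the heat the drive generates, so it cannot compensate for external heat beyond what the NAND tolerates.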
4. Quality of Service for Performance
Quality of Service, or QoS, is a specification that SanDisk assigns to its enterprise devices to help ensure the customer can guarantee a certain level of performance. SanDisk specifies QoS as:
- Max read latency <50usec 99.99% of the time (QD1)
- Max write latency <100usec 99.99% of the time (QD1)
Client devices typically do not offer a specification for QoS. The 99.99% specification here is enterprise quality and is actually better than a large number of enterprise SSD competitors, which limit their QoS specification to 99.9%.
Performance stability is also guaranteed on SanDisk SSDs to stay within ±5% of the specified performance targets on every workload. Client devices typically give no such guarantee, so you get what you get.
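If you want to check a drive against numbers like these yourself, the arithmetic is simple. The sketch below assumes you already have a list of per-I/O latencies in microseconds from a benchmark run at queue depth 1. The function names and the sample data are hypothetical, and integer math is used for the percentage so that rounding can't blur a pass into a fail.

```python
def meets_qos(latencies_us, limit_us, good_per_10k=9999):
    """True if at least good_per_10k out of every 10,000 I/Os finished
    in under limit_us. 9999 per 10,000 corresponds to 99.99%."""
    if not latencies_us:
        raise ValueError("no latency samples")
    within = sum(1 for latency in latencies_us if latency < limit_us)
    # Compare as integers: within / total >= good_per_10k / 10000
    return within * 10_000 >= good_per_10k * len(latencies_us)


def within_tolerance(measured, target, tolerance=0.05):
    """True if measured performance is within +/- tolerance of target."""
    return abs(measured - target) <= tolerance * target


# Made-up sample: 10,000 reads, one of which is a slow outlier.
reads = [40] * 9999 + [500]
print(meets_qos(reads, limit_us=50))                    # True
print(meets_qos(reads, limit_us=50, good_per_10k=10_000))  # False

# Made-up throughput numbers against a 100,000 IOPS target.
print(within_tolerance(96_000, 100_000))  # True
print(within_tolerance(94_000, 100_000))  # False
```

The practical point is sample size: a 99.99% claim allows only one slow I/O in ten thousand, so you need many tens of thousands of samples before the check says anything meaningful.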
5. Uncorrectable Bit Error Rate
Uncorrectable Bit Error Rate is usually abbreviated as UBER. UBER is essentially the odds of a piece of needed data not being accessible from the media.
Client-based devices tend to fall at around 10^-15, while enterprise HDDs are typically better at 10^-16. Enterprise SSDs tend to be around 10^-17. This means an enterprise SSD is 100x less likely to lose a piece of data than a client device, and 10x less likely to lose a piece of data than an enterprise HDD is.
SanDisk actually has enterprise devices with an UBER of 10^-18 (CloudSpeed-based drives), which are 100x less likely to lose data than even an enterprise HDD, and Fusion ioMemory™ PCIe application accelerators with an UBER as low as 10^-20. A Fusion ioMemory PCIe flash device is 100,000x less likely to have an uncorrectable bit error than a client SSD, and 10,000x less likely than an enterprise HDD! And our CloudSpeed™ Gen. II SSDs lead the enterprise field with a better UBER than any competitor in the industry.*
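The "times less likely" figures are just ratios of the error rates, which is easy to verify. The sketch below reads the rates quoted above as negative powers of ten (10^-15 for a client device down to 10^-20 for Fusion ioMemory) and assumes the usual interpretation of UBER as uncorrectable errors per bit read. The dictionary keys, function names, and the one-petabyte example are mine, for illustration only.

```python
from fractions import Fraction

# UBER values quoted in this article, as exact fractions.
UBER = {
    "client": Fraction(1, 10**15),
    "enterprise_hdd": Fraction(1, 10**16),
    "enterprise_ssd": Fraction(1, 10**17),
    "cloudspeed": Fraction(1, 10**18),
    "fusion_iomemory": Fraction(1, 10**20),
}


def times_less_likely(better, worse):
    """How many times less likely an uncorrectable error is on `better`."""
    return UBER[worse] / UBER[better]


def expected_errors(device, bits_read):
    """Expected uncorrectable errors after reading bits_read bits,
    assuming UBER means errors per bit read."""
    return UBER[device] * bits_read


print(times_less_likely("enterprise_ssd", "client"))           # 100
print(times_less_likely("cloudspeed", "enterprise_hdd"))       # 100
print(times_less_likely("fusion_iomemory", "client"))          # 100000
print(times_less_likely("fusion_iomemory", "enterprise_hdd"))  # 10000

# Reading one petabyte (8 * 10**15 bits) from each class of device.
PETABYTE_BITS = 8 * 10**15
for device in UBER:
    print(device, float(expected_errors(device, PETABYTE_BITS)))
```

Under that assumption, reading a petabyte from a 10^-15 device would be expected to hit about 8 uncorrectable errors, while a 10^-17 device would be expected to hit about 0.08.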
So for anyone who still thinks flash is meant just for consumer-level USB devices, I hope this article helped you gain a better understanding of what sets consumer and enterprise-grade flash apart and what you need to look for when choosing a storage device to accelerate your data center storage. In my next blogs I’ll look at the benefits of flash beyond performance and help bust a few more myths about flash.
Want to Learn More? Join me for a Webinar!
What You Need to Know About Enterprise Flash: May 19th at 9:00 am PT, or on demand afterward
When considering flash storage, there are many misconceptions and outright myths, especially when equating consumer-grade flash (USB sticks) with enterprise-grade SSDs. In this webinar SanDisk Chief Architect Adam Roberts will discuss 5 myths of flash storage and highlight what you need to look out for when choosing a storage device to accelerate your data center storage.
I welcome your questions and comments below.
* Based on internal testing
Adam is an Engineering Fellow with Western Digital and was formerly the Chief Solutions Architect for SanDisk (acquired by Western Digital).