Everyone seems to boast that their solution offers inline compression and deduplication. But the truth is, implementations are not all created equal, and they can affect storage performance both positively and negatively. When done right, however, compression and deduplication are not just about saving you money and capacity; they can also be a performance multiplier that enables consistently higher performance from your hardware.

My colleague, Harrison Waller, recently participated in an ActualTech Media EcoCast on Optimizing Your Virtual Environment, where he explained how we architected the IntelliFlash™ array so that users can achieve higher, more consistent performance with compression and deduplication in a virtualized environment.

Here’s a summary, and you can see a demo and additional features by streaming the EcoCast.

The I/O and Data Path Across Memory Tiers

The IntelliFlash array is architected with a data tiering system across the different memory and media types in the array. The performance tier is powered by a combination of DRAM and NVDIMMs. It is followed by a cache and metadata tier that leverages high-performance SSDs, and then by the capacity tier, which provides persistent data storage on high-performance media and cost-effective high-density drives (either SSDs alone or a hybrid of SSDs and HDDs).

Data In – As I/O comes into the array, it first lands in the performance tier, where advanced algorithms perform inline compression. The array then takes a checksum of the newly compressed data block, writes it to NVDIMMs or write cache, acknowledges the host, and destages the data to the capacity tier. The data then resides, compressed and deduplicated, in the capacity tier waiting to be read.
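To make the write path above more concrete, here is a minimal sketch of a compress, checksum, and deduplicate sequence. It is not how the IntelliFlash array is implemented: the class name, the use of zlib and SHA-256, and the in-memory dictionaries standing in for the write cache and capacity tier are all assumptions chosen for illustration. A small read helper is included only to show the round trip.

```python
import hashlib
import zlib


class WritePathSketch:
    """Toy model of compress -> checksum -> dedup -> store (illustrative only)."""

    def __init__(self):
        # checksum -> compressed bytes; stands in for the capacity tier (assumption)
        self.blocks = {}
        # logical block address -> checksum of the block stored there
        self.volume_map = {}

    def write(self, lba: int, data: bytes) -> str:
        # Compress first, then checksum the compressed block.
        compressed = zlib.compress(data)
        checksum = hashlib.sha256(compressed).hexdigest()
        # Deduplicate: store the block only if this checksum is new.
        if checksum not in self.blocks:
            self.blocks[checksum] = compressed
        self.volume_map[lba] = checksum
        return checksum

    def read(self, lba: int) -> bytes:
        checksum = self.volume_map[lba]
        compressed = self.blocks[checksum]
        # Verify the checksum again before decompressing.
        if hashlib.sha256(compressed).hexdigest() != checksum:
            raise IOError("checksum mismatch for block at LBA %d" % lba)
        return zlib.decompress(compressed)

    def stored_bytes(self) -> int:
        # Physical footprint after compression and deduplication.
        return sum(len(b) for b in self.blocks.values())
```

Writing the same 4 KB block to two different addresses in this sketch stores it once, and both addresses read back the original data.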

Data Out – Data can be cold, warm, or hot. The IntelliFlash array uses intelligent caching algorithms to keep the most frequently accessed data in the Read Cache, which resides on DRAM and performance flash. These algorithms are optimized for various I/O patterns and dynamically adapt to differing media latencies across multiple levels of cache. When data is in use, it moves back up to the performance tier where it gets decompressed to service the host. We also checksum again to ensure data integrity.
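The caching idea on the read path can be sketched with a simple least-recently-used (LRU) cache. LRU is a stand-in here, not the adaptive algorithm the array actually uses, and the class name, the block-count capacity, and the dictionary acting as the capacity tier are assumptions made for this example.

```python
from collections import OrderedDict


class ReadCacheSketch:
    """Toy LRU read cache in front of a slower backing store (illustrative only)."""

    def __init__(self, capacity_blocks: int, backing_store: dict):
        self.capacity = capacity_blocks
        self.backing = backing_store  # stands in for the capacity tier (assumption)
        self.cache = OrderedDict()    # oldest entry first, most recent last
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.cache:
            # Hot data: serve from cache and mark as most recently used.
            self.hits += 1
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cold data: fetch from the backing store and promote it into the cache.
        self.misses += 1
        data = self.backing[key]
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            # Evict the least recently used block.
            self.cache.popitem(last=False)
        return data
```

Blocks that are read repeatedly stay near the "recent" end of the cache, while blocks that have not been touched for a while are the first to be evicted.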

Get More Fast Data

So why is all this important? To increase performance in virtualization you want to:

  • Maximize the amount of data in the high-performance tier
  • Store only unique blocks in cache
  • Increase the cache hit ratio

With our intelligent acceleration of compression and deduplication, the performance tiers hold only unique data. That means the number of writes to and reads from the capacity tier is significantly reduced. And, since the data has been compressed and deduped, there is room to put more of it in the fastest tier! This is where you can start to see significant performance increases.
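As a rough back-of-the-envelope illustration of why reduced data fits more into the fastest tier, the sketch below multiplies a physical cache size by a data reduction ratio. The function name is hypothetical, and the calculation assumes the reduction ratio applies uniformly to whatever is cached, which is a simplification.

```python
def effective_cache_capacity_gib(physical_gib: float, reduction_ratio: float) -> float:
    """Logical data that fits in a cache of physical_gib at a given reduction ratio.

    Assumes the ratio applies evenly to the cached data (a simplification).
    """
    if reduction_ratio < 1.0:
        raise ValueError("reduction_ratio must be at least 1.0")
    return physical_gib * reduction_ratio
```

Under these assumptions, a hypothetical 64 GiB cache holding data reduced 5:1 would cover 320 GiB of logical data.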

The cache hit ratio is the share of read requests served from the high-performance tier rather than from the capacity tier. You want to maximize this: the more data that is serviced from this tier, the better the end user experience.
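To see why the hit ratio matters, it can help to write it down as arithmetic. The sketch below computes the ratio from hit and miss counts and a weighted average read latency. The function names are hypothetical, and any latency figures you plug in are placeholders, not measurements from the array.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of reads served from cache; 0.0 when there have been no reads."""
    total = hits + misses
    if total == 0:
        return 0.0
    return hits / total


def average_read_latency_us(hit_ratio: float,
                            hit_latency_us: float,
                            miss_latency_us: float) -> float:
    """Weighted average latency: hits are served at cache speed, misses at capacity-tier speed."""
    return hit_ratio * hit_latency_us + (1.0 - hit_ratio) * miss_latency_us
```

Because misses are so much slower than hits, even a small improvement in the hit ratio can noticeably lower the average latency the host sees.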

Metadata Acceleration

Another clever usage of the performance tier is to dedicate a portion of it to metadata.

A common issue in legacy solutions is that metadata gets interspersed with the rest of the data, so when data is modified, deleted, and rewritten, the metadata becomes very fragmented and performance suffers.

By dedicating a portion of the performance layer exclusively to metadata, the metadata stays organized and aggregated. That optimizes I/O paths and accelerates all I/O operations in the array, so you get the most out of that high-performance media.

Demo: Compression, Deduplication and Performance Impact

In the EcoCast, Harrison showed a demo of a virtualized environment with 16 VMs generating simulated I/O. In the demo you can see how much data reduction is achieved using our intelligent compression and deduplication. Most of our customers see about 5:1 data reduction, and some see even higher ratios, depending on their virtualized environment. You can also see how the cache hit rate is maximized at all times.
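For readers who want to translate a ratio like 5:1 into capacity terms, here is a small sketch. The function names are hypothetical; the ratio is simply logical data written divided by physical space consumed.

```python
def data_reduction_ratio(logical_bytes: float, physical_bytes: float) -> float:
    """Logical data written divided by physical space used (e.g. 5.0 means 5:1)."""
    if physical_bytes <= 0:
        raise ValueError("physical_bytes must be positive")
    return logical_bytes / physical_bytes


def space_savings_percent(ratio: float) -> float:
    """Percentage of raw capacity saved at a given reduction ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (1.0 - 1.0 / ratio) * 100.0
```

At 5:1, 10 TB of logical data occupies 2 TB of physical capacity, which works out to roughly 80 percent space savings.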

In the demo, he also walked through our tight integration with the VAAI primitives that VMware ESXi™ provides, how to optimize your environment with new plugins, and protocol statistics such as latency, IOPS and throughput across LUNs and shares. You can stream the demo here.

Learn more:

• Learn about the IntelliFlash Arrays

• Technology Brief: IntelliFlash Features

• Forrester Total Economic Impact Study: The ROI of IntelliFlash Storage


Seth has over 30 years of experience in the tech industry and is responsible for solution marketing for Western Digital Data Center Systems.