Object storage has most often been selected for unstructured data because of cost: unstructured data, such as video, audio, sensor data, emails, and IoT data, is growing much faster than transaction data. Another reason is that object storage offers greater data durability than RAID-based NAS or block storage for long-term needs. In most cases, performance has not been a consideration.
However, the role of data has evolved. Historic data can serve as a basis for great insight, and the value of data often grows over time. As such, unstructured data needs to be accessed and transformed. This requires new thinking about the data infrastructure that stores it. Because it is no longer just cold, idle data, performance becomes a relevant metric.
What is an Appropriate Performance Metric for Unstructured Data?
Yet, what is an appropriate performance metric for unstructured data? Should your object storage strive for the same performance goals as your NAS or SAN?
Performance testing takes into consideration a number of variables in order to be useful. Things such as the size of the dataset, the number of threads in the system, data conversion requirements, and physical connections are critical.
SAN and NAS storage often deal with transaction data and database applications where you are working with a subset of data that you need instantly. In the transactional and database world, the preferred performance measurement is latency, the time delay until the data arrives at the application. High-performance NAS or SAN delivers small payloads with latency of a millisecond or less, which is very useful for transaction or database processing. (For example, our IntelliFlash™ N-Series NVMe™ storage arrays are made for near-real-time workloads, with only hundreds of microseconds of latency.)
However, this metric is not relevant for unstructured data processing. Unstructured data stored on object storage is usually massive, not a small subset of data or a database. Object storage needs to deliver large payloads with gigabytes/second of bandwidth, for efficient utilization of both massive and less active data. The object payload is often 1,000 times bigger(!) than the SAN or NAS payloads, which reflects the different kinds of work being performed in the two environments.
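To make the latency-versus-bandwidth point concrete, here is a rough back-of-the-envelope sketch. It assumes a simple model in which the time to deliver a payload is a fixed latency plus the payload size divided by the bandwidth. The payload sizes, the 1 ms latency, and the 1 GB/sec bandwidth are illustrative assumptions chosen for the arithmetic, not measurements of any product.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def transfer_time_s(payload_bytes, latency_s, bandwidth_bytes_per_s):
    """Simplified model: fixed latency plus time spent moving the bytes."""
    return latency_s + payload_bytes / bandwidth_bytes_per_s


def latency_share(payload_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the total delivery time that is due to latency alone."""
    total = transfer_time_s(payload_bytes, latency_s, bandwidth_bytes_per_s)
    return latency_s / total


# Illustrative values only (assumptions, not measured figures).
SMALL_PAYLOAD = 8 * KB   # a database-style block
LARGE_PAYLOAD = 8 * MB   # an object roughly 1,000 times bigger
LATENCY_S = 0.001        # 1 millisecond
BANDWIDTH = 1 * GB       # 1 GB/sec

if __name__ == "__main__":
    for name, size in (("small", SMALL_PAYLOAD), ("large", LARGE_PAYLOAD)):
        share = latency_share(size, LATENCY_S, BANDWIDTH)
        print(f"{name} payload: latency is {share:.1%} of delivery time")
```

Under these assumed numbers, latency accounts for more than 99% of the delivery time for the small payload but only about 11% for the large one, which is why bandwidth is the more telling metric once payloads become large.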
Measuring Unstructured Data Performance on Object Storage
Object storage is measured on its ability to move massive amounts of data to the applications, often with multi-threaded parallelism to boost application processing. That’s why performance must be measured differently for object storage than for NAS/SAN.
Now comes the next challenge. Our industry has developed many standardized tools and benchmarks to test SAN and NAS workload performance. For unstructured data on object storage, COSBench is a commonly used workload utility to measure on-premises or public cloud object storage performance. However, it does not measure multi-threaded jobs well, which is critical for understanding your unstructured data environment and performance levels.
For this reason, we had to build something new. Western Digital created an alternative workload application for our testing that takes multiple threads into account. This application is a workload generator designed to simulate multiple parallel users accessing cloud infrastructures, which gives a more realistic environment in which to measure unstructured data performance.
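The general idea of a multi-threaded workload generator can be sketched in a few lines. The example below is not Western Digital's application; it is a minimal illustration in which each thread plays one simulated user writing and reading objects, and aggregate throughput is total bytes moved divided by wall-clock time. The InMemoryObjectStore class, run_user, and measure_throughput are hypothetical names, and the in-memory store is only a stand-in so the sketch runs without any external service.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InMemoryObjectStore:
    """Hypothetical stand-in for an object storage client (put/get by key)."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put(self, key, data):
        with self._lock:
            self._objects[key] = data

    def get(self, key):
        with self._lock:
            return self._objects[key]


def run_user(store, user_id, object_size, objects_per_user):
    """One simulated user: write each object, read it back, count bytes moved."""
    payload = bytes(object_size)
    moved = 0
    for i in range(objects_per_user):
        key = f"user{user_id}/obj{i}"
        store.put(key, payload)
        moved += len(payload)
        moved += len(store.get(key))
    return moved


def measure_throughput(store, users, object_size, objects_per_user):
    """Run all users in parallel; return (total bytes, bytes per second)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [
            pool.submit(run_user, store, u, object_size, objects_per_user)
            for u in range(users)
        ]
        total_bytes = sum(f.result() for f in futures)
    elapsed = time.perf_counter() - start
    return total_bytes, total_bytes / elapsed


if __name__ == "__main__":
    total, rate = measure_throughput(
        InMemoryObjectStore(), users=8, object_size=1024 * 1024, objects_per_user=16
    )
    print(f"moved {total / 2**20:.0f} MiB at {rate / 2**20:.0f} MiB/sec")
```

Against the in-memory stand-in, the reported rate reflects only Python overhead. To say anything about a real system, the store would have to be replaced with a client for that system, and the number of users and the object size varied, since both strongly affect the aggregate figure.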
In addition, we worked with the Enterprise Strategy Group (ESG) to obtain an objective assessment of the performance of our ActiveScale™ solutions. ESG tested both single and multi-site implementations with different object sizes, and emulated multiple concurrent users, as is typical of IaaS environments.
Now that you have a workload and a testing application, it’s time to take some measurements. But first, let’s test your knowledge / assumptions:
Pop Quiz! What’s the maximum throughput of object storage systems like Western Digital’s ActiveScale? Get the answer here: https://t.co/Brx288IA7I
— Western Digital (@westerndigital) January 23, 2019
My guess is that you would expect object storage to generate 2-3 GB/sec of bandwidth. When we and ESG tested Western Digital ActiveScale systems, we found that a 3-GEO P100 can deliver up to almost 25 GB/sec of read-write throughput, and a single GEO over 8 GB/sec. That’s high performance for unstructured data workloads!
When Object Storage Moves Beyond Cold Storage, You Gain New Value from Unstructured Data
With this kind of performance, you can unlock value from unstructured data in ways not previously possible. Until today, object storage was limited to cold and archive data workloads. With ActiveScale’s levels of performance, a wide range of secondary storage applications can be considered, such as NAS optimization, analytics, compliance, and IoT, to name a few. Indeed, you could even repatriate data from the public cloud to meet performance objectives and save some money along the way.
With a world awash in unstructured data, it is a good time to better understand how to get the most from this data and derive economic benefits from it. New performance options for accelerating unstructured data can unleash new potential use cases and improve monetization of this valuable resource.
See what high performance means for object storage! Download the ESG Technical Validation paper.