Splunk SmartStore is a radical departure from the classic architecture, designed to make Splunk more cloud-friendly. Here’s what you need to know, and how to take best advantage of it.
Machine-generated data is everywhere. More than 44 exabytes of data are generated every day, flooding out of billions of laptops, smartphones, and internet-connected devices. For those who know how to take advantage of it, this information can be extremely useful. With the right tools, it can give us valuable insights into what our users are doing and how our systems are functioning.
Big Data and Machine Data
Perhaps the best-known tool for analyzing machine data is the Splunk platform. It allows users to quickly make sense of large amounts of machine data from any source, regardless of the data type. It can reveal long-term trends and patterns of activity that you’d never know about otherwise. It can even do predictive analytics, using AI and machine learning to forecast events before they happen.
Unfortunately, the volume of data that we’re being asked to store nowadays is often more than our Splunk environments can handle. Large companies can easily generate terabytes of machine data every day, and at this rate, it can be difficult to maintain a year or even 6 months’ worth of data before running out of space.
Thankfully, Splunk has come up with a solution. It’s called SmartStore, a new feature designed to make Splunk more cloud-friendly. It’s a complete overhaul of Splunk’s storage architecture, and it allows you to scale storage independently from compute. With SmartStore, you can add petabytes of storage to your Splunk environment without adding a single indexer!
The Classic Architecture
The classic version of Splunk stores data on the indexers, using locally attached drives. The files are organized into a series of directories called buckets.
To prevent data loss, three copies of each bucket are created and spread out across multiple servers (the replication factor is configurable). This does a good job of protecting your data, but it consumes a lot of disk space. When you’re working at petabyte scale, making three copies of your dataset gets expensive fast.
Scaling the classic architecture can be challenging because storage and compute are tied together. If you want to add more storage, you have to deploy more indexers to go with it.
Splunk with SmartStore
With the new SmartStore architecture, Splunk has decoupled the storage from the indexers so that you can grow them independently. Indexes and raw data are now stored externally, on an S3-compatible object store. The locally attached drives, which previously held the hot, warm, and cold buckets, have been repurposed as a cache to accelerate searches. This is a radical departure from the classic architecture and offers several compelling benefits.
Up to Double the Usable Disk Capacity
Splunk SmartStore gives you a huge boost in storage density, thanks to the addition of object storage. Object stores use a technique called erasure coding to protect your data. If you’ve never heard of erasure coding, you can think of it as the next generation of RAID.
In an erasure-coded volume, files are divided into shards, with each shard being placed on a different disk. To protect you from disk failures, additional shards are added, which contain error correction information. Only a subset of the shards is needed to retrieve a file, which means it can survive multiple disk failures without the risk of data loss. (Solutions such as Western Digital’s ActiveScale™ object storage system deliver up to 19 nines of data reliability!)
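To make the shard idea concrete, here is a minimal Python sketch of the simplest possible scheme: k data shards plus a single XOR parity shard. This is an illustration only. Production object stores generally use stronger codes (Reed-Solomon style) that tolerate several lost shards, and the function names below are hypothetical, not part of Splunk or ActiveScale.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size data shards and append one parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)  # XOR of all data shards
    return shards + [parity]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing shard (marked None) from the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost shard")
    repaired = list(shards)
    if missing:
        survivors = [s for s in shards if s is not None]
        # XOR of every surviving shard equals the lost one.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


def decode(shards: List[bytes], original_len: int) -> bytes:
    """Join the data shards (all but the parity) and strip the padding."""
    return b"".join(shards[:-1])[:original_len]


# Simulate losing the disk that held shard 2, then read the file back.
message = b"splunk bucket data"
stored = encode(message, k=4)
damaged: List[Optional[bytes]] = list(stored)
damaged[2] = None
restored = decode(recover(damaged), len(message))
```

With four data shards and one parity shard, the overhead is 25% and any single shard can be lost. Codes that add more parity shards trade a little more overhead for tolerance of multiple simultaneous failures.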
Best of all, erasure-coded volumes require only about 35% additional disk space to protect your data, as opposed to triple replication, which requires 200% more space. Put another way, roughly 74% of the raw capacity is usable with erasure coding, versus 33% with triple replication. This means you get more than double the usable disk capacity, compared to triple replication! Erasure coding dramatically reduces your cost per TB as well.
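The arithmetic behind that comparison is easy to check. The sketch below uses the article's overhead figures (35% for erasure coding, 200% for triple replication). The 1,000 TB raw capacity is a made-up number for illustration, and the function names are hypothetical.

```python
def usable_fraction(overhead: float) -> float:
    """Fraction of raw capacity left for data.

    overhead is the extra space consumed per unit of data stored:
    0.35 for the erasure-coding figure quoted above, 2.0 for triple
    replication (two extra copies of everything).
    """
    return 1.0 / (1.0 + overhead)


def usable_tb(raw_tb: float, overhead: float) -> float:
    """Usable capacity in TB for a given raw capacity and overhead."""
    return raw_tb * usable_fraction(overhead)


RAW_TB = 1000.0  # hypothetical raw capacity

erasure_coded = usable_tb(RAW_TB, 0.35)  # about 741 TB
replicated = usable_tb(RAW_TB, 2.0)      # about 333 TB
ratio = erasure_coded / replicated       # about 2.2

print(f"erasure coding:     {erasure_coded:.0f} TB usable")
print(f"triple replication: {replicated:.0f} TB usable")
print(f"ratio:              {ratio:.2f}x")
```

The ratio depends only on the two overhead figures, not on the raw capacity, so the gain of roughly 2.2x holds at any scale as long as the 35% figure applies to your erasure-coding configuration.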
Better Performance with Smart Caching
Another advantage is improved performance. Search performance is largely dependent on the speed of the underlying storage devices. With the classic architecture, storage tiers were configured statically, in hot, warm, and cold buckets. If you wanted to analyze older, “cold” data, your search would run against the cold buckets on your slowest storage tier.
Splunk SmartStore gives you a significant boost in search performance due to the way the cache is designed. It’s a smart cache, which moves data dynamically, based on your search criteria. If you’re searching within a particular time period, it can read in all of the data from that time period and cache it, ensuring that searches are always run from the fastest possible storage tier.
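The caching behavior can be pictured with a small Python sketch: a local cache that pulls buckets from the remote object store on first access and serves repeat searches locally. The class name, the dictionary standing in for the object store, and the least-recently-used eviction rule are all assumptions made for illustration. This is not Splunk's cache manager.

```python
from collections import OrderedDict
from typing import Callable, Iterable, List


class BucketCache:
    """Toy local cache in front of a remote object store (hypothetical)."""

    def __init__(self, capacity: int, fetch_remote: Callable[[str], bytes]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity          # max buckets held locally
        self.fetch_remote = fetch_remote  # called on a cache miss
        self._local: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, bucket_id: str) -> bytes:
        if bucket_id in self._local:
            # Hit: mark the bucket as most recently used.
            self._local.move_to_end(bucket_id)
            self.hits += 1
            return self._local[bucket_id]
        # Miss: fetch from the object store and keep a local copy.
        self.misses += 1
        data = self.fetch_remote(bucket_id)
        self._local[bucket_id] = data
        if len(self._local) > self.capacity:
            self._local.popitem(last=False)  # evict least recently used
        return data

    def search(self, bucket_ids: Iterable[str]) -> List[bytes]:
        """Read every bucket covering the searched time range."""
        return [self.get(b) for b in bucket_ids]


# A dict stands in for the S3-compatible store; bucket names are made up.
remote_store = {"2023-01": b"jan", "2023-02": b"feb", "2023-03": b"mar"}
cache = BucketCache(capacity=2, fetch_remote=remote_store.__getitem__)

cache.search(["2023-01", "2023-02"])  # first search: both buckets fetched remotely
cache.search(["2023-01", "2023-02"])  # repeat search: served from local cache
```

After the second search the counters show two misses and two hits: the first search paid the cost of reading from the object store, and the repeat search ran entirely from local disk.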
Take Advantage of More Data, for Longer Periods
Perhaps the most important advantage is improved data retention. The added storage capacity and elastic scaling make it much easier to meet your data retention requirements. And since your data now spans a much longer time period, you can study long-term trends, uncovering patterns that you’d never know about otherwise. You can anticipate changes, prepare for them, and adapt to them preemptively.
Splunk SmartStore is an exciting feature, for sure. And with all of these great benefits, the only question left is which object store to choose.
Splunk SmartStore and the ActiveScale Object Storage System
Splunk SmartStore is a great fit for Western Digital’s petascale ActiveScale object storage system. ActiveScale is a fully featured, S3-compatible object store, designed from the ground up to handle today’s big data storage challenges.
We are in the unique position of being the only vendor that manufactures nearly all of the components throughout the entire storage technology stack. This enables us to offer some of the most aggressive pricing in the industry and lead the charge in helping businesses transform themselves.
Mike McWhorter is a Senior Technologist for Western Digital. He specializes in performance tuning and storage optimization for Western Digital's big data customers. He is involved in testing and benchmarking new applications as well as optimizing them for various types of storage technologies. Previously, Mike worked as a Solutions Architect for the federal sales team, where he was responsible for designing and implementing large-scale distributed systems for the federal government. Mike received a Bachelor’s degree in Computer Science from Longwood University in Virginia.