I’m a tidy person. I like things in their correct place. I don’t think I’m obsessive, but things work better when everything is in its place. Yet unstructured data doesn’t fit into a well-defined world. It is unstructured, after all.
What is Unstructured Data?
When I say unstructured, I’m talking about data that doesn’t fit in a spreadsheet with rows and columns. It isn’t in a database. It’s not ERP or CRM data, where you know what kind of data is in each cell and how it relates to the rest of the data. No, I’m talking about those renegade data types. Things that exist just because they exist. Unstructured data includes things like video, audio or image files. Things like log files. Things like social media posts. Even email has an unstructured aspect to it: the rambling body text that follows the well-defined timestamp, From: and To: fields. These unstructured things disturb my sense of order.
Why Are Companies Storing Unstructured Data?
A company might want to analyze tweets to get a sense of what customers are thinking. It might want to mine log data for security exposures, or edit a video file before distributing it.
From a storage perspective, the question is where to put all this unstructured data. I spend a lot of money on each byte of structured data I store because I need immediate access to it to fulfill orders, generate paychecks, check inventory or ship products to customers. It makes good business sense to keep that data on a high-performance flash array with traditional RAID protection, even if it costs more. But unstructured data is used differently. Its performance requirements are different. The desired economics are different. So why is unstructured data overrunning traditional storage and draining budgets? Because there’s a lot of it.
Let’s think about this. If you’re like me, you have a lot of unstructured data but you don’t want to spend a lot of money on it. I don’t need much performance for it, so why would I put unstructured data on my Tier 1 SAN storage? It seems like a poor fit.
Maybe you’ve even been dumping unstructured data on your NAS system. NAS is expensive storage. It has scaling challenges for large volumes of unstructured data. It may work for a while, but it will let you down as you continue growing.
I suggest you consider an alternative: object storage. Unlike block-based (SAN) and file-based (NAS) storage systems, object storage systems are architected in a way that suits unstructured data. They store objects in a flat pool rather than in a hierarchical file system. As a result, object storage can scale like crazy. Even better, it doesn’t cost a lot of money. It provides Tier 2 performance, which makes it a far better fit for unstructured data.
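To make the idea of a pool concrete, here is a minimal in-memory Python sketch. It is a toy model built on our own assumptions, not the ActiveScale interface or any real object storage API; the names `FlatObjectStore`, `put`, `get` and `list_keys` are hypothetical. Each object is addressed by a single key in one flat namespace and carries its own metadata, and the slashes in a key are just characters, so "folders" exist only as key prefixes.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    # Descriptive metadata travels with the object itself
    metadata: dict[str, str] = field(default_factory=dict)


class FlatObjectStore:
    """Toy object pool (hypothetical): one flat namespace of keys, no directory tree."""

    def __init__(self) -> None:
        self._pool: dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        # Storing under an existing key simply replaces the whole object
        self._pool[key] = StoredObject(data, dict(metadata))

    def get(self, key: str) -> StoredObject:
        # Direct lookup by key; no traversal through parent folders
        return self._pool[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        # "Folders" are only a naming convention, so listing is a prefix match
        return sorted(k for k in self._pool if k.startswith(prefix))


store = FlatObjectStore()
store.put("logs/2017/web-01.log", b"GET /index.html 200", source="web-01")
store.put("video/launch.mp4", b"\x00\x00\x00\x18ftyp", codec="h264")
print(store.list_keys("logs/"))  # ['logs/2017/web-01.log']
```

Because lookups go straight to a key rather than through a directory tree, this flat layout is one reason object stores tend to scale out more easily than hierarchical file systems.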
If you have been storing your data on a NAS, how about moving that data off your NAS into a cold storage tier where you can still do all the analysis without getting clobbered by Tier 1 pricing? That could work out nicely, and it frees up capacity on your expensive Tier 1 NAS storage so you can put more active data there without buying more systems. It’s a win/win.
Object Storage Options
Object-based storage is not a physical thing. Rather, it is a logical organization of physical things. In other words, it’s not a type of media but a way of using media. If you’re not familiar with object storage, I suggest you read this introductory blog post on object storage, where Clay Ryder walks through object storage basics and why they matter for your data and your business.
If you’d like to explore object storage options a little more, consider the ActiveScale™ family of object storage systems. You can scale up or scale out (from a few hundred terabytes to over 52 petabytes) to accommodate your mountain of unstructured data. It provides solid Tier 2 performance with low OPEX and CAPEX, delivering the right answer to the unstructured data question. ActiveScale also has advanced data durability and system availability features to protect your valuable data.
This week, the ActiveScale family was expanded with the addition of the ActiveScale X100 System, which provides cloud-scale storage based on our next-generation operating system. Go to www.hgst.com/products/systems. It will restore your sense of order in the universe.
Want to learn more about storage options for unstructured data?
Stream our webinar: Control Renegade Data! Storage Options for Unstructured Data
Erik is the Senior Director, Product Marketing of Western Digital's Data Center Systems, with 25+ years of experience in high tech storage.