Object storage is a relatively new way to store data, but why should you care? What problems does it solve and why do you need object storage? Let’s consider five common problems that object storage was designed to address.
Object Storage is Built for Scale
There is probably no more compelling reason for object storage than keeping up with the mountain of data you need to store. The nature of data has changed in the 21st century. Transaction data was neatly stored in structured block storage devices, but now you deal with a wild and fast growing mix of data from social media, video, audio, log files, sensor data, emails and more.
Putting all this on traditional block or file storage is expensive, and hard to manage. While some data is still appropriate for traditional ways of storing data, most are not. A significant reason traditional storage can’t keep up with this kind of growth is the directory structures. Directory structures allow data to be organized into a framework that makes it easy to find and retrieve. Yet as you grow, the directory gets larger and larger and becomes more and more cumbersome to navigate which can impact performance, which is one of the design points for traditional storage. For the small percentage of unstructured data that is active, it is still a great choice. However, when dealing with less active data, massive directories can become a real problem.
Object storage is different. It has a flat file space that uses metadata to describe the object and unique identifiers that locate the object. This allows you to scale almost without limit without negotiating an increasingly large and unwieldy directory structure. Object storage was designed to scale and addresses the needs of the 95% of less active data in the enterprise and cloud. Object storage can keep up with the massive growth rates often found with unstructured data types you want to store.
Object Storage is Less Costly
Because object storage is designed for the mountain of data that can tolerate lesser performance than transaction data, you can build a less expensive architecture. To get extreme low latency you commonly employ caching and tiers to get the most performance for the least expense. If your overarching consideration is scale, you may not need all that performance. You don’t need the expensive flash tier and continual tuning of cache to get the best performance. Indeed, you can start taking IT out of the equation by letting them set up a private cloud of object storage and make it self-service to reduce the overhead cost. At that point you may decide that accessing data off tape is too expensive because of the human intervention expense, and start bringing that data online with object storage to reduce costs for data that is being restored from tape.
Object Storage Increases Data integrity At Scale
You store data with the expectation that when you want to retrieve it, you can. And you’ll want all of it, not part of it. This becomes a bigger challenge as disk drives grow and RAID rebuild times grow with it, raising the possibility, indeed the probability, of data loss. Object storage addresses this increasing threat with a different approach known as erasure coding. Despite the horrible name, it protects data in a different way. RAID protects data by rebuilding a disk drive’s information. Erasure coding protects data by rebuilding chunks of data, not a physical device. The difference is data integrity of a new order. Additionally, most (but not all) object storage systems have strong consistency so you don’t have to worry about getting stale data, another good thing regarding data integrity. And finally, many (but not all) provide some sort of background data integrity checks and self-healing of corruption to make sure your data has not been corrupted.
Object Storage Offers Different, Simplified Manageability
Managing terabytes of data is tough, but managing petabytes of data requires a different approach. Traditional storage often provides rack-based management to add users, identify failed HDDs, add encryption, provision new storage and all the other day-to-day tasks of the data world. This worked well with gigabytes, it improved so it could work adequately with terabytes, but doesn’t work very well for petabytes. It is simply a matter of scale. As the amount of data continues to grow a new approach is required to effectively manage all this data with the available resources.
Object storage manages a namespace, not a rack of storage. The difference is significant. In object storage, a namespace could be a rack of storage, or multiple racks. It could be local, or it could be geographically dispersed. However configured, it can all be managed in a single pane of glass. This is real productivity and provides a broader view of your storage resources.
Object Storage is the Language of the Cloud
Applications of the 20th century were accessed through a GUI or even command line interfaces. 21st century applications are accessed via a browser. The browser is your portal into the cloud. With object storage, it is also the portal into a private cloud. Your applications probably speak to the cloud with Amazon’s S3 protocol. The industry has seen the momentum behind Amazon S3™ and most object storage vendors (maybe all significant vendors) now use S3 as the dominant access method for object storage. The advantages include easy access, high security (think https) and low overhead. The cloud is very 21st century and object storage is the infrastructure supporting the cloud.
Data is moving to object storage in a way that may not be obvious, but the reasons for the migration are obvious. Make sure you don’t get left behind.
Do You Need Object Storage?
Watch my recent Tech Talk to see how to take advantage of object storage and easily scale from terabytes to petabytes with the ActiveScaleTM solution.
Erik is the Senior Director, Product Marketing of Western Digital's Data Center Systems, with 25+ years of experience in high tech storage.