What is unstructured data, and what are some examples of it? Unstructured data is data that doesn’t fit in a spreadsheet with rows and columns. It isn’t in a database. It isn’t ERP- or CRM-type data, where you know what kind of data is in each cell and how it relates to the rest of the data. Unstructured data is somewhat renegade – things that exist just because they exist.
Examples of unstructured data include video, audio and image files, as well as log files, sensor data and social media posts. Even email has an unstructured aspect to it – basically all the text that follows the well-defined timestamp, from: and to: fields. Now add to that all the machine-to-machine and sensor data flowing out of the Internet of Things, and you start to understand the magnitude of the challenge.
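To make the email example concrete, here is a minimal sketch using Python's standard `email` module. The sample message and the `split_email` helper are hypothetical, made up for illustration. The point is that the header fields parse into predictable name/value pairs, while the body is free text with no schema.

```python
from email import message_from_string

# Hypothetical sample message, invented for this illustration.
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Date: Mon, 01 Jan 2018 09:30:00 +0000
Subject: Q4 numbers

Hi Bob, the figures look better than expected. Call me when you can.
"""

def split_email(raw_message):
    """Return (structured header fields, unstructured body text)."""
    msg = message_from_string(raw_message)
    # Headers are well-defined fields: each has a known name and meaning.
    structured = {name: msg[name] for name in ("From", "To", "Date", "Subject")}
    # The body is free-form text: nothing says what it contains or how to read it.
    body = msg.get_payload().strip()
    return structured, body

structured, body = split_email(RAW_MESSAGE)
print(structured)
print(body)
```

The structured part could be loaded into a table as-is. The body would need search, tagging or text analysis before it could answer a question.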
According to Gartner’s 2018 Magic Quadrant for Distributed File Systems and Object Storage, unstructured data is growing at 50% year over year, so it’s worth spending some time to understand it.
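To put a 50% annual growth rate in perspective, here is a rough back-of-the-envelope sketch. The 100 TB starting point and the function names are assumptions chosen for illustration; only the 50% rate comes from the figure above.

```python
import math

def projected_size(initial_tb, annual_growth, years):
    """Compound growth: size after `years` at a fixed annual growth rate."""
    return initial_tb * (1 + annual_growth) ** years

def doubling_time(annual_growth):
    """Years needed for the data set to double at a fixed annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

# Hypothetical starting point of 100 TB, growing 50% per year.
for year in range(6):
    print(year, round(projected_size(100, 0.5, year), 1), "TB")

print("doubling time (years):", round(doubling_time(0.5), 2))
```

At that rate a data set doubles roughly every 1.7 years and is more than seven times its original size after five years.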
Unstructured Data is Changing
Unstructured data is changing. What used to be mostly user home directory data now includes large media files, massive databases and data lakes, and architectural information, as well as billions of small files from IoT devices and log files written by business systems.
Organizations want to store all types of information for longer and longer periods so they can analyze data more deeply to drive better product creation, provide better customer experiences and increase efficiency. For this reason, unstructured data now even includes data extracted from databases and output in a flat format, so that other processes can scan information normally held in a proprietary format.
Access to unstructured data is also changing, as datasets grow larger and organizations need to retain data for much longer periods. Generally, a portion of the data must be stored where it can be accessed rapidly for analytics processing. Another, much larger portion may not be accessed for months at a time, but it still needs to sit on reasonably responsive storage so that historic data can easily be included in searches, analytics, monetization or other processing that drives business value.
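The split between rapidly accessed data and rarely accessed data can be sketched as a simple placement rule. This is only an illustration: the 30-day window, the tier names and the `choose_tier` function are assumptions, not a description of any particular product.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: data touched in the last 30 days counts as "hot".
HOT_WINDOW = timedelta(days=30)

def choose_tier(last_accessed, now, hot_window=HOT_WINDOW):
    """Pick a storage tier from the time since the data was last accessed."""
    if now - last_accessed <= hot_window:
        return "hot"       # fast storage for active analytics
    return "capacity"      # cheaper, still reasonably responsive storage

now = datetime(2019, 1, 31)
print(choose_tier(datetime(2019, 1, 15), now))  # accessed 16 days ago
print(choose_tier(datetime(2018, 6, 1), now))   # accessed months ago
```

A real policy would likely weigh more than the last access time, such as file size, data type or retention rules.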
The Role of Object Storage
Traditional network-attached storage systems can’t meet the scaling requirements of such large repositories of data, especially in terms of metadata management – tagging data so that it can easily be contextualized and reused in the future. They are also too expensive and were not designed for the long-term archive use case. Many organizations across industries are leveraging object storage architectures to overcome the limitations of traditional file-based technology.
Examples of Unstructured Data – Use Cases on Object Storage
Email and Archive Use Case
The pressure to retain data, and to find that data quickly later, is increasing. Privacy regulations may also require organizations not only to protect and retain data but also to remove personal data at the user’s request. This requires a cost-efficient solution that can better guarantee data validity and is searchable (using metadata), so data can be found when the organization needs to process it or respond to a request.
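As a rough sketch of why searchable metadata matters here, the toy in-memory catalog below tags each stored object with key/value metadata, finds objects by those tags, and removes everything belonging to one person. The `ObjectCatalog` class, its method names and the tag names are all hypothetical; a real object store exposes its own API.

```python
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)

class ObjectCatalog:
    """Toy in-memory stand-in for an object store with searchable metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        """Store an object together with its metadata tags."""
        self._objects[key] = StoredObject(key, data, dict(metadata))

    def find(self, **criteria):
        """Return the sorted keys of objects whose metadata matches every criterion."""
        return sorted(
            key for key, obj in self._objects.items()
            if all(obj.metadata.get(name) == value for name, value in criteria.items())
        )

    def erase(self, **criteria):
        """Delete all matching objects and return the keys that were removed."""
        keys = self.find(**criteria)
        for key in keys:
            del self._objects[key]
        return keys

catalog = ObjectCatalog()
catalog.put("mail/001", b"...", owner="alice", kind="email", year=2017)
catalog.put("mail/002", b"...", owner="bob", kind="email", year=2018)
catalog.put("mail/003", b"...", owner="alice", kind="email", year=2018)

print(catalog.find(kind="email", year=2018))  # locate data for processing
print(catalog.erase(owner="alice"))           # respond to a removal request
print(catalog.find(kind="email"))             # what remains
```

Without the tags, answering a removal request would mean opening and reading every stored message.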
The IoT, Log and Sensor Data Use Case
Systems, devices and machines used in manufacturing, the data center and the broader enterprise continuously output information about their operation. This data is written to log files. Applications like Splunk®, with its SmartStore technology, are now building in native capabilities to move older log and sensor data to object storage and to recall that data when needed. As a result, organizations are able to keep years of this data, making their analysis of organizational operations more accurate and effective.
The Data Lake Use Case
As organizations continue to improve their ability to leverage existing data assets to create new products and services and to make better decisions, they need a central storage area for that data. Organizations are finding that the more often they can cross-correlate data, the deeper the analysis can become. However, cross-correlating data requires that IT provide a supporting storage infrastructure that can store data from various sources – structured and unstructured – in a single repository that can easily scale. Object storage is an ideal choice.
The Media Use Case
The Media and Entertainment industry, as well as traditional enterprises, are creating massive amounts of video and audio content. Research shows that global IP video traffic will be 82 percent of all IP traffic (both business and consumer) by 2022. This content often needs to be used again after it is originally created. It may need to be reprocessed at lower resolutions or reused in the creation of additional content. Object storage’s cost efficiency at petabyte scale, alongside features like erasure coding, metadata capabilities and reliability, makes it a perfect companion for unstructured media files. You can dig deeper here.
Data is changing. Unstructured data is growing at a faster pace than structured data, and its role is critical in delivering business insight and value. Data is now at the center of businesses. You may find that the bulk of your data lacks the metadata that enables more flexible use of it, or your data may sit on traditional systems that can suffer from scaling limitations. Organizations are rethinking architectures as we quickly approach the Zettabyte Age. One of the true values of object storage is its ability to support a large variety of use cases, and almost every organization will benefit from leveraging the technology.
Unstructured Data Resources
- The Key to Unstructured Data Performance
- Addressing the Changing Role of Unstructured Data
- 4 Tips to Unearthing the Payload of Unstructured Data
Erik is the Senior Director, Product Marketing of Western Digital's Data Center Systems, with 25+ years of experience in high tech storage.