Data is not only growing, it is changing. Big Data is getting faster, Fast Data is getting bigger and a new data lifecycle is propagating new applications and technologies. So how will data change in 2018? We asked our bloggers and experts across the company about what’s ahead. Here’s what they had to say about 2018 data trends:
Janet George, Chief Data Scientist
The Big Data revolution has brought about a transformation of how data is captured, accessed, aggregated, transformed and preserved. We’ve moved from architectures that harbor data in silos of disparate applications to architectures that unite, combine and analyze data in a pool so that we can learn something new. Some of this transformation has been successful. However, the industry also learned that most of our legacy data can provide only limited insight because it is query-constructed data. It was created to answer a question, and we constructed the data to match that query.
“The next data learning cycle is all about Fast Data”
As the industry’s understanding of Big Data matures, we are entering the next data learning cycle, which is all about Fast Data: how to collect and analyze data in near real time (particularly unstructured data from IoT and IIoT). Fast Data is all about going to the source and collecting data in a manner that doesn’t impose a format and keeps as much of the raw data as possible. It will require new architectures that support and advocate for data at the edge, and we will see a fully automated data cycle with no human interaction.
As far as 2018 data trends go, the data creation growth rate remains exponential. On top of the billions of smartphones, computers and other machines, millions of new sensors are fired up every hour, adding to the mighty stream of data that never slows down. Sensors bring in new types of data and they add context to existing data, making it meaningful. Most data is unstructured, such as video and audio, and thus suitable for object storage.
Digital transformation continues. Every physical system and device becomes connected and creates a cloud of data around it. Data becomes increasingly important; it moves from being something that businesses had in the background to being essential for everything we do.
“Every physical system and device becomes connected and creates a cloud of data around it”
The great news on data: it is increasingly valuable because there is more of it and because data processing, networking and storage become constantly better and more cost-efficient. The availability of plenty of data will save and earn money, cure diseases, and make the whole world work better.
The bad news on data: the masses of data are becoming so large that many enterprises are concerned about the effects of data gravity. Once petabytes aggregate in a given cloud platform, it will be all but impossible to transfer these elsewhere, should the need for that arise one day.
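To put data gravity in rough numbers, here is a minimal back-of-the-envelope sketch. The 10 Gbit/s link and the 80% sustained efficiency are illustrative assumptions, not figures from this post, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(petabytes: float, gbit_per_s: float, efficiency: float = 0.8) -> float:
    """Estimate the days needed to move a dataset over a network link.

    petabytes   -- dataset size in decimal petabytes (1 PB = 1e15 bytes)
    gbit_per_s  -- nominal link speed in gigabits per second (assumed value)
    efficiency  -- fraction of the nominal speed actually sustained (assumed value)
    """
    total_bits = petabytes * 1e15 * 8
    effective_bits_per_s = gbit_per_s * 1e9 * efficiency
    return total_bits / effective_bits_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    for size_pb in (1, 5, 10):
        days = transfer_days(size_pb, gbit_per_s=10)
        print(f"{size_pb} PB over an assumed 10 Gbit/s link: about {days:.0f} days")
```

Under these assumptions a single petabyte takes well over a week to move, and the time scales linearly with the size of the dataset, before any egress fees or validation work are counted.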
“The masses of data are becoming so large that many enterprises are concerned about the effects of data gravity”
The really bad news on data: data breaches, ransomware and fake news have created a general atmosphere of digital distrust that threatens to slow down and disrupt the beneficial uses of data. Overall in 2018, data security and ownership issues will be hot.
To protect European Union citizens, the EU’s General Data Protection Regulation (GDPR) goes into effect in May 2018. After that, clear consent must be obtained from EU subjects to process their personal data, whether the company is located within the EU or not. The regulation includes a set of “Data Subject Rights” such as breach notification, right to be forgotten, privacy by design and more. This set of rules will have far-reaching consequences for every globally operating company, influencing their choices on IT infrastructure.
Video distribution will continue to be an infrastructure and data driver and must be watched closely.
“Video distribution will continue to be an infrastructure and data driver and must be watched closely.”
It is amazing to see how video distribution has changed in just a few years: from cutting the cord of traditional cable or satellite provider subscriptions, to watching individual streamers like Facebook Live, or the emergence of eSports and watching others play video games. Our phones and tablets have become a primary video consumption vehicle.
Video drives a significant portion of the world’s bandwidth. It is estimated to account for at least 70% of network traffic today, and that share is likely to grow significantly in the future. It’s no wonder that video is such a hot topic. Much of the FCC Net Neutrality debate centers on the infrastructure cost of providing these high-bandwidth video streams to users without being able to capture the traditional profits on these streams.
The massive presence of video will see network providers demanding storage and buffers, both CDN storage in the network and local storage in gateways and devices, to help manage its fast-paced growth.
Stefaan Vervaet, Sr. Director Solutions Marketing, Data Center Systems
In 2017, we surveyed 200 of our customers and others, and they told us that their unstructured data is growing between 40% and 60% per year, faster than any other business data they have. They identified the fastest-growing data types as rich media (audio, video, images, and research data), all of which are non-text data types. They also identified IoT data as a growing part of their unstructured data. This is leading to an explosion of unstructured data due to the increased size and volume of these data types. Survey respondents also reported that the key driver in their choice of data storage architectures is whether the architecture can enable them to get better insight into their data and run analytics across it.
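As a rough illustration of what 40% to 60% annual growth means once it compounds, the sketch below projects a data set forward and estimates its doubling time. The 100 TB starting size and the five-year horizon are hypothetical values chosen for the example; only the growth rates come from the survey.

```python
import math


def projected_size(start_tb: float, annual_growth: float, years: int) -> float:
    """Size after `years` of compound growth at `annual_growth` (0.4 means 40%)."""
    return start_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for the data set to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    start_tb = 100.0  # hypothetical starting capacity, not a survey figure
    for rate in (0.40, 0.60):
        size = projected_size(start_tb, rate, years=5)
        doubling = doubling_time_years(rate)
        print(f"{rate:.0%} per year: {start_tb:.0f} TB grows to about {size:.0f} TB "
              f"in 5 years and doubles roughly every {doubling:.1f} years")
```

At these rates the footprint grows roughly five to ten times over five years, which is why capacity planning based on linear extrapolation tends to fall short.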
“The old paradigm of data becoming less valuable over time will be shifted 180 degrees”
What this means for data lifecycles is that they will be getting longer and longer. The old paradigm of data becoming less valuable over time will be shifted 180 degrees. Old data, when aggregated with new data, will be the most valuable data for machine learning and AI-based analytics. Infrastructure needs to be an enabler for this new “data forever” paradigm. We predict that infrastructure that does not enable rapid time to insight will be eliminated. This means getting critical data off of tape and out of backup formats so that it can be accessed readily. It also means aggregating large amounts of unstructured data into large data lakes to enable analytics.
Linda Zhou, Technology Alliances and HPC Market Development
Every sector is seeing dramatic growth in data. But for life sciences this challenge has unfathomable proportions. For example, mammogram imaging is moving from 2D to 3D. We’re going to see that data grow from several megabytes to multiple gigabytes per image. This is just one example of many. All imaging devices, whether MRI, microscopes, or cameras, are moving to higher-resolution sensors and will be producing an incredible amount of data.
“There is no longer a ‘start small’ attitude”
This leads to a few changes in how data is handled. The first is that current types of storage and workflows are no longer viable at such a scale. All aspects of infrastructure and architecture are being reexamined to support the large growth of data. Second, there is no longer a ‘start small’ attitude. Most organizations I speak with are thinking about how to start medium, and by medium I mean two petabytes, and they expect to grow to ten petabytes over the next three years. The last change is the drive towards open science. Open science is about sharing data between researchers and institutions to create a larger pool of raw data to draw from. For machine learning and AI, the bigger the data pool, the more precise the results; accuracy is dependent on both volume and variety.
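For a sense of how steep the "start medium" trajectory is, the growth rate implied by going from two to ten petabytes in three years can be worked out directly. This is a minimal sketch that assumes a constant compound rate, which real deployments will not follow exactly; the function name is hypothetical.

```python
def implied_annual_growth(start: float, end: float, years: float) -> float:
    """Constant compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # Figures from the text: two petabytes growing to ten petabytes in three years.
    rate = implied_annual_growth(start=2, end=10, years=3)
    print(f"Implied growth: about {rate:.0%} per year")

    # Year-by-year capacity under that assumed constant rate.
    capacity_pb = 2.0
    for year in range(1, 4):
        capacity_pb *= 1 + rate
        print(f"Year {year}: about {capacity_pb:.1f} PB")
```

Under that assumption, the plan implies growth of roughly 71% per year.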
Adam Robert, Fellow
Analytical operations on Big Data will become more and more prevalent, demanding new read-optimized infrastructure. As a result, we’ll see the write endurance expected of flash devices used for Big Data analytics decrease as read-heavy analytics becomes a larger portion of the data center workload. Flash media with lower write endurance opens the door for denser, less expensive flash devices in places where hard disk drives may have previously dominated, opening up new possibilities for data insights and making them viable for more organizations.