Next week at AWS re:Invent, we’ll be showcasing how new hybrid workflows are changing the economics of data for analytics and backup. In our booth, you’ll be able to see an ecosystem of partners that are redefining the hybrid cloud and helping organizations extract more value from their data.
I had a chance to speak with Brendan Wolfe (@bgwolfe), director of product management at Primary Data, about their DataSphere software platform, challenges in customers’ cloud journey and exciting developments in metadata, automation and machine learning.
You can catch Brendan and Primary Data in our booth #127 at re:Invent – here’s an overview of what we’ll be showcasing.
1. Who is Primary Data and what does DataSphere do?
Primary Data is the latest venture by the former Fusion-io™ leadership team of Steve Wozniak, David Flynn, Lance Smith, and Rick White. Fusion’s ioMemory™ products simplified performance storage by offering an order of magnitude faster performance on a fraction of the hardware of performance disk-based systems, but they also exacerbated data management problems, as data became trapped on new flash storage silos. With cloud storage on the rise to provide inexpensive capacity, there are now more silos to manage than ever. Recognizing that the waste and complexity these storage silos created were only going to grow, the team set out to solve the problem of intelligently managing data across different types of storage.
By leveraging intelligent metadata management, machine learning, data virtualization, and an open architecture, the DataSphere software platform was designed to finally automate data management at petabyte scale. DataSphere connects heterogeneous storage systems in a single namespace, eliminating the storage silo problem. It enables data to move non-disruptively and automatically between different storage types, according to IT-defined objectives for data. This automates tedious data management and data migration tasks and delivers substantial savings by enabling IT to do more with existing storage investments.
2. What challenges do you see companies facing when adopting a cloud strategy?
We recently conducted a survey that asked IT professionals this question, and their responses line up with what we see with customers. Nearly 35% of survey respondents noted a key challenge to adoption was the difficulty of integrating object storage into existing infrastructure. Integration is an issue because each cloud or object store represents a separate storage silo that must be managed separately. 30% of survey respondents said that the need to reconfigure or reformat applications to use object data was also a hindrance.
There is significant complexity at the root of these adoption challenges. Most enterprise data is stored on NAS file systems and accessed over NFS or SMB protocols. In fact, many enterprises already have a mix of these protocols in place throughout their infrastructure. That means that today, most enterprise applications don’t speak the Amazon S3™ protocol. DataSphere addresses this through data virtualization: NFS, SMB and S3 protocols can be integrated in a global namespace, and heterogeneous data stores can be made simultaneously available to all applications. DataSphere writes data to object storage as files, which increases the granularity, visibility, and control admins have when managing data across primary, object, and public cloud storage. Importantly, since DataSphere manages data at the file level, IT can even rehydrate a single file back to higher performance storage if it is needed again, rather than paying bandwidth premiums to move an entire LUN back on premises.
3. How do Primary Data and HGST make cloud adoption easier/more affordable?
Adding DataSphere to your environment enables you to seamlessly archive cold data to HGST ActiveScale™ object storage. Data on ActiveScale is still visible and accessible as files that can be automatically retrieved should applications need data again—without the need to modify apps to use object storage. Moving data off primary storage and onto ActiveScale will improve the performance and utilization of an enterprise’s primary storage investments.
The HGST ActiveScale system is an innovative object storage solution offering superior scalability, durability and efficiency. As a turnkey system, it removes the challenges of architecture and integration. It is up and running quickly – put it in place, connect the power, configure the network and it’s online, presenting an Amazon S3-compliant object interface to DataSphere.
4. How do metadata, automation, and machine learning work together and where do you see these capabilities evolving?
Metadata analytics are the key to knowing what’s going on with your data. You wouldn’t buy a storage system without knowing specs such as read and write speeds, reliability and performance. Metadata helps you know what your data actually needs and what capabilities your different storage resources can provide.
Metadata analytics can now tell us how large a file is, when it was last accessed, when it was last changed, who accessed it, and more. This information can be used by machine learning algorithms to continually optimize environments. For example, machine learning software can come to understand cyclic events, such as end-of-quarter reporting. Data automation can move data onto performance storage before the end of the quarter, and back to more cost-effective data stores once the reporting is complete. The same approach can even automate the response to unexpected events. As unexpected workload spikes slow application response times, DataSphere can proactively rebalance data before end users are ever aware of the problem. As these capabilities evolve, machine learning will come to recognize the warning signs of these “unexpected” events, preventing the majority of them from occurring altogether.
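To make the kinds of metadata mentioned above concrete, here is a minimal Python sketch that gathers file size, last-access time, last-change time and owner from an ordinary file system. It is an illustration only and says nothing about how DataSphere collects metadata internally; the names `FileMetadata`, `collect_metadata` and `scan_tree` are hypothetical. Note that standard file system metadata records a file's owner, not who last accessed it, so that field is only an approximation.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class FileMetadata:
    """Hypothetical record of the per-file attributes discussed above."""
    path: str
    size_bytes: int
    last_accessed: datetime
    last_modified: datetime
    owner_uid: int  # owner of the file, not the last user to access it


def collect_metadata(path):
    """Read size, timestamps and owner for one file via os.stat."""
    st = os.stat(path)
    return FileMetadata(
        path=str(path),
        size_bytes=st.st_size,
        last_accessed=datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
        last_modified=datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        owner_uid=st.st_uid,
    )


def scan_tree(root):
    """Collect metadata for every regular file under root."""
    return [collect_metadata(p) for p in Path(root).rglob("*") if p.is_file()]
```

Records like these could then feed whatever analytics or machine learning step decides where data should live.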
5. How is data migration different from storage migration in the context of the hybrid cloud?
Data migration is quite different from storage migration. Cloud storage migration is a periodic event to move (often retired) data into the cloud. This is a bulk, one-size-fits-all migration that doesn’t take actual data activity into consideration, so there is a risk that a file that is still needed might get moved out along with the colder data, creating a performance problem.
Cloud data migration examines real data activity and moves data transparently to on-premises object or public cloud storage according to how data is being accessed, in accordance with IT-defined policies. For example, using cloud data migration, IT could archive data that has not been accessed in a month to ActiveScale object storage. For data older than a year, IT could split the archive: data subject to regulatory compliance stays on ActiveScale, and all other data goes to a public cloud.
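The example policy above can be sketched as a small decision function. This is a hedged illustration, not DataSphere's policy engine or syntax: the tier labels and the `choose_tier` function are hypothetical, and the sketch assumes that "older than a year" means "not accessed in a year".

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tier labels used only for this illustration.
PRIMARY = "primary"
ACTIVESCALE = "activescale"
PUBLIC_CLOUD = "public-cloud"


def choose_tier(last_accessed, compliance_regulated, now=None):
    """Pick a storage tier from how long a file has been idle.

    Assumption: "older than a year" is treated as idle for over 365 days.
    """
    now = now or datetime.now(timezone.utc)
    idle = now - last_accessed
    if idle > timedelta(days=365):
        # Regulated data stays on premises; everything else goes to public cloud.
        return ACTIVESCALE if compliance_regulated else PUBLIC_CLOUD
    if idle > timedelta(days=30):
        # Not accessed in a month: archive to on-premises object storage.
        return ACTIVESCALE
    # Recently used data stays on primary storage.
    return PRIMARY
```

Because the decision is made per file from real access activity, a file that is still in use stays on primary storage even if its neighbors are archived.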
Headed to AWS re:Invent? Make sure to catch Primary Data in our booth #127 – learn more here about what we’ll be showcasing and how we’re helping redefine hybrid cloud.