Digital Preservation 2020 – Start Your Metadata Thinking

December 18, 2017

Media companies are beginning to see that the value of films and other productions lives well beyond the date of creation. Consider the Montreux Jazz Festival, which recently digitized more than 50 years of music to deliver its content in new formats; the animation studio Illumination Mac Guff, which keeps all digital assets online after a film is complete; or the Sundance Institute, which preserves its history.

Digital Preservation

Digital preservation is not only about ensuring that data or content will survive hardware or archive failure; it is also about keeping content alive. By that I mean ensuring that content remains accurate and usable over time, regardless of changes in media and technology.

We are seeing incredible advances in media-rich formats and standards as well as augmented and virtual reality. With the fast-paced advances in technology and the massive growth of data, digital preservation is more complex than ever!

I recently sat down with Linda Tadic of Digital Bedrock to learn more about the current and future state of digital preservation, and what studios, creative houses, archives, and other media organizations need to know about preserving and monetizing their content.

You can watch our interview or read my key takeaways below:

It All Starts With a Backup Strategy

Every organization needs a backup strategy to ensure the safekeeping of assets. Your archives may be stored on multiple media (e.g. tape, hard disk drives, cloud), and your assets may live in different physical locations (on-premises, tape warehouse, colocation facility, public cloud, etc.). Data is typically tested on a regular schedule, through checksums and fixity checks, to make sure it has not become corrupted over time, and most strategies include two or three redundant copies (possibly at different geographic locations) so data can be recovered if hardware fails or media become corrupted.
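A fixity check boils down to recording a checksum for every file once and recomputing it later. The minimal Python sketch below assumes the manifest is a plain dictionary mapping each relative path to its SHA-256 digest; `build_manifest` and `fixity_check` are illustrative names, not part of any particular preservation tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks so large media files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by its path relative to root."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def fixity_check(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently on disk against a previously stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(k for k in manifest.keys() & current.keys() if manifest[k] != current[k]),
        "new": sorted(set(current) - set(manifest)),
    }
```

In practice you would store the manifest alongside each copy and rerun the check on a schedule, so that a changed or missing file can be restored from one of the redundant copies.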

Assume Hardware and Software Obsolescence

The second aspect of preservation is what differentiates archiving and preservation from backup. Preservation is not just about storing files; it is about ensuring you can use them at any future date. The file format in which you keep your data is a primary factor in your ability to use it in the future. Fundamentally, you should assume that everything from hardware to software will eventually become obsolete.
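Planning for obsolescence starts with knowing which formats you actually hold, and file extensions are not reliable evidence. As a rough sketch, the Python below identifies formats from their leading signature bytes and builds a format inventory for a directory. The five-entry signature table is only illustrative (real identification tools rely on far larger signature registries), and `identify` and `format_inventory` are hypothetical helpers.

```python
from collections import Counter
from pathlib import Path

# A few well-known file signatures ("magic numbers"). This short table is
# illustrative only; real identification tools use much larger registries.
SIGNATURES: list[tuple[bytes, str]] = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]


def identify(path: Path) -> str:
    """Guess a file's format from its leading bytes instead of trusting its extension."""
    with path.open("rb") as f:
        header = f.read(16)
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "UNKNOWN"


def format_inventory(root: Path) -> tuple[Counter, list[str]]:
    """Count files per detected format and list the files that could not be identified."""
    counts: Counter = Counter()
    unidentified: list[str] = []
    for p in sorted(root.rglob("*")):
        if not p.is_file():
            continue
        fmt = identify(p)
        counts[fmt] += 1
        if fmt == "UNKNOWN":
            unidentified.append(p.relative_to(root).as_posix())
    return counts, unidentified
```

Files that come back as `UNKNOWN` are the ones to investigate first, since you cannot plan a migration for a format you cannot name.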

A Complex Web of Relationship Dependencies

Here is where things get really interesting: both software and files have relationship dependencies. Changing one file can affect every project that file is part of, and can even render those projects unusable. For example, suppose a new format becomes the standard that people should be using, and you migrate a file to it. Other files that depend on the original file can no longer find it, which means you have broken a renderable product by changing just one part of it. Now multiply this by hundreds of projects, different edits for different audiences, languages, and so on, and you start to understand how complex digital preservation is for large studios.
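The dependency problem is a graph problem: before changing a file, you need every asset that depends on it, either directly or through other assets. The sketch below assumes dependencies are already recorded as a mapping from each asset to the assets it references; the file names in `example_graph` are hypothetical.

```python
from collections import deque


def reverse_index(depends_on: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert 'X depends on Y' edges into 'Y is used by X' edges."""
    used_by: dict[str, set[str]] = {}
    for item, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(item)
    return used_by


def affected_by_change(changed: str, depends_on: dict[str, set[str]]) -> set[str]:
    """Breadth-first search for every asset that directly or transitively depends on `changed`."""
    used_by = reverse_index(depends_on)
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in used_by.get(current, set()):
            if parent not in seen:  # the seen set also protects against cycles
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical project structure: two language edits share a reel, and a trailer reuses the English edit.
example_graph: dict[str, set[str]] = {
    "feature_EN.edl": {"reel1.mov", "score.wav"},
    "feature_FR.edl": {"reel1.mov", "dub_fr.wav"},
    "trailer.edl": {"feature_EN.edl"},
    "reel1.mov": {"plate_0042.exr"},
}

if __name__ == "__main__":
    print(sorted(affected_by_change("plate_0042.exr", example_graph)))
    # ['feature_EN.edl', 'feature_FR.edl', 'reel1.mov', 'trailer.edl']
```

Running a check like this before a migration turns a vague worry about breakage into a concrete list of projects to re-verify.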

The Future Depends on Metadata

Going digital means you will have a lot of metadata to worry about, and the metadata you create is key to how you will be able to leverage assets in the future. You need to know not only what is in your files, but also how they were created, which software dependencies are required to play the content, and whether the files have been transcoded or migrated (which will inevitably happen over time).
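One way to keep those questions answerable later is a structured record per asset with an append-only history of transcodes and migrations. The dataclasses below are a hypothetical schema for illustration only, not a published metadata standard; every field name is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEvent:
    """One transformation applied to an asset; frozen so past events cannot be edited."""
    action: str         # e.g. "transcode" or "migrate"
    source_format: str
    target_format: str
    tool: str           # software (and version) that performed the action
    timestamp: datetime


@dataclass
class AssetRecord:
    """Hypothetical preservation record: what the file is, how it was made, what it needs to play."""
    asset_id: str
    description: str        # what is in the file
    current_format: str
    creation_software: str  # how the file was created
    playback_dependencies: list[str] = field(default_factory=list)
    history: list[ProvenanceEvent] = field(default_factory=list)

    def record_migration(self, action: str, target_format: str, tool: str) -> None:
        """Append a provenance event and update the current format; earlier events are never rewritten."""
        self.history.append(
            ProvenanceEvent(action, self.current_format, target_format, tool, datetime.now(timezone.utc))
        )
        self.current_format = target_format
```

Because each event records both the source and target format, the chain of events reconstructs how the current file descends from the original.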

Think about Virtual Reality (VR). VR requires a huge amount of data spread across enormous numbers of interrelated files. Currently, everyone is focused on what the technology can do, but for archiving you also need to think about how you can recreate, render, or preserve it. Creating the technical metadata about the characteristics of each file, its relationships, the environment required for playback, and so on will be very difficult to do manually at such scale, which is why the automation of metadata creation is keeping the industry very busy.
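Basic technical metadata can be captured by machine rather than by hand. The Python sketch below walks a directory tree and records size, a MIME type guess, and modification time for each file, plus the other files in the same folder as a crude relationship hint. Treating a shared folder as a relationship is a simplifying assumption, and `mimetypes.guess_type` works from the file name only, without inspecting content.

```python
import mimetypes
import os
from datetime import datetime, timezone
from pathlib import Path


def extract_technical_metadata(root: Path) -> list[dict]:
    """Walk a directory tree and capture basic technical metadata for every file, with no manual entry."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            stat = path.stat()
            mime, encoding = mimetypes.guess_type(path.name)  # guessed from the extension only
            records.append({
                "path": path.relative_to(root).as_posix(),
                "size_bytes": stat.st_size,
                "mime_type": mime or "application/octet-stream",  # fallback when the extension is unknown
                "content_encoding": encoding,
                "modified_utc": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
                # Crude relationship hint (an assumption): files in the same folder are treated as related.
                "sibling_files": sorted(n for n in filenames if n != name),
            })
    return records
```

Output like this could seed per-asset records automatically, leaving people to add only the descriptive metadata that a machine cannot infer.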

The good news is that there are new standards being introduced that are very helpful in this process, like SMPTE® Archive eXchange Format (AXF). AXF is a file container that can encapsulate any number and type of files in a fully self-contained and self-describing package. This allows interoperability amongst disparate content storage systems and ensures the long-term availability of content no matter how storage or file system technology evolves.
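To illustrate the general idea of a self-contained, self-describing package, the sketch below bundles files together with a JSON manifest (names, sizes, SHA-256 digests) inside a single tar archive. This is not an implementation of AXF, which defines its own structure in the SMPTE standard; the manifest name and layout are assumptions made for this example, and it assumes the bundled files have distinct base names.

```python
import hashlib
import io
import json
import tarfile
from pathlib import Path

MANIFEST_NAME = "MANIFEST.json"  # hypothetical name chosen for this sketch


def write_package(package_path: Path, files: list[Path], description: str) -> None:
    """Bundle files plus a JSON manifest describing them into one tar archive."""
    entries = []
    for f in files:
        data = f.read_bytes()
        entries.append({"name": f.name, "size_bytes": len(data), "sha256": hashlib.sha256(data).hexdigest()})
    manifest = json.dumps({"description": description, "files": entries}, indent=2).encode("utf-8")
    with tarfile.open(package_path, "w") as tar:
        info = tarfile.TarInfo(MANIFEST_NAME)
        info.size = len(manifest)
        tar.addfile(info, io.BytesIO(manifest))  # manifest goes first so readers find it immediately
        for f in files:
            tar.add(f, arcname=f.name)


def verify_package(package_path: Path) -> bool:
    """Read the embedded manifest and confirm every listed file is present and intact."""
    with tarfile.open(package_path, "r") as tar:
        manifest = json.load(tar.extractfile(MANIFEST_NAME))
        for entry in manifest["files"]:
            try:
                member = tar.extractfile(entry["name"])
            except KeyError:
                return False  # a listed file is missing from the package
            if member is None:
                return False  # the name exists but is not a regular file
            if hashlib.sha256(member.read()).hexdigest() != entry["sha256"]:
                return False  # contents changed since packaging
    return True
```

Because the manifest travels inside the package, any system that can read a tar archive and parse JSON can check the contents without the software that created them.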

Object Storage and Metadata

Object storage has a similar type of container system that allows you to keep related files together in a storage environment. It's great for archiving because object storage inherently bundles metadata with objects (and you have flexibility over what metadata you want stored). Furthermore, objects can live in a single namespace even across geographic locations, which makes working with teams around the globe far more streamlined. It's an extremely reliable and lower-cost way of bringing large-scale archives online so you can do something with your preserved data, such as repurposing, licensing, or otherwise monetizing content.

Something interesting about object storage, in comparison to file and block storage, is that an object is immutable. You can't change it, which is a great thing for preservation. However, it's a problem when you want to change the data around it. The point of preserving content in an active archive is that you will want to use these files in the future, and every time you do, more data will be created around the file. So an interesting development will be the possibility of mutable metadata around immutable objects.
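The idea of mutable metadata around immutable objects can be modeled in a few lines. The toy, in-memory Python class below is a hypothetical illustration and does not reflect how any particular object store works: object bytes are write-once, each metadata update adds a new version instead of overwriting the old one, and a stored checksum ties every version back to the original bytes.

```python
import hashlib
from types import MappingProxyType


class ToyObjectStore:
    """In-memory illustration only: object bytes are write-once, metadata evolves as versions."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self._metadata: dict[str, list[dict[str, str]]] = {}  # key -> list of metadata versions

    def put(self, key: str, data: bytes, metadata: dict[str, str]) -> str:
        """Store a new object with its initial metadata; existing objects can never be overwritten."""
        if key in self._objects:
            raise PermissionError(f"object {key!r} is immutable and already exists")
        self._objects[key] = bytes(data)
        self._metadata[key] = [dict(metadata, sha256=hashlib.sha256(data).hexdigest())]
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def update_metadata(self, key: str, changes: dict[str, str]) -> None:
        """Add a new metadata version; the object bytes and earlier versions stay untouched."""
        if "sha256" in changes:
            raise ValueError("the checksum describes the immutable bytes and cannot be edited")
        latest = self._metadata[key][-1]
        self._metadata[key].append({**latest, **changes})

    def metadata(self, key: str, version: int = -1) -> MappingProxyType:
        """Return a read-only view of one metadata version (the latest by default)."""
        return MappingProxyType(self._metadata[key][version])
```

Keeping earlier metadata versions means the record of how a file was licensed or repurposed can grow over time without ever touching the preserved object itself.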

Digital Preservation 2020

There are new economics for data. Everyone needs to deal with the radical growth of data, relatively shrinking storage budgets, and the value that data and content can bring through repurposing, licensing, or ML and AI initiatives. Online digital preservation of assets is the foundation for taking advantage of data and creating new opportunities.