Tuesday, May 17, 2011

Digital Preservation Series, Part 3 - There Are Solutions to This Problem

There are many ways to mitigate the digital preservation threats I mentioned in my last post. I’m going to talk about some of them here, but the information in this post isn’t comprehensive; it’s just a sample of the actions we’re taking at Dartmouth College Library to preserve our digital collections, along with ideas about similar actions you can take to preserve personal files.

File Formats
Since the threat of obsolescence is always a concern, one of the first digital preservation strategies is to create files in preservation-friendly formats, or if we receive them in other formats, to convert them whenever possible. What makes a file format preservation-friendly? The most important characteristics of a file format are that it:

  • is a non-proprietary, open standard format, meaning that all the technical information about the format is published and maintained by a standards organization
  • is uncompressed or, if that isn’t possible, uses lossless compression
  • is commonly used (and thus more likely to be supported and documented)
  • allows embedded metadata, when possible
  • works with a wide variety of hardware and software configurations

Examples of file formats we prefer include XML for text and TIFF or JPEG2000 for images.

For personal digital materials, consider using file formats such as PNG and JPEG for images (JPEG uses lossy compression, but it is very widely supported), PDF for text, and OpenDocument formats for office-related documents such as formatted text, charts, and presentations. See the Library of Congress Sustainability of Digital Formats page for more information on file formats.
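If you want a quick way to see which of your own files fall outside these formats, a short script can help. The following is only a rough sketch in Python: the list of extensions is my own assumption drawn from the formats named in this post, and the function name `needs_review` is made up for illustration. Keep in mind that a file extension is just a hint about the format, not proof of it.

```python
from pathlib import Path

# Assumed, illustrative list based on the formats named in this post;
# adjust it to suit your own collection.
PREFERRED_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".jp2",  # images
    ".pdf", ".xml",                                    # text
    ".odt", ".ods", ".odp",                            # OpenDocument
}


def needs_review(filenames):
    """Return the names whose extension is not on the preferred list."""
    flagged = []
    for name in filenames:
        # Compare extensions case-insensitively, so .JPG matches .jpg.
        extension = Path(name).suffix.lower()
        if extension not in PREFERRED_EXTENSIONS:
            flagged.append(name)
    return flagged


# Hypothetical file names, used only to show the idea.
sample = ["vacation.JPG", "letter.odt", "budget.xls", "notes"]
print(needs_review(sample))  # prints ['budget.xls', 'notes']
```

Files that get flagged aren’t necessarily at risk; they are simply candidates for a closer look or for conversion.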

Metadata
We also add metadata to our digital files, which helps ensure that we will be able to retrieve the information when we want it, and that we will know how to use it in the future. This metadata takes many forms, but some of the most vital pieces we include are:

  • What the content and context of the object is, such as the title, author, date of creation, and information about the source material if the item was digitized from an analog object such as a book (referred to as descriptive metadata)
  • Information about how the object was created, such as what file format it is and what software was used to create it (known as technical metadata)
  • How the object relates to other digital objects, for example, how a set of images should be ordered to correctly form the sequential pages of a book (structural metadata)
  • Who holds copyright for the object and how the object may be legally used (aptly named rights metadata)
  • And then there’s a category of information that includes results from validation (a preservation management activity I’ll cover in a later post), the chain of custody of the digital object (so we know it hasn’t been altered, or if it has been altered, who did it and why), and identification numbers for the object that tell external systems (such as an online catalog) what the object is. This information is sometimes referred to as administrative or preservation metadata, although all of the metadata in this list is important for preservation!

When creating metadata, we always use standardized formats and terminology so that software developers can write programs that can interact with the digital objects based on a common language. Examples of standards we follow include the PREMIS and METS schemas, and the XML format.
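To make the categories above a little more concrete, here is a minimal sketch in Python that groups a few fields by category and writes them out as XML. Please note that this is not PREMIS or METS: the element names (`record`, `descriptive`, `title`, and so on), the helper `build_record`, and all of the values are hypothetical and chosen only for illustration. Real records follow the element names and rules defined by those schemas.

```python
import xml.etree.ElementTree as ET


def build_record(metadata):
    """Turn {category: {field: value}} into a small XML tree."""
    root = ET.Element("record")
    for category, fields in metadata.items():
        group = ET.SubElement(root, category)
        for name, value in fields.items():
            ET.SubElement(group, name).text = str(value)
    return root


# Hypothetical values for one page image of a digitized book.
metadata = {
    "descriptive": {"title": "Example Book", "source": "printed book"},
    "technical": {"format": "TIFF", "software": "example scanning software"},
    "structural": {"sequence": 12},
    "rights": {"holder": "Example Library"},
    "administrative": {"identifier": "example-0001"},
}

record = build_record(metadata)
# encoding="unicode" returns a str rather than bytes.
print(ET.tostring(record, encoding="unicode"))
```

Because the output is plain XML, any program that understands the agreed element names can read it, which is the point of using a shared standard.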

Keep in mind that metadata use isn’t limited to library collections! You can add metadata to many of your personal digital materials. A file name, for example, is a type of metadata; if chosen carefully, it can tell you a lot about a file before you even open it. It can often give you the name or a description of the content and the date the file was created or edited, and the extension tells you what the file format is. Many files, particularly images, can also be given tags that contain some of this information. When tagging photographs, important things to include might be the date and location the picture was taken, who took the picture, and who or what is in the picture. Most cameras also embed some valuable technical information, such as the type of camera, color space, and other specifications about the image. Newer digital cameras can even embed date, time, and location metadata if they are set up to do so.
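As one possible way to put the file name advice into practice, the sketch below builds a name from a date, a short description, and an extension. The naming pattern (date first, then description) and the function name `descriptive_name` are my own assumptions for illustration, not an established standard; use whatever pattern you can apply consistently.

```python
import re
from datetime import date


def descriptive_name(description, created, extension):
    """Build a file name such as 2011-05-17_family-picnic.jpg."""
    # Lowercase the description and collapse every run of characters
    # that are not letters or digits into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    # Accept the extension with or without a leading dot.
    suffix = extension.lower().lstrip(".")
    return f"{created.isoformat()}_{slug}.{suffix}"


print(descriptive_name("Family Picnic", date(2011, 5, 17), "JPG"))
# prints 2011-05-17_family-picnic.jpg
```

Putting the date first in year-month-day order means that an alphabetical file listing is also a chronological one.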

There are so many digital preservation actions that I’m going to have to continue them in another post. Next time, I’ll talk about storage methods and ongoing maintenance activities that are necessary to keep digital resources from degrading. Come back soon for more information!

Written by Helen Bailey

