Showing posts with label digital curation. Show all posts

Tuesday, June 23, 2015

Baker Library in the Dartmouth College Photographic Files Collection

During the month of May, close to 1,500 photographs of Baker Library were added to the Dartmouth College Photographic Files Collection. Included are many photos of the Baker Library building, interior and exterior, including its construction, as well as people who have worked there over the years. Here are some samples:

View a much larger selection here.

The Dartmouth College Photographic Files project began in early 2012 and is part of the Dartmouth Digital Collections. The project's goal is to make over 80,000 photographs stored in file cabinets in Rauner Special Collections available online. Images date from the early years of photography (ca. 1850s) to the present and cover nearly all aspects of Dartmouth College life. To date there are over 36,000 photos from the collection online. We add approximately 1,000 photographs to the collection every month. We are working through the photographs alphabetically and have reached the letter "M". See additional photographs of Feldberg Library, Dana Biomedical Library, and Kresge Library.

If you have questions about the Photographic Files Collection, contact Rauner Special Collections. If you have questions about the digital imaging of the collection, contact William B. Ghezzi or Ryland Ianelli in the Digital Production Unit.

Written by William B. Ghezzi


Tuesday, March 17, 2015

File Validation Woes

Over the last few months I have been preparing and ingesting the master TIFF files for the Photo Files collection into our local repository system for safekeeping. The first step is to package the files using the BagIt specification. BagIt was developed by the Library of Congress and the California Digital Library as a way to package files along with some basic metadata that can be used to validate the bag's contents. It's the digital equivalent of putting a bunch of things in a box, along with a list of the box's contents and a unique identifier for each item. Since our Photo Files collection is enormous (so far I've deposited over 45,000 images, and we're not even halfway through the collection), I break the bags into manageable chunks for uploading and processing in our repository.
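For the curious, the packaging step itself is fairly painless. The Library of Congress publishes a small Python library, bagit-python, and a minimal sketch looks something like the following; the folder name and bag-info fields are placeholders, not our production values:

```python
import bagit

# Turn a folder of master TIFFs into a bag, in place, recording checksums
# and a couple of bag-info fields (folder name and metadata are made up).
bag = bagit.make_bag(
    "photo-files-batch-042",
    {"Source-Organization": "Dartmouth College Library",
     "Contact-Name": "Jenny Mullins"},
    checksums=["md5", "sha256"],
)

# Later, on the server, the same library can re-verify the bag's contents.
bag = bagit.Bag("photo-files-batch-042")
print(bag.is_valid())  # True only if every file matches its recorded checksum
```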

Once a bag is uploaded onto the server, it is validated using the BagIt tool. This is a programmatic way of checking that all the files are still exactly as they should be, and that no file has been altered, gone missing, or snuck in on the sly. Finally, the contents of the bags are run through the File Information Tool Set, or FITS. FITS brings together a bunch of open-source tools that identify file types, check whether those files are valid, and extract technical metadata. So, for instance, when I deposit a bag from the Photo Files collection, FITS produces a report that says "These files are TIFFs! These TIFFs are well formed and valid! Here's some technical info you might want to have around!", only with fewer exclamation marks:

Sample FITS report
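FITS can also be run from the command line, producing an XML report for each file. Here is a rough sketch of driving it from a script and pulling out the validity flags; the paths are placeholders, and the element names reflect the FITS reports I've worked with rather than any guaranteed schema:

```python
import subprocess
import xml.etree.ElementTree as ET

def fits_status(tiff_path, report_path="fits-report.xml"):
    """Run FITS on one file and collect its validity flags and any messages."""
    # Adjust the path to fits.sh (or fits.bat on Windows) for your installation.
    subprocess.run(["/opt/fits/fits.sh", "-i", tiff_path, "-o", report_path],
                   check=True)

    status = {}
    for elem in ET.parse(report_path).iter():
        # Match on the local tag name so we don't hard-code the FITS namespace.
        name = elem.tag.rsplit("}", 1)[-1]
        if name in ("well-formed", "valid", "message"):
            status.setdefault(name, []).append(
                (elem.get("toolname"), (elem.text or "").strip()))
    return status

# Example (path taken from the error report below):
print(fits_status("page-masters/Icon1647-0875-0000010A.tif"))
```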

So, this process had been going along just swimmingly until a few weeks ago. Like I said, I'd made it through about 45,000 images, and then suddenly, BAM! An error report for every single image:

page-masters/Icon1647-0875-0000010A.tif is not valid: "Type mismatch for tag 700; expecting 1, saw 7"


All about the Tagged Image File Format (TIFF):

The first thing I discovered was that this error message had something to do with the T part of TIFF. The TIFF file format has what's called a header that uses tags to describe the content of the file. These tags, and the information in them, can be manipulated using various tools. The capture software we use to create our master images automatically inserts certain tags. As part of our process, we add additional information into the headers of our TIFFs. This is called embedded metadata: information about the file that is part of the file itself.

The problem with these images was the 700 tag. From the Library of Congress' super useful guide to TIFF tags, I learned that this tag has something to do with the XMP metadata within the file. XMP is a data model for structuring embedded metadata. Data models for metadata help standardize how metadata is stored. For instance, I could edit an image to say "Author: Jane Doe", while someone else might edit it to say "Photographer: Jane Doe", and we could both mean the same thing. A data model would say, "Ok, everyone, we're going to use the term Creator." This makes it easier for both humans and computers to make use of embedded metadata, making digital objects more discoverable and easier to maintain.

So, now I knew that there was a problem with the metadata we were embedding in the files. Something about a 1 and a 7? Deep inside the Photoshop user forums, I found that I was not the first one to run across this problem. These numbers refer to the tag's field type, with 1 meaning "byte" and 7 meaning "unknown". So these files said "unknown" when they should have said "byte", right? Well, not really. According to a forum response from David Franzen, an Adobe employee, both the 1 and the 7 are valid values. So why was I getting this error message?
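If you want to peek at this field type yourself, it is easy to do programmatically. Here is a minimal sketch using the Pillow imaging library; the file path is simply the one from the error above, and the tagtype lookup assumes Pillow exposes raw TIFF field types the way recent versions do:

```python
from PIL import Image

# TIFF field type codes from the specification: 1 = BYTE, 7 = UNDEFINED.
FIELD_TYPES = {1: "BYTE", 7: "UNDEFINED"}

with Image.open("page-masters/Icon1647-0875-0000010A.tif") as im:
    code = im.tag_v2.tagtype.get(700)  # 700 is the tag holding the XMP packet
    print("Tag 700 field type:", FIELD_TYPES.get(code, code))
```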

JHOVE and FITS:

As mentioned above, FITS packages together a number of tools. The tool giving this error message was JHOVE, the JSTOR/Harvard Object Validation Environment. According to Wikipedia, JHOVE tells us whether or not objects are "well-formed (consistent with the basic requirements of the format) and valid (generally signifying internal consistency)." The version of JHOVE packaged in FITS says that in order for a TIFF to be well formed, tag 700 needs to have a "1", and anything else is invalid. But it seems that "7" is also a valid value for this tag. So why is there this discrepancy in what makes a valid TIFF? Well, it turns out that when JHOVE was first developed, the TIFF format specifications weren't exactly easy to decipher. The TIFF rules encoded in the tool were based on confusing, incomplete, and scattered documentation. When others started getting the same error message as I did, they turned to Adobe for clarification. As a result, JHOVE's code was updated in version 1.8 to accept both "byte" and "unknown" as valid values in the 700 tag.

However, the updated version of JHOVE didn't make its way into FITS. Apparently, there were some other changes in JHOVE 1.8 that would make integrating the newer version into FITS a rather large job. Making the necessary changes to FITS to accept newer versions of JHOVE currently isn't a priority for the FITS developers.

The Real Culprit:

Now that I knew what was causing the error message, I circled back to the big question: why now? The first 45,000 files had been just fine. What changed? In discussion with our digital production team, I learned that there had been a significant change to the production workflow, specifically in how they were adding embedded metadata. What before had been a time-consuming process was greatly simplified by using Adobe Bridge to quality-check images and add metadata. In researching this error message, I had seen people mention Bridge as the culprit in changing the 700 tag.



Testing embedded metadata settings:

To be sure, I decided to play around with the settings in both our capture software and Bridge to see if I could get a different result. I created a number of test images with different metadata settings using our capture software, then ran these through FITS. All checked out okay. Next, I played around with the metadata settings in Bridge and made changes to the embedded metadata in my test files. I ran the files through FITS again, and all failed to validate. No matter what settings I used in Bridge, the 700 tag was changed.
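For what it's worth, this kind of spot check is also easy to batch. Below is a rough sketch (not our actual QC script) that walks a hypothetical folder of test TIFFs and flags any whose XMP tag carries the field type that the bundled JHOVE rejects:

```python
import glob
from PIL import Image

# Walk a (hypothetical) folder of test TIFFs and flag any whose XMP tag (700)
# carries field type 7 ("UNDEFINED"), which the older bundled JHOVE rejects.
for path in sorted(glob.glob("metadata-tests/*.tif")):
    with Image.open(path) as im:
        field_type = im.tag_v2.tagtype.get(700)
    verdict = "would fail validation" if field_type == 7 else "ok"
    print(f"{path}: tag 700 type = {field_type} ({verdict})")
```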

So Now What?

Now that we knew what was causing the error, there were a number of different approaches we could take. To find out what we did, stay tuned for my next blog post...



Written by Jenny Mullins

Tuesday, December 9, 2014

Merging Images in Photoshop, Part One

One of the most common problems in digitization is how to deal with an image that is too big for your camera or scanner. The simplest solution is to photograph or scan the object in separate pieces and then merge those pieces together; however, this can present its own set of problems to those unfamiliar with imaging software.

In this post I will describe my own method for merging images together. There are many other ways to accomplish this, and if you have a way that works for you, I encourage you to keep using it, but be aware of its potential pitfalls. The main benefits of my method are the ability to quality-check your work as you go and to make simple, non-destructive edits that can be changed or reversed as needed. Also, for simplicity's sake, I will be referring to my own Mac OS based workflow for menus and keyboard shortcuts.


Here is the whole image that we’re trying to assemble, and for whatever reason, it’s been captured in two side-by-side pieces in the standard .tiff format. It is crucially important, when capturing, to make sure there is overlap between the captures. This is going to help us check how well-aligned our merging is, so the more overlap the better.

Notice how each side is wider than half of the image

Now that we’ve got our two images, open both in Adobe Photoshop and choose whichever one you want to start working on. I usually go from left to right for simplicity’s sake, so here I will be starting on the left side of the image.

In Photoshop, open the Image drop-down menu and select "Canvas Size…" (or use the keyboard shortcut option+command+C). Click on the canvas width field and double its value. In the "Anchor" field, select the leftmost column of the grid so that Photoshop knows where to put the empty space.


You should be left with an image like this:


It will end up a little wider than is necessary, but it’ll be easier to trim it down after the fact than to add more space. This will now become our “master” file. Do a “Save As” at this point and designate it as such.

Next, go to the second image that we are going to merge into the master (in this case, the right side image). The next step should be familiar to most computer users: select all of the image (command+A), and copy it to the clipboard (command+C). Then go back to the master file and use paste (command+V) to add it into the image.


If you’re paying attention, you’ll obviously notice that this new image is not in the correct position. However, by looking at the Layers panel on the right side of Photoshop you’ll see that the new image is on its own layer, resting on top of the background (if you do not see the Layers panel, select the “Window” drop-down menu and enable “Layers” there). Thus we can edit it without disturbing the original “bottom” layer.


Now, with the top layer selected, click on the “Opacity” field in the Layers panel and set it to 40%. This will make the top layer semi-transparent and allow us to line it up with the bottom layer.

Then, with the Move tool selected (V), begin moving the top layer around, trying to find where it lines up. Look for any solid shapes that are shared by both images, or where the borders intersect. Letterforms provide nice, clear, easily spotted shapes, which is why I have used them in this example, but it can be anything so long as it's shared by both images.


We’re getting there, but it’s obviously still not right. At this point, find an area of overlap and zoom in closely. Then, with both the top layer and the move tool selected, simply “nudge” the top layer into place using the arrow keys. The arrow keys will only move the layer one pixel at a time, so obviously this is for the finest level of adjustments.

Almost...

Nailed it!

Now for the final steps! In the Layers panel, set the top layer's opacity back to 100%. Then inspect the image along the borders, making sure the join looks seamless. While checking for quality, be sure to zoom in and out.


At this point you can crop the image down to its original size, and it will be ready to go. However, one important thing to remember is that layered .tiffs, in addition to simply being larger files, are not commonly supported by web browsers or other software. What I like to do at this point is save the "Master" file with both layers, then create a new version for common use. The common-use version gets flattened (Layer -> Flatten Image) and then saved (Save As) in whatever format is required, such as .jpeg or .pdf. This way, if any changes need to be made, we can always go back to the Master version.
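If you ever need to do this outside of Photoshop, the same steps translate fairly directly to code. Here is a minimal sketch using the Pillow library; the file names and the overlap value are invented, and in practice you would still find the exact offset by eye, just as in the nudging step above:

```python
from PIL import Image

# File names and the overlap value are placeholders for illustration.
left = Image.open("left-half.tif").convert("RGB")
right = Image.open("right-half.tif").convert("RGB")

overlap = 200  # pixels shared by both captures; found by inspection

# Double the canvas width, anchored at the left (the Canvas Size... step).
canvas = Image.new("RGB", (left.width * 2, left.height), "white")
canvas.paste(left, (0, 0))

# Paste the right half so it lines up inside the overlap region (the nudge step).
canvas.paste(right, (left.width - overlap, 0))

# Trim the extra space and save a flattened copy for common use.
merged = canvas.crop((0, 0, left.width - overlap + right.width, left.height))
merged.save("merged.jpg", quality=95)
```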

And there you have it! A nice, seamless image. In the next post in this series, I will go into more detail for dealing with other problems, such as skew and mismatched backgrounds or details.

Written by Ryland Ianelli

Tuesday, October 28, 2014

Dartmouth Library Joins the National Digital Stewardship Alliance




We are very excited to announce that Dartmouth College Library has joined the National Digital Stewardship Alliance (NDSA), a consortium of organizations that are committed to the long-term preservation of digital information. The mission of the NDSA is to establish, maintain, and advance the capacity to preserve our nation's digital resources for the benefit of present and future generations. Members include universities, consortia, professional societies, commercial businesses, professional associations, and government agencies at the federal, state, and local level.

The NDSA is organized into five working groups: Content, Standards and Practices, Infrastructure, Innovation, and Outreach. Each group develops and executes a series of projects, which have included:

  • Developing the Levels of Preservation, a set of guidelines on tiered levels of digital preservation (Infrastructure WG)
  • Publishing a report on "Issues in the Appraisal and Selection of Geospatial Data" (Content WG)
  • Creating Digital Preservation in a Box, a toolkit to support outreach activities that introduce the basic concepts of preserving digital information (Outreach WG)
  • Recognizing innovation in the community through the NDSA Innovation Awards (Innovation WG)

I am very excited to join the Standards and Practices Working Group, which works to "facilitate a community-wide understanding of the role and benefit of standards in digital preservation and how to use them effectively to ensure durable and usable collections." Projects undertaken by this group include a report on "The Benefits and Risks of the PDF/A-3 File Format for Archival Institutions" and a recent survey assessing stumbling blocks for video preservation.

Written by Jenny Mullins


Tuesday, October 14, 2014

Dartmouth at the Digital Directions 2014 Conference

Image from the blog PDXretro.com

This past July I had the great opportunity to attend the Northeast Document Conservation Center’s Digital Directions 2014 conference. In a lucky turn, this year’s conference was held in Portland, Oregon, home of my alma mater, Reed College. In addition to reexperiencing the highlights of one of my favorite American cities, I was able to meet and engage with many people doing amazing work in digital collections across the country and beyond.



The conference covered a fascinating diversity of topics, from high-level project management and planning to specific examples of workflows and equipment setups. One of the first things impressed upon me was the sheer diversity of digitization efforts under way across the world. As the demand for digital content continues to expand, many institutions are rushing to fill that need. Because of this, it can often seem that no two institutions' digital programs are the same, or even particularly similar.

To its credit, Digital Directions did a phenomenal job accounting for these varied setups. The three days were jam-packed with a wide variety of discussion topics and presentations. The first day consisted mostly of big-picture talks. We discussed the interplay between digital preservation (maintenance of access to digital content) and digital curation (adding value to digital content), as well as how each institution can craft its own best practices and standards according to its needs. The day was wrapped up with an impressively no-nonsense discussion about rights and responsibilities from a legal perspective by Peter Hirtle, followed by a lovely meet-and-greet at the Portland Art Museum.

The following days covered a wide variety of topics, including a fascinating section about audio and video digitization (an area unfortunately outside my range of experience). However, it soon became apparent that the challenges faced by those audio and video digitization teams were remarkably similar to my own in the world of object and document reproduction. Many digitization projects face the same fundamental roadblocks: time, equipment, resources, access, and storage.
Image from NEDCC's Twitter account

While the specifics varied, these fundamental issues could not help but make themselves apparent. The relative merits of, say, cloud storage (to pick a random example) can be endlessly debated among digital librarians, and I doubt there will ever be a definitive final word on the topic. But the crucial takeaway must be a willingness to engage with these issues, understanding the risks and drawbacks inherent in each option so that they can be minimized, or at the very least fully understood, so that we may deal with them more effectively in the future. Among the many useful things I learned at Digital Directions 2014, perhaps the most important was that my own peers are an incredible resource, both within Dartmouth and worldwide. By learning from their experiences and sharing my own, I hope to do my part to keep the Dartmouth Library's Digital Collection growing and improving well into the future.

Written by Ryland Ianelli

Tuesday, September 30, 2014

100-ish Days of Digital Preservation

Hello, there. It's been a little over 100 days since I started as Dartmouth College Library's first Digital Preservation Librarian. I've been working closely with staff in many departments to define my role and work out how best to ensure long term access to the Library's digital content. Here are some of the things that I've been up to:

  • Maxed out our master file server space.
  • Learned about awesome projects and connected with colleagues at Digital Preservation 2014.
  • Made some headway into assessing our e-resource preservation strategies.
  • Used BagIt to package 45,000 files totaling 2413 GB for long-term storage (see above re: maxing out server space).
  • Started digging into PREMIS.
  • Learned to harness the power of Twitter for professional research (#digipres).
  • Started brainstorming strategies for preserving analog and born-digital a/v content.
  • Dipped my toes into web and database preservation in response to a faculty inquiry.
  • Got really excited about sustainability and digital humanities projects.
Digital Preservation Brainstorming!

 I’m looking forward to my role in the Library continuing to evolve and grow over time. As these and other projects develop, I will tell you all about them here. Stay tuned for the next 100-ish days of Digital Preservation!


Written by Jennifer Mullins

Tuesday, October 8, 2013

Manage Your Photos with Adobe's Bridge and Lightroom

If you decide to digitize a collection of images, such as a photo album or a slideshow, there are thousands of software options for organizing them. Many come bundled with scanning or photo software, and those often offer a relatively low-cost, hassle-free way to keep track of your images.

However, if you want to ensure the maximum compatibility and usability of your images over the long term, there are more advanced options available through Adobe's very popular line of professional software. The complete Creative Suite (and the newly offered Creative Cloud) features an almost overwhelming number of tools, from film editing to website design, along with the nigh-essential Photoshop.

Two of these programs are designed primarily for organizing and viewing large collections of images, something you might imagine is essential to the kinds of digitization projects taken on by the Dartmouth College Library's Digital Production Unit. Adobe Bridge and Adobe Lightroom each offer distinct advantages over the other, and while they share some purposes, it can save you a lot of time to know which one suits your project best.

Adobe Bridge is, as its name implies, an excellent way to organize many different kinds of media, "bridging" many formats. It is designed to play nicely with the other Adobe programs and offers an excellent alternative to the standard Mac OS Finder or Windows Explorer when it comes to browsing collections. The interface can be altered modularly to suit a project, meaning you can resize, add, or remove tools from the main screen with great ease. Additionally, you can add or alter image metadata and do batch file renaming. Its flexibility is its best selling point, making it a helpful addition to any Adobe-based workflow.

Lightroom, on the other hand, is designed specifically for photographs. In addition to letting users browse through collections of photographs, Lightroom offers far more tools for photo editing than Bridge, and presents them in a way that is familiar to professional and amateur photographers. While at first blush this would appear similar to running a combination of Bridge and Photoshop, it actually has a few interesting tricks of its own.

The most important thing about Lightroom is to think of it within the context of a photographer's studio. The program is designed to take raw camera files (.dng is the most common format) and apply various changes to them without altering the originals. The entire editing action takes place within the Lightroom environment, so you are never in danger of losing data. You can think of your raw camera files as digital negatives, to be used and reused to create different print files. Lightroom easily stores settings data, allowing you to export as many kinds of derivative files as are needed. However, Lightroom is built around this non-destructive approach; if you want to make actual changes to a master document, you'll have to use Photoshop.

Between these two programs we can respond to all kinds of challenges in the Digital Production Unit, organizing and reworking files in the manner best suited to the project.

Written by Ryland Ianelli 

Tuesday, July 24, 2012

Winning the Game of Digital Curation

A few weeks ago, the Preservation Services team used one of our regular department meetings to play a board game. Yes, that's right...we played a board game at work! But lest you think we're just a bunch of slackers, let me assure you that this particular game was special, and highly relevant to our jobs. The game we played was Curate: The Digital Curator Game.


This game was created by Digital Curator Vocational Education Europe, or DigCurV for short. The game is designed to help people learn about and discuss the challenges and strategies involved in digital curation, while also having a lot of fun! It includes plenty of pertinent questions exploring issues such as staffing, funding, collaboration, and training.

The "game" part of the game is really just a ruse...a way to get people interested in having the digital curation discussion, and it worked. We all got really into it, and had a lively conversation. Some of the topics that we found especially useful were: project and workflow planning, skills needed for staff involved in curation, and listing external resources for gathering more information about digital curation.


The game is free to download from DigCurV; they just require you to register as a network member. Part of the game includes recording discussion points on a record sheet, and DigCurV's only request is that anyone who plays the game submit these sheets anonymously, to help them better understand how the game is used and whether it's helpful. It was definitely helpful for us, and we thank DigCurV for providing such an excellent resource for sharing and learning about digital curation!

Written by Helen Bailey.

Tuesday, June 19, 2012

Planning and Building a Digital Collections Program, May 10, 2012

I recently had the pleasure of working with a colleague from Amherst College to organize a regional forum on "Planning and Building a Digital Collections Program". Kelcy Shepherd and I organized this event through the NorthEast Regional Computing Program, better known as NERCOMP. Our goal was to bring together speakers on a variety of topics related to creating digital collections in a library or archive setting. The forum included four presenters:
  • Dartmouth College Library's own David Seaman, Associate Librarian for Information Management, who spoke about our Digital Library Program Plan and the process we’ve gone through to develop our digital collections infrastructure over the past several years.
  • David Mathews, Partner at The Image Collective, who gave a detailed presentation on the important technical considerations for digital imaging.
  • Nancy McGovern, Head of Curation and Preservation Services at MIT Libraries, who covered the basic components of digital preservation planning. This talk was a very abbreviated version of the ICPSR's five-day Digital Preservation Management Workshop, which builds on the Digital Preservation Management Tutorial found here.
  • Anne Sauer, Director and University Archivist at Tufts University, who talked about the challenges and strategies involved in advocating for digital collections funding within a larger campus environment.
All of the presentations were excellent, and Kelcy and I had a great time organizing the event. The presenters' slides can be found here (some slides are not yet available, but will be soon). Many thanks to all the presenters and participants, and to NERCOMP for hosting the forum!

Written by Helen Bailey.

Tuesday, November 1, 2011

Dartmouth College Library Digital Preservation Policy

I am very pleased to announce the creation of the Dartmouth College Library Digital Preservation Policy. This policy is a critical first step in the Library’s goal of ensuring long-term access to digital resources. It identifies the scope of digital objects that the Library will commit to preserving long term, the principles that will guide preservation actions, and the strategies that will be implemented to ensure preservation.

The policy was heavily influenced by similar documents from the Wellcome Library, Columbia University Library, and Yale University Library, as well as the Digital Curation Centre, JISC, and the OAIS Reference Model.

Kudos to Helen Bailey, Preservation Specialist, who collaborated with me to develop the policy, guided it through numerous drafts, and tirelessly revised the document to reach this final version.

The policy may be viewed here.

Written by Barb Sagraves

Tuesday, July 19, 2011

DigCCurr Professional Institute

EAT MOR CHIKIN
DigCCurr* Professional Institute is an IMLS-funded program of the School of Information and Library Science at the University of North Carolina at Chapel Hill. For the past three years, Professors Helen Tibbo and Cal Lee, along with other internationally recognized experts, have hosted a one-week professional development class on the theory and practice of digital curation across the digital object lifecycle.

I was a student in the May 2011 session and it was an amazing week!

There were about thirty of us representing a broad range of organizations (university, government, and business archives; academic libraries; museums; historical societies) and a wide range of responsibilities (digital managers, catalogers, archivists, preservation librarians, directors, access services librarians). Each day was filled with lectures, hands-on labs, and discussion.

    Some highlights:
  • Manfred Thaller gave an overview of the PLANETS software and a tool to simulate digital aging. It was a lot of fun to see how far you could corrupt a file and still get a usable document -- and at the same time discover file formats that need very little change to be un-openable.
  • Nancy McGovern spoke about OAIS and the need for digital curation program development.
  • Seamus Ross walked us through the DRAMBORA audit tool.

At the end of the week we were assigned to develop a project based on what we had learned at DigCCurr and implement the project during the next six months. Come January 2012 we will meet again in Chapel Hill to report on our projects, celebrate our successes, and console one another on imperfect implementations.

My project is to use the DRAMBORA tool for a risk assessment of our digital preservation policy. I'll post my experiences with the tool as the work moves forward.

Two other items related to DigCCurr:

-One of the program goals is to develop a community of digital curators. To that end, Professors Tibbo and Lee have developed the Digital Curation Exchange. The DCE is a web-based community open to anyone interested in digital curation; DigCCurr uses it as a home base for students to share information, but anyone may join. If you haven't visited the site, take a few minutes and look it over.

-Our class attended a Durham Bulls Baseball game and I discovered a new fried food: potato chips on a stick.

*Pronounced dij-seeker

Written by Barb Sagraves

Tuesday, March 8, 2011

Digital Curation Conference

Last December I had the opportunity to attend the 6th International Digital Curation Conference in Chicago, IL. The conference was a super fantastic two-day event. Reviewing my notes, there were lots of excellent presentations -- a few that stood out were:

Robin Rice (presenter) - "Research Data Management Initiatives at the University of Edinburgh". Robin mentioned the concepts of high and low curation:

  • High curation would be labor intensive and require human intervention (metadata creation would be a good example).
  • Low curation would be automated (for example checksums or file format validation).
As Dartmouth College Library moves forward with digital preservation, these terms and concepts will be helpful in our conversations.
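To make the "low curation" idea concrete, here is a minimal fixity-checking sketch in Python; the folder name is hypothetical, and a real workflow would also store these values and re-verify them on a schedule:

```python
import hashlib
from pathlib import Path

def sha256(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large TIFFs don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Record a fixity value for every master file in a (hypothetical) folder.
for tif in sorted(Path("masters").glob("*.tif")):
    print(tif.name, sha256(tif))
```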

Catherine Ward (presenter) - "Making Sense: Talking Data Management with Researchers". The "Incremental" project was designed to improve research data management within the institution by focusing on providing better advice, training, and support for researchers. It's a very common-sense approach and worth referring to as the College designs a program to support data management.

I also participated in a pre-conference, Digital Curation 101 Lite. It was led by Sarah Jones, Martin Donnelly, and Joy Davidson and used the Digital Curation Centre lifecycle model as the basis for the course. It included lots of good advice about knowing your audience and being mindful of language that might scare them off (i.e. data curation).

These notes just scratch the surface of the conference. If you are interested in minding your data and want to learn more about digital curation, follow any of the links in this post. Mark your calendar, the next conference will be held in Bristol, England in December 2011.

Written by Barb Sagraves

Tuesday, February 15, 2011

Data management web page

Preservation Services now has a web page with a few useful links for understanding and writing data management plans. It connects to Research Computing's workshop on "Effective Data Management" as well as to the College's Office of Sponsored Projects.

To view the page go to: Data Management