Showing posts with label TIFF. Show all posts

Tuesday, May 17, 2016

Image Resolution and You

Picking your resolution is perhaps the most important decision you'll make when digitizing photos, artwork, or any other kind of image. But it can also be confusing if you're not familiar with these terms and their meanings. This is a simple primer to help you choose the right resolution for your needs.

The terms DPI and PPI are both shorthand units for measuring an image's resolution. DPI stands for "dots per inch" and PPI, "pixels per inch." This means that DPI is technically a term for a printed object's resolution while PPI describes an image displayed on a screen, but in common use they are essentially interchangeable.

The system that these measurements describe is called raster, and it's by far the most common in a modern digital setting. A raster image is essentially a mosaic, collecting dots of color called pixels in a grid of tiny squares to produce an overall image. The more pixels per inch, the more detailed the image. Simple enough, right? For an easy example, here is the same image at three different standard resolutions: 600dpi, 300dpi, and 72dpi.


You can always lower the resolution of an image, but it's impossible to raise it, except in a simple multiplying sense. This is why all digital images look blurry if you zoom in far enough. You're making the pixels bigger, but you aren't adding any new information to them.
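To make the "simple multiplying" point concrete, here is a minimal Python sketch. The function names and the tiny 2x2 "image" are made up for illustration, and real imaging programs usually use smoother interpolation than this. The idea is the same, though: enlarging repeats the pixels you already have, and no new detail appears.

```python
def upscale_nearest(pixels, factor):
    """Enlarge a 2D grid by repeating every pixel `factor` times in each direction."""
    out = []
    for row in pixels:
        wide_row = [value for value in row for _ in range(factor)]
        out.extend(list(wide_row) for _ in range(factor))
    return out


def downscale_by_sampling(pixels, factor):
    """Shrink a 2D grid by keeping every `factor`-th pixel in each direction."""
    return [row[::factor] for row in pixels[::factor]]


# A hypothetical 2x2 grayscale image (values are brightness levels).
small = [[10, 200],
         [90, 30]]

big = upscale_nearest(small, 2)  # now 4x4, but built from the same four values

distinct_before = {value for row in small for value in row}
distinct_after = {value for row in big for value in row}

for row in big:
    print(row)
print("new information added:", distinct_after != distinct_before)
```

The enlarged grid has four times as many pixels, but every one of them is a copy of a pixel that was already there.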

Another factor you will want to consider is your display resolution. Modern high-definition TVs will often give you this basic measurement, and while computers have a greater variety of resolutions they will generally fall under a few typical values. 480p means a screen is 480 pixels tall, and is considered "standard" definition. 720p is, of course, 720 pixels tall (typically 1,280 pixels wide), and marks the beginning of "HD" standards. 1080p (1,920 by 1,080 pixels) is probably the most commonly used HD resolution, and the cutting-edge "4K" resolution is a convenient shorthand for screens 3,840 pixels wide. The screen resolution will determine how "large" any given image looks at full resolution on the screen. If you try to stretch an image that is 480 pixels wide across a screen that is 1,920 pixels wide, it will look bad.

While the ideas surrounding pixel resolution, display resolution, and print resolution are quite complicated, they can still be understood easily with a few guidelines. For most purposes you can create images using 3 different resolutions:

600ppi is what we at the Dartmouth Digital Library Program use as the standard for high quality "master" images. Although many scanners can go higher, the size of the file becomes very unwieldy at that point. My advice is to always start at 600ppi or higher. Better to have a high-quality image and not need it than to need it and not have it.

300ppi is a common resolution for a high-quality print. Unlike looking at a screen where the resolution can be shrunk or blown up, a printer is rigidly limited in the amount of detail it can put into any given area. While a particularly good printer may get higher resolutions, most will clock in around 300dpi. This lower resolution also makes transferring files for print easier. And of course, it's always useful to keep your higher-res files around in case you need to go back to them.

72ppi has become the most common display resolution on the internet. There are a few things to consider before simply converting your image into 72ppi. Look at your display, and understand what its resolution is. Then consider how "big" you want your image to look on the display. So, if you have a 1080p monitor and want an image that fills the whole width of the screen, you'll want to change your ppi to 72, AND change your image width to 1,920 pixels (the width of a 1080p screen) at the same time, with the proportions locked.

Here we can see Photoshop's image size menu (Image -> Image Size), where the pixel width and resolution are changed while the proportions remain constrained.
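If you'd like to check the numbers before you open the dialog, the arithmetic behind "proportions locked" is easy to sketch in Python. This is only an illustration: the function name and the example scan size are my own assumptions, and Photoshop's actual resampling does much more than compute dimensions.

```python
from fractions import Fraction


def fit_within(width_px, height_px, max_width_px, max_height_px):
    """Return the largest size that fits the box while keeping the proportions.

    Exact fractions are used so the rounding is predictable. Note that this
    would also scale a small image UP to the box, which adds no detail.
    """
    scale = min(Fraction(max_width_px, width_px),
                Fraction(max_height_px, height_px))
    return round(width_px * scale), round(height_px * scale)


# Hypothetical master: a 10 x 8 inch original scanned at 600ppi.
master = (10 * 600, 8 * 600)   # 6000 x 4800 pixels
screen = (1920, 1080)          # a 1080p display

web_size = fit_within(*master, *screen)
print(web_size)
```

Here the height is the limiting side, so the image ends up 1,080 pixels tall and narrower than the screen. The ppi value saved in the file makes no difference to how this looks on a display; only the pixel dimensions do.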

This is often a confusing concept to grasp. The simplest way I can think of is: if you reduce a 300ppi image to 150ppi, but also double its printed dimensions, it still contains exactly the same pixels, so it will essentially be the same image when you see it on your computer. But if you try to print that, it will be half as detailed by virtue of being twice as big.
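The relationship is just division: printed size in inches equals pixel count divided by ppi. A small Python sketch, using a made-up 3000 x 2400 pixel image, shows the same pixels printing at two different sizes:

```python
def print_size_inches(width_px, height_px, ppi):
    """Physical print size, in inches, for a given pixel size and resolution."""
    return width_px / ppi, height_px / ppi


# Hypothetical image; the pixel dimensions never change below.
pixels = (3000, 2400)

at_300 = print_size_inches(*pixels, 300)
at_150 = print_size_inches(*pixels, 150)

print("300ppi:", at_300, "inches")
print("150ppi:", at_150, "inches")
```

Halving the ppi doubles the width and height of the print, while the file itself holds no more and no less detail than before.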

Fortunately, you don't need to fully understand all of this in order to create and work with high-quality images. As long as you make sure your highest-quality 600ppi master versions are safely backed up, you can play around with these variables in Photoshop or any other imaging program until you meet your own needs. Understanding how screen resolution, print resolution, and image resolution work together is an ongoing process that changes with technology as well as people's needs. It's important to be consistent, especially so for an institution like ours, but it's equally important to know how to adapt to your own needs.

Written by Ryland Ianelli

Tuesday, March 17, 2015

File Validation Woes

Over the last few months I have been preparing and ingesting the master TIFF files for the Photo Files collection into our local repository system for safe keeping. The first step is to package the files using the BagIt specification. BagIt was developed by the Library of Congress and the California Digital Library as a way to package files along with some basic metadata that can be used to validate the bag's contents. It's the digital equivalent of putting a bunch of things in a box, along with a list of the box’s contents and a unique identifier that can be used to identify each item. Since our Photo Files collection is enormous (so far I’ve deposited over 45,000 images, and we’re not even halfway through the collection), I break the bags into manageable chunks for uploading and processing in our repository.
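For the curious, the "list of the box's contents" is a plain text manifest with one checksum per file. The Python sketch below shows the idea under a few assumptions: it is not the tool we use, the file name and contents are stand-ins, and it only covers the payload manifest (a real bag can carry more tag files and metadata than this).

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path):
    """Checksum a file in chunks so large TIFFs don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def payload_files(bag_dir):
    """Relative paths of everything under the bag's data/ directory."""
    return sorted(p.relative_to(bag_dir).as_posix()
                  for p in (bag_dir / "data").rglob("*") if p.is_file())


def write_manifest(bag_dir):
    """Write one 'checksum  path' line per payload file."""
    lines = [f"{file_sha256(bag_dir / rel)}  {rel}" for rel in payload_files(bag_dir)]
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")


def validate_bag(bag_dir):
    """Return a list of problems; an empty list means payload and manifest agree."""
    problems, listed = [], set()
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel = line.split(maxsplit=1)
        listed.add(rel)
        if not (bag_dir / rel).is_file():
            problems.append(f"missing: {rel}")
        elif file_sha256(bag_dir / rel) != expected:
            problems.append(f"checksum mismatch: {rel}")
    for rel in payload_files(bag_dir):
        if rel not in listed:
            problems.append(f"not in manifest: {rel}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp)
    (bag / "data").mkdir()
    # Stand-in bytes, not a real image.
    (bag / "data" / "example-0001.tif").write_bytes(b"stand-in for image data")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    write_manifest(bag)
    intact = validate_bag(bag)

    (bag / "data" / "example-0001.tif").write_bytes(b"altered")
    altered = validate_bag(bag)

print(intact)
print(altered)
```

The second check fails because the file's checksum no longer matches what was written down when the bag was packed, which is exactly the kind of silent change validation is meant to catch.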

Once a bag is uploaded onto the server, it is validated using the BagIt tool. This is a programmatic way of checking that all the files are still exactly as they should be, and no file has been altered or gone missing or snuck in on the sly. Finally, the contents of the bags are run through the File Information Tool Set, or FITS. FITS brings together a bunch of open-source tools that identify file types, check to see if those files are valid, and extract technical metadata. So, for instance, when I deposit a bag from the Photo Files collection, FITS produces a report that says “These files are TIFFs! These TIFFs are well formed and valid! Here’s some technical info you might want to have around!”, only with fewer exclamation marks:

Sample FITS report

So, this process has been going along just swimmingly until a few weeks ago. Like I said, I’d made it through about 45,000 images, and then suddenly, BAM! an error report for every single image:

page-masters/Icon1647-0875-0000010A.tif is not valid: "Type mismatch for tag 700; expecting 1, saw 7"


All about the Tagged Image File Format (TIFF):

The first thing I discovered was that this error message had something to do with the T part of the TIFF. The TIFF file format has what’s called a header that uses tags to describe the content of the file. These tags, and the information in them, can be manipulated using various types of tools. The capture software we use to create our master images automatically inserts certain tags. As part of our process, we add additional information into the headers of our TIFFs. This is called embedded metadata, or information about the file that is part of the file itself.

The problem with these images was the 700 tag. From the Library of Congress’ super useful guide to TIFF tags I learned that this tag has something to do with the XMP metadata within the file. XMP is a data model for structuring embedded metadata. Data models for metadata help standardize how metadata is stored. For instance, I could edit an image to say “Author: Jane Doe”, while someone else might edit it to say “Photographer: Jane Doe” and we could both mean the same thing. A data model would say, “Ok, everyone, we’re going to use the term Creator.” This makes it easier for both humans and computers to make use of embedded metadata, making digital objects more discoverable and easier to maintain.

So, now I knew that there was a problem with the metadata we were embedding in the files. Something about a 1 and a 7? Deep inside the Photoshop user forums, I found that I was not the first one to run across this problem. These numbers refer to the data type recorded for the 700 tag, with 1 meaning “byte” and 7 meaning “unknown” (the TIFF specification's own name for type 7 is UNDEFINED). So these files said "unknown" when they should have said “byte”, right? Well, not really. According to David Franzen (Employee)’s response in the user forum, both the 1 and the 7 were valid values. So why was I getting this error message?
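If you want to see where that 1 or 7 actually lives, each tag in a TIFF is stored as a 12-byte entry holding the tag number, a type code, a count, and a value or offset. Here is a rough Python sketch that reads those entries. Everything in it is illustrative: the function names are mine, it only looks at the first block of tags, and the sample bytes it builds contain a single 700 tag and no image data, so they are not a usable TIFF.

```python
import struct

# Type codes from the TIFF 6.0 specification (partial list).
FIELD_TYPES = {1: "BYTE", 2: "ASCII", 3: "SHORT", 4: "LONG", 5: "RATIONAL", 7: "UNDEFINED"}


def read_ifd_entries(data):
    """Map each tag number in the first IFD to (type code, value count)."""
    if data[:2] == b"II":
        endian = "<"      # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"      # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF: bad byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF: bad magic number")
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    entries = {}
    for index in range(entry_count):
        position = ifd_offset + 2 + 12 * index
        tag, field_type, count, _value = struct.unpack_from(endian + "HHI4s", data, position)
        entries[tag] = (field_type, count)
    return entries


def build_sample_tiff(xmp_field_type):
    """Build a tiny stand-in file whose only tag is 700 (XMP)."""
    xmp = b"<x:xmpmeta/>"                       # placeholder XMP packet
    header = struct.pack("<2sHI", b"II", 42, 8)  # byte order, magic, offset of first IFD
    xmp_offset = 8 + 2 + 12 + 4                  # header + entry count + one entry + next-IFD pointer
    entry = struct.pack("<HHII", 700, xmp_field_type, len(xmp), xmp_offset)
    ifd = struct.pack("<H", 1) + entry + struct.pack("<I", 0)
    return header + ifd + xmp


for type_code in (1, 7):
    field_type, count = read_ifd_entries(build_sample_tiff(type_code))[700]
    print("tag 700 type", field_type, FIELD_TYPES[field_type], "count", count)
```

The XMP bytes are identical in both samples. The only difference is the two-byte type code in the tag entry, which is the thing the error message is complaining about.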

JHOVE and FITS:

As mentioned above, FITS packages together a number of tools. The tool that was giving this error message was JHOVE, or JSTOR/Harvard Object Validation Environment. According to Wikipedia, JHOVE tells us whether or not objects are “well-formed (consistent with the basic requirements of the format) and valid (generally signifying internal consistency).” The version of JHOVE that is packaged in FITS says that in order for a TIFF to be well formed, tag 700 needs to have a “1”, and anything else is invalid. But it seems that the "7" is also a valid value for this tag. So, why is there this discrepancy in what makes a valid TIFF? Well, it turns out that when JHOVE was first developed, the TIFF format specifications weren’t exactly easy to decipher. The TIFF specifications encoded in the tool were based on confusing, incomplete and scattered documentation. When others started getting the same error message as I got, they turned to Adobe for clarification. As a result, JHOVE’s code was updated in version 1.8 to accept both “byte” and “unknown” as valid values in the 700 tag.
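The difference between the two versions comes down to which type codes the validator is willing to accept for tag 700. The Python sketch below is a hypothetical stand-in for that rule, not JHOVE's actual code; the function name and the wording of the "expecting" part are assumptions modelled on the error message I received.

```python
XMP_TAG = 700
BYTE, UNDEFINED = 1, 7   # TIFF type codes: 7 is the "unknown" type


def check_xmp_type(field_type, accepted_types):
    """Return None if the type is accepted, otherwise an error message."""
    if field_type in accepted_types:
        return None
    expected = min(accepted_types)
    return f"Type mismatch for tag {XMP_TAG}; expecting {expected}, saw {field_type}"


older_rule = {BYTE}              # the stricter behaviour described above
newer_rule = {BYTE, UNDEFINED}   # the relaxed behaviour after the update

print(check_xmp_type(UNDEFINED, older_rule))
print(check_xmp_type(UNDEFINED, newer_rule))
```

The same file passes or fails depending only on which rule is applied, which is why a perfectly usable TIFF can still come back as "not valid".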

However, the updated version of JHOVE didn’t make its way into FITS. Apparently, there were some other changes to JHOVE 1.8 that would make integrating the newer version into FITS a rather large job. Making the necessary changes to FITS to accept newer versions of JHOVE currently isn’t a priority for the FITS developers.

The Real Culprit:

Now that I knew what was causing the error message, I circled back to the big question: why now? The first 45,000 files had been just fine. What changed? In discussion with our digital production team, I learned that there had been a significant change to the production workflow, specifically in how they were adding embedded metadata. What before had been a time-consuming process was greatly simplified by using Adobe Bridge to quality check images and add metadata. In researching this error message, I had seen people mention Bridge as the culprit in changing the 700 tag.


                          

Testing embedded metadata settings:

To be sure, I decided to play around with the settings in both our capture software and Bridge to see if I could get a different result. I created a number of test images with different metadata settings using our capture software, then ran these through FITS. All checked out okay. Next, I played around with the metadata settings in Bridge, and made changes to the embedded metadata in my test files. I ran the files through FITS again, and all failed to validate. No matter what settings I used in Bridge, the 700 tag was changed.

So Now What?

Now that we knew what was causing the error, there were a number of different approaches we could take. To find out what we did, stay tuned for my next blog post...



Written by Jenny Mullins