23 August 2005

Michigan digitization, again

I could make this a very short post and just say that I was right, but that’s no fun.

Via digitizationblog, a PDF FAQ from Michigan (really, people, couldn’t that have been HTML?) that answers quite a few of the questions I’ve had about the project since I first heard of it.

For example, file formats. I quote:

  • Most pages (i.e., those that consist of print without illustrations) are delivered to Michigan as 600dpi TIFF images using ITU G4 compression.
  • Occasionally, pages include significant illustrations; these are provided to Michigan as 300dpi JPEG2000 images.
  • OCR (performed by Google) is provided with each page.

Call that a reading-ready digital object? I don’t. Not that it’s not useful; linguists in particular should be drooling right now. What a corpus! But no, it’s not going to magically turn into a stack of ebooks. Come on. It never was.

Question 28 (“Why does UM want its own digital copy?”) is a red herring; I doubt anyone ever asked it. Still, it’s important that UM get its licks in on that subject, because preservation is an issue that the copyright hawks should be assailed on.

All in all? I’m still waiting and seeing. Important as this project is, it’s not what either its cheerleaders or its detractors think it is.