29 Octobris 2002

Analogies

Clearly what we need here are some usable analogies. Let me take a stab.

We’ll try music. Or better yet, movies. A movie studio putting together a movie records the images on super-nifty-neato expensive film stock. Sound? Recorded separately—and I do mean separately, as there’s music and Foley stuff and other audio effects to add in. (I’m being vague here because I don’t know what I’m talking about.)

I don’t know what precisely movie studios archive, but I would guess one or more finished prints in the best medium available, the one that preserves most fully the information captured on film and in the recording studio. If they’re smart, particularly if they expect sequels or a director’s cut, I would think they’d keep a lot of the components around, too, in the appropriate high-fidelity medium for each component (which will be different, obviously, for a recorded sound and a computerized visual effect).

Does the DVD or videocassette that you, the consumer, buy contain all the information available to the studio? Of course not! Nor do you even expect that. For example, you probably can’t tease apart the individual tracks in the sound. You can’t get at a frame without the visual effects added in. That’s okay; all you’re interested in is watching the goldarn movie.

When a new end-user format like, say, DVD comes out, does the movie studio use VHS videocassettes to make DVDs? We’d pillory them if they did; VHS fidelity is lousy. The studios go back to their master copies, of course; they figure out how to do the transfer to the new format and they do it.

Likewise, with print books you don’t get a copy of the typesetting files or the PostScript. You don’t care (do you?). And if a lucky book goes into a second printing with a spandy-new cover, the printer doesn’t photocopy the pages and bind the photocopies, which is the best analogue I can think of for turning one end-user format into another. What a horror that would produce. The printer goes back to an earlier-in-the-process, completely non-consumer medium. Could be plates, could be PostScript or PDF, could even be typesetting files. Some sort of master, in other words.

What does this have to do with ebooks in general and OEB in particular? Well, look, the OEBPS (the Open eBook Publication Structure) lays out a format for ebook master files: ebook files as experienced by ebook producers, not ebook files as experienced by human readers. This was done with the understanding that ebooks as experienced by human readers are going to change. Rapidly. Radically. Should change, in fact, because as everyone and his dog delights in saying with appropriate nose-lifting disdain, the first generation of reading systems was something less than universally desirable or universally capable.

That does not mean, however, that the masters must change in equally radical fashion. Movie studios didn’t rush out and re-archive all their movie masters when DVD became big with movie-watchers, did they now? (They may indeed be using some digital medium for mastering now that they didn’t before—I wouldn’t know—but I guarantee you that if they are, the decision to do so took place entirely independently of the consumer move to DVD.) The logical way to keep ebooks available and usable in periods of such change is to lay down a protocol for master files. Not for human-reader-destined files. Master files.

Ideally, those master files are so good that new human-reader formats can be generated from them indefinitely, with as little pain as possible. Is OEB there yet? No, it’s not, particularly for certain categories of books, but it isn’t too far away, it’s moving closer, and it’s a whale of a lot closer even now than anything else going.

Expecting to generate new human-reader files from old human-reader files is like expecting to generate DVDs from videocassettes. The loss of fidelity from the transformation is only a minor problem compared to the lack of fidelity inherent in an end-of-the-line format. This is one reason asking Microsoft (or whoever) to make .lit convertible is a dumb idea. (Another reason, of course, is that Microsoft is going to laugh at you.)

“The ebook industry would take off if only there was a single end-user format.” Heard that a lot. Maybe it’s even true. Know what? I don’t care, because an industry based on a single end-user format is cruisin’ for a bruisin’. Do you honestly believe you can create an end-user format that will survive technology changes? I’d like to see you try. If you try and fail, what then? How are you planning to tell all those publishers that your format is a dud and they need to change everything they do to accommodate your new format, which really, truly, genuinely isn’t a dud?

Please.

Decoupling master files from human-reader files, and standardizing the masters only, just makes sense. It allows the master files to stay consistent while the reading experience changes. It is the only feasible production system that allows both.
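
For the code-minded, here is a minimal sketch of that decoupling in Python. Everything in it is made up for illustration: `Master`, `render_plain`, `render_html`, `publish`, and `converters_needed` are hypothetical names, and a real OEBPS master is a package of marked-up documents, not a tiny Python object. The only point is the shape: one master, many throwaway renderers, and no renderer ever reads another renderer's output.

```python
import html
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Master:
    """Hypothetical stand-in for a master file: structure, no presentation."""
    title: str
    chapters: Tuple[str, ...]


def render_plain(master: Master) -> str:
    """One end-user format: plain text with a shouted title."""
    return "\n".join([master.title.upper(), "", *master.chapters])


def render_html(master: Master) -> str:
    """Another end-user format: bare-bones HTML."""
    body = "".join(f"<p>{html.escape(c)}</p>" for c in master.chapters)
    return f"<h1>{html.escape(master.title)}</h1>{body}"


# Every end-user format is generated from the master, never from another
# end-user format. Supporting a new format means adding one entry here.
RENDERERS: Dict[str, Callable[[Master], str]] = {
    "plain": render_plain,
    "html": render_html,
}


def publish(master: Master) -> Dict[str, str]:
    """Generate every known end-user format from the one master."""
    return {name: render(master) for name, render in RENDERERS.items()}


def converters_needed(n_formats: int) -> Tuple[int, int]:
    """Return (routines for format-to-format conversion, routines from a master).

    Format-to-format needs one routine per ordered pair of distinct formats
    (nobody converts a format to itself); a master needs one per format.
    """
    return n_formats * (n_formats - 1), n_formats
```

When a shiny new reading system shows up, you write one more renderer and rerun `publish`; the master doesn't change. And the bookkeeping is kinder, too: with five end-user formats, `converters_needed(5)` comes out to 20 format-to-format routines versus 5 renderers from a master.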

The reason folks have such a hard time accepting this, I think, is that the system of master copies is completely hidden from them in most other media industries. I watch movies, yet I haven’t a clue how they’re archived. I didn’t know diddly about how paper books were made, much less how masters were archived, until I started working for a publishing-services company, and I’ve been reading paper books since I was three. Since ebooks are new, however, more of the nuts-and-bolts of the process is exposed. We’re thrashing out production problems and questions in sight of everybody. Which isn’t a bad thing at all; it just occasionally necessitates liberal applications of clue-by-fours.

The other reason this seems so hard is that most publishers aren’t accustomed to thinking about archival issues as relating to anything but paper. Free clue: the world is changing under you. Wake up. Fast.

And if I may put on my activist hat for a moment: The reason we have all these arguments over copy-protection for DVDs is that there is only one end-user format. That format is a chokehold on all of us. Do you honestly want to repeat that mistake with ebooks? I didn’t think so.

Yes, even creating a so-called “open” end-user format would be a mistake, because the very instant it becomes obsolete, all the reading-system manufacturers are going to squall about how it was a total failure and proprietary formats are just plain better. And we’ll land in a useless mess even worse than the one we’re in now with PDF versus .lit versus everything else, because the OEBPS will go down along with the end-user format, and we won’t have any master files anywhere.

Postscript: Thanks to Stuart for correcting a boneheaded math error in my rant of earlier today. Note to self: Folks don’t generally write routines to convert a format to itself.