I had a grouchy weekend, filled with Googlebot blasting my DSpace installation into smithereens not once but twice (how much does Dorothea hate Java, boys and girls? A LOT, that’s how much) and a TAG markup project that led to the growl that follows.
People who understand books and book production understand that individual aspects of typography are overloaded. Overloaded in the programming sense, I mean—depending on context, a given typographical embellishment may have a different meaning. Overloaded, polysemous, ambiguous—whatever word floats your boat.
Take the humble italic font. It signals emphasis. It sets off the titles of books and other extended-length works of art. It sets foreign terms apart from surrounding text. It sets biological genus-species names apart from surrounding text. It delineates ship names (but not, curiously, aircraft names).
It can also be used just because somebody thought italics was a good idea at the time. Colonial-era American typesetters were absolutely notorious for this. If you can extract rhyme or reason from their type choices, you’re a braver woman than I.
Italics, in other words, are a cue. They don’t unambiguously tell the reader the reason for their existence; the reader picks from a mental list of what she’s known italics to signify in past reading, and happily goes on from there.
The neat thing about markup is that it permits various uses of italics to be disambiguated behind the scenes, if desirable. If I’m writing a biology textbook, it’s probably not a bad idea to disambiguate genus-species names from other uses of italics—that makes it possible to create a handy-dandy index of organisms named in the book.
Understand, though, that this disambiguation doesn’t just happen. Somebody’s got to actually do it. Trust me, that somebody is not going to be anybody in standard book production. Italics is italics, end of story. (You might get a clueful copyeditor. I wouldn’t count on it, though—and the clueful copyeditor’s work is wiped off the slate when the book hits print anyway.)
This brings us to HTML, where back in the day, <i> was <i>, and that’s all she wrote. But this is bad! cried the semantic generation of HTML designers. <i> doesn’t mean anything! We have to have tags that mean things!
Which is a complete misunderstanding of the problem. The problem is not that <i> is meaningless. The problem is that it means too many things. The proper solution to this problem, given HTML’s problem domain, would have been to add tags for the commoner uses of italics on the Web and perhaps to insist that <i> be embellished with a class attribute for less-common uses that HTML cannot be expected to anticipate. (I don’t think many practicing biologists sit on W3C working groups, so a separate tag for genus-species names was probably never in the cards.)
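To make the class-attribute idea concrete, here is a rough sketch of the payoff: once somebody has done the disambiguating, a few lines of Python's standard `html.parser` can pull out that handy-dandy organism index. The class names (`taxon`, `emphasis`, `title`, `ship`) and the `ItalicIndexer` and `index_italics` helpers are my own inventions for illustration, not anything HTML defines, and the sketch assumes `<i>` elements are never nested.

```python
from collections import defaultdict
from html.parser import HTMLParser


class ItalicIndexer(HTMLParser):
    """Collect the text of every <i> element, grouped by its class attribute."""

    def __init__(self):
        super().__init__()
        self.by_class = defaultdict(list)  # class name -> italicized strings
        self._current = None               # class of the <i> we are inside, if any
        self._buffer = []                  # text pieces seen inside that <i>

    def handle_starttag(self, tag, attrs):
        if tag == "i":
            # An <i> with no class stays ambiguous; file it under "unknown".
            self._current = dict(attrs).get("class") or "unknown"
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "i" and self._current is not None:
            text = "".join(self._buffer).strip()
            self.by_class[self._current].append(text)
            self._current = None


def index_italics(html):
    """Return a dict mapping each (hypothetical) class name to its italic runs."""
    parser = ItalicIndexer()
    parser.feed(html)
    parser.close()
    return dict(parser.by_class)


sample = (
    '<p>The fruit fly <i class="taxon">Drosophila melanogaster</i> is '
    '<i class="emphasis">not</i> the same animal as '
    '<i class="taxon">Musca domestica</i>, whatever '
    '<i class="title">Moby-Dick</i> fans aboard the '
    '<i class="ship">Pequod</i> may think. <i>Whatever.</i></p>'
)

index = index_italics(sample)
organisms = sorted(index.get("taxon", []))
print(organisms)
```

Note that the bare `<i>Whatever.</i>` still gets collected, just under `unknown`. Ambiguous tagging can be lived with; it simply doesn't make it into the index.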
What happened instead? <i> was effectively deprecated (people were told not to use it!) in favor of <em>, which means “emphasis.” So let’s step back. Web folks used to tag things ambiguously. This is sometimes necessary (perhaps I don’t know why something is italic!), sometimes not great, but can always be lived with; we’ve lived with it in print for centuries. Now, with the blanket replacement of <i> by <em>, Web folks are demonstrably tagging many things incorrectly, because not every use of italics is for emphasis! This is an improvement? I think not.
I spent much of the weekend wincing at (and either fixing or actually performing) tag abuse of <em>, <strong>, and <q>. And checking my work email every hour or so to make sure DSpace hadn’t run out of memory again. No wonder I’m grouchy.