Portrait of the artisan as a young geek
So now Leigh is granting me another link and reposting my snarky comments to xml-dev (I should watch my mouth, as my mother would no doubt tell me). This is fine; I accept responsibility for my public snarking. (And no one seems to have responded to it anyway, so perhaps I’ll escape scot-free this time.)
I should probably explain just who the hell I am to be snarking, though. If nothing else, it provides a counter-datapoint to the endless stream of database gurus cited as XML’s main (sole?) audience.
It was all an accident. I burned out of graduate school in late 1998, and after a few months of pink-collar temping landed a job with Impressions Book and Journal Services as an Electronic Publishing Assistant.
Impressions thought I might have some luck learning SGML because of work I had done in graduate school keying medieval and early Renaissance manuscripts and books into a weird, homegrown, non-SGML-based markup system. As it turned out, they were more or less right. Within six to eight months, I could handle data analysis, write a DTD, edit and document already-written DTDs, or turn garbage into SGML via regular expressions and the beginnings of Desperate Python Hacking.
I shouldn’t (and I don’t) claim that all this prowess stemmed from personal brilliance. C’mon, we’re talking about a grad school dropout here, okay? I had excellent teachers. A good deal of their excellence as teachers, in fact, came from their having been in my shoes. None of them were computer scientists. None of them were trained programmers. There was plenty they didn’t know; there was plenty that we collectively didn’t know.
But they understood books and book production. All of them were typesetters; now that I come to think of it, I was the first non-typesetter in that department. (I understood texts, being an ex-lit-critter and ex-historical-linguist, but I had a lot to learn about books.) They did a pretty amazing job integrating what they knew about books with what they learned about markup.
As a publishing-services company, Impressions noticed the ebook hype pretty quickly. I don’t think I can take credit for that; I honestly don’t even remember who in the EP department first noticed the Open eBook Authoring Group and its new “Open eBook Publication Structure.” I can take credit for sussing things out quickly, though. (And here I go, about to get myself in legal trouble. Oh, well. Maintaining the fiction gets old anyway.) I wrote what is now the OEBPS FAQ and published it via ebooknet.com (no link; no longer extant) in late 1999.
When the Open eBook Forum held its first annual meeting in May 2000, I represented Impressions. I immediately joined the Publication Structure Working Group and in short order became its scribe, a position I still hold. I am and I’m not a typical PSWG member. Typical PSWG members, like me, are markup artists who were made, not born; only our long-time chair Allen Renear (of the Text Encoding Initiative, among other things) has been around markup for a long time. Typical PSWG members, unlike me, are software and hardware designers, not content creators. (For a very long time, I was the only PSWG member who could make any claim at all to understanding publishers’ issues. Made me nervous.)
I left Impressions rather badly, in May 2001, when a management shift sharply curtailed my on-the-job autonomy and I didn’t have the political savvy to resist effectively. I had planned to start my own tiny ebook conversion/consulting business, but landed at OverDrive instead. About that, I have said enough and more than enough in recent days.
All that said, what can I actually do? Well, aside from the skills already mentioned—not much, to tell the truth. I wrote an ISO-entity-to-Unicode translator in Python that I still use (props to John Cowan for the actual equivalence listings). I can fiddle a bit with XSLT, if I have a reference open on my lap. (I lost a lot of interest in XSLT when I discovered—the hard way, of course—that it did not solve one of the commonest problems I face, that of adding hierarchy to flat data. Data that comes out of typesetting systems is flat, flat, flat.) I can sort of read, but not at all write, an XML Schema. I understand more about XML namespaces than anyone should. I can follow some of RDF. I grok almost all of CSS2, enough to grumble at current web-browser implementations. I’m teaching myself basic SAX scripting via a project I’ll blog about some other time. (SAX makes more sense than DOM in this context; again, working with a very flat and simple structure, but potentially lots of data.)
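For anyone who hasn’t met the flat-data problem, here is a minimal sketch of it in Python. The record format (a kind, a heading level, and some text) and every name in it are invented for illustration; real typesetter output is messier, and this is not the code I actually use. The idea is simply to keep a stack of open sections and close them whenever a heading of the same or a shallower level turns up.

```python
import xml.etree.ElementTree as ET


def nest_flat(records, root_tag="book"):
    """Build a nested tree from flat (kind, level, text) records.

    Assumes headings carry a level of 1 or more; the level on
    non-heading records is ignored.
    """
    root = ET.Element(root_tag)
    # Stack of (level, element); the root sits at level 0 and is never popped.
    stack = [(0, root)]
    for kind, level, text in records:
        if kind == "head":
            # Close every open section at this depth or deeper.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section")
            ET.SubElement(section, "title").text = text
            stack.append((level, section))
        else:
            # Body text attaches to the innermost open section.
            ET.SubElement(stack[-1][1], "p").text = text
    return root


# Hypothetical flat input: nothing here says where a chapter ends.
flat = [
    ("head", 1, "Chapter One"),
    ("para", 0, "Opening paragraph."),
    ("head", 2, "A Subsection"),
    ("para", 0, "Nested paragraph."),
    ("head", 1, "Chapter Two"),
    ("para", 0, "Another paragraph."),
]

tree = nest_flat(flat)
print(ET.tostring(tree, encoding="unicode"))
```

The stack is what does the work: the flat input never marks the end of anything, so the script has to infer each closing tag from the arrival of the next heading.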
I am about to start a new job completely unrelated to XML or ebooks. Whether I expend the additional effort to continue with the OEBF remains to be seen. I do not, however, expect to be gone from markup forever. Perhaps I’ll come back a database wonk, who knows?
What is to be learned from this topsy-turvy tale, if you, like Leigh Dodds, are interested in the underground XML world? Here’s what I think:
- Markup isn’t rocket science. (This statement has reached the status of aphorism, or all right, in-joke, in the OEBF, because I’ve said it too often.) You do not have to have a Ph.D. in computer science to handle markup effectively.
- Going it alone is bloody hard. Probably impossible. Even if markup isn’t rocket science, it contains plenty of gotchas for the uninitiated. Maybe there are geniuses somewhere who can learn XML out of books. I am sure as heck not one of them. Without tutelage, without having started out on ongoing projects (SGML journals, mostly) that more experienced people had worked the kinks out of, without coworkers able to grin and say, “Yeah, it’s stupid, but it just works like that,” I would never have made it as far as I have.
- Too many vendors are selling learned helplessness. I don’t think I need to say more on that unpleasant subject.
- Some markup standards are broken. I don’t know that everyone would agree on which ones and how, but I see the evidence often. (My own list of candidates: XML namespaces, XML Schema vis-a-vis character-entity handling and human-readability, CSS vis-a-vis conversion from print production tools, XSLT vis-a-vis adding hierarchy. XSL-FO is unfinished, but my impression is that it’s not broken.)
- Markup standards are pitifully unusable in the vast majority of today’s book production contexts. There are lots of reasons for this, by no means all of them due to markup standards themselves. (In other words, improving the standards won’t necessarily solve the problems.)
I am known (insofar as I’m known at all) for pontificating on that last subject; I’ll try to get my most famous (ha!) pontification up in half-readable form soon.
If I can un-hose my laptop, that is.