Archive for April, 2002

25 Aprili 2002

Still pondering…

Bloguiente? No, the verb is probably “bloguear,” so the agent noun should be “blogueante.”

But I don’t like that either.

Bloguista? Not too bad, not too bad. I could live with being a bloguista.

For what it’s worth, my husband says that his morphology class had a lovely time pondering the word “blog,” which is a thoroughly implausible clipping in English.

The one true text?

AKMA kindly blogged back about the kinds of tools and widgets that he would find useful for Biblical criticism. It’s a good list; I won’t summarize it for fear of depriving anyone of the pleasure of reading it.

Characteristically, AKMA doesn’t quite take me on over markup issues, because I come across as a bit of a zealot and zealots are no fun to argue with. He does, however, raise an indirect question:

One reason I’m at-a-distance on the whole structured-information phenomenon is that the designer in me mistrusts any system for defining information types, for parsing communication into categorical bundles… I am not yet so trusting a soul that I’m ready to entrust Tristram Shandy to encoding according to someone’s SGML schema.

He may be surprised to find out that I’m actually on his side. I’m no designer, but I do have a comparative literature degree (for all the good anyone gets out of it), and I’m just pomo enough to be decidedly aware of one-true-interpretation issues.

A friend of mine from ebook circles, a real markup zealot (probably, says the nastier part of my mind, because the lion’s share of his time is not spent practicing markup), genuinely seems to believe that there is One True Way to mark up any given text. The text Really Is structured in just one way. Each typographical feature Really Has just one interpretation.

AKMA would never say “Bollocks,” because the word is rude, crude, and uncompromising—but I’ll say it. (If my friend is reading this, he’ll understand. He knows how I get.)

This sense of text was born (in one incarnation, anyway; doubtless there have been others) in 1990, in an article by Renear and DeRose called “What Is Text, Anyway?” (Renear and DeRose now know better, but most mainstream markup geeks have yet to catch up with them. Not that I fault mainstream markup geeks; I can’t keep up with Allen and Steve either.) Falsifying it is dead easy. Consider the “fundamental” way to divide up text agreed to by even the stupidest word-processing and page-layout programs: blocks and character ranges.

Typographically distinct character ranges are often semantically ambiguous. “Hey, is that the Andrea Doria over there?” yells someone in a novel. (Thanks to Jim Drew for this hypothetical example.) Well, is he yelling in italics for emphasis, because he’s yelling a foreign term, or because he’s yelling a ship name? All of the above, very possibly. So how does one tag it, in a world where zealots curse the existence of <i> tags?

Blocks tend to overlap. Renear’s favorite example is verse plays, which can be blocked out in very different fashions depending on whether one is a drama director (to whom the obvious chunks are characters’ lines) or a versifier (to whom the obvious chunks are lines of poetry and entire poems, regardless of who speaks them).

Prose, however, is hardly immune to block-overlap. My favorite example (because it causes so many typesetting problems) is the paragraph split by a block quotation. (I wanted to come up with a snappy example from Lazarillo de Tormes to counter AKMA’s Tristram Shandy reference, but nothing is springing to mind. El blogador un punto ha de saber más que el diablo, or something.)

(Eek, “blogador”? I don’t think I like that coinage. Perhaps a native-Spanish-speaking blogger could set me straight on this point of terminology?)

So there is no One True Text. As with everything else if you’re sufficiently pomo, you have to take point of view into account.

Unfortunately, the matryoshka-doll nature of SGML and XML markup makes recording multiple points of view of a single text awkward at best. The TEI people have come up with a lot of brave and useful kludges, but they know perfectly well they’re kludging.

I find it difficult to imagine how to fix this problem within the strictures of SGML and XML. What would seem to be needed is a way to overlay multiple markup schemes, which may or may not relate to each other in one or many ways, over the same text. Should the text not be immutable, the problem gains several orders of magnitude of difficulty (assuming the overlays are done with some sort of pointer mechanism, how do you keep the pointers from breaking if the text changes?).

Architectural forms won’t do it; they, too, assume one set of ur-markup per text. Namespaces won’t do it; they don’t do overlapping markup any better than any other kind of XML does. One might lash something up on the basis of XPointer, as long as the unexpurgated XPointer makes it out of committee. The various “FixPointer” proposals aren’t half flexible enough for this kind of thing.
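The pointer-style overlay idea can be sketched in a few lines of Python: the text stays untouched, and each point of view lives in its own layer of character-offset spans. Everything here (the Span class, the layer names, the labels) is invented for illustration; it is not XPointer, not TEI, and not anybody's real tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int   # offset of the first character covered
    end: int     # offset one past the last character covered
    label: str

TEXT = "Hey, is that the Andrea Doria over there?"

# Locate the italicized phrase once; every layer points at the same range.
start = TEXT.index("Andrea Doria")
end = start + len("Andrea Doria")

# Each layer is one point of view on the same, unmodified text.
# Layer names and labels are hypothetical.
layers = {
    "typography": [Span(start, end, "italic")],
    "ship-names": [Span(start, end, "ship-name")],
    "delivery": [Span(0, len(TEXT), "shouted")],
}

def labels_at(layers, offset):
    """Collect, per layer, the labels of all spans covering one offset."""
    return {
        name: [s.label for s in spans if s.start <= offset < s.end]
        for name, spans in layers.items()
    }

def crossing(a, b):
    """True when two spans overlap without either nesting in the other,
    which is the case nested XML elements cannot express directly."""
    return (a.start < b.start < a.end < b.end
            or b.start < a.start < b.end < a.end)

print(labels_at(layers, start))
print(crossing(Span(0, 10, "speech"), Span(5, 15, "verse-line")))
```

The offsets are exactly the sort of pointer that breaks the moment somebody edits the text; the sketch does nothing about that harder problem.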

Opens up whole new vistas for the scholarly edition, though. (Not to mention the textbook edition, the multiple-edition edition, the translated-with-original-text edition…) No more will forty billion pointless footnotes be sufficient. Editors will have to get down-and-dirty with the text in a way that thoroughly delights me. Because for all these perspectives to be manipulable in an electronic context, the markup has to be there, and the editor has to do some work to tell the ebook machinery what to do with the markup once it’s there.

Sigh. I feel the tug of a calling, I do, and it bugs me that I can’t answer.

Links that popped up while I was blogging this:

24 Aprili 2002

More Samsung

Is the plot thickening?

A day after Samsung announces a new screen-thingie, they announce a licensing agreement with Picsel (thanks, KnowBetter) for e-reading technology.

The article doesn’t make this connection, bloviating about cell phones instead, but I can’t help wondering.

Cross-disciplinary data capture

It’s no big secret that what I do at work these days is type microfilm data from the 1910 census of Puerto Rico.

That doesn’t manage to fill my hyperactive brain. So I’ve been roughing out an idea of some of the demographics, and naturally I’ve been noting language use.

It’s a terrible shame that most of the linguistically-interesting stuff from these microfilms is being abstracted out of the database. Place-name spelling variations. Graphic accents. Notes from supervisors on correct spelling. Interesting stuff to a historical (or even synchronic) phonologist, but the data are useless as they’re being captured.

Now, the demographers and the historians are running this show, and they have every right to. They wrote the grant proposals. They did the initial gruntwork. They wrote the transcription manual. They’re watching over our shoulders.

I still wish, though, that the data would end up useful to more scholars. The additional work isn’t tremendous.

Look what I’ve found already, just on casual inspection of the reels:

  • Frequent alternation between singular and plural forms of place names, which looks like final-s dropping. I have never been much for New World dialectology (dropped out of grad school in the middle of the class I was taking on it), but my hazy memory has Cuba as the final-s dropper, not Puerto Rico.
  • Plenty of yeísmo, as one would expect.
  • A truly astounding inventory of person names. Who’s teaching Greek mythology in Puerto Rico in 1910?
  • A rather curious insistence on final -z in words like “Portuguez”. Possibly an offshoot of final -s dropping? (“If you pronounce it, spell it with a -z.”) Or just copied from the ubiquitous person-names in -ez?

Someone with some expertise in the area would doubtless pick out a lot more than I can. Shame the published data won’t allow in-depth analysis.

23 Aprili 2002

Latin geekery

You will have noted the change in date format. Had to do it myself; Movable Type understandably doesn’t go for dead languages.

I don’t know any Perl, so even so simple a change as this took me a couple tries to get right. My husband said, “To be really correct, you know, you need to do it in ides and kalends.” That, however, would require ’way more Perl than this humble Pythonista ever wants to learn.

(If you want to play with this yourself: the language-specific wordlists are in lib/Util.pm, and the options are in tmpl/cms/cfg-prefs.tmpl.)
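The ides-and-kalends arithmetic can at least be sketched in Python. This is a rough illustration, not what runs on this blog: the function and table names are made up, the month names are left as abbreviations to sidestep declension, and leap-year February is counted naively rather than by the Roman doubled-day convention.

```python
from datetime import date

# Months whose Nones fall on the 7th and Ides on the 15th;
# in all other months they fall on the 5th and 13th.
LATE_MONTHS = {3, 5, 7, 10}
MONTH_ABBR = ["Ian.", "Feb.", "Mart.", "Apr.", "Mai.", "Iun.",
              "Iul.", "Aug.", "Sept.", "Oct.", "Nov.", "Dec."]

def to_roman(n):
    """Roman numeral for a small positive integer (counts here stay under 20)."""
    out = ""
    for value, glyph in [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]:
        while n >= value:
            out += glyph
            n -= value
    return out

def roman_day(d):
    """Express a date by counting inclusively toward the next marker day.
    Simplification: leap-year February is not handled the Roman way."""
    nones = 7 if d.month in LATE_MONTHS else 5
    ides = nones + 8
    month = MONTH_ABBR[d.month - 1]
    if d.day == 1:
        return f"Kal. {month}"
    if d.day <= nones:
        marker, target = "Non.", nones
    elif d.day <= ides:
        marker, target = "Id.", ides
    else:
        # After the Ides, count toward the Kalends of the following month.
        next_first = date(d.year + (d.month == 12), d.month % 12 + 1, 1)
        days_in_month = (next_first - date(d.year, d.month, 1)).days
        marker, target = "Kal.", days_in_month + 1
        month = MONTH_ABBR[d.month % 12]
    count = target - d.day + 1  # Romans counted both endpoints
    if count == 1:
        return f"{marker} {month}"
    if count == 2:
        return f"prid. {marker} {month}"
    return f"a.d. {to_roman(count)} {marker} {month}"

print(roman_day(date(2002, 4, 23)))
```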

Scaredy-Goths

The Goth-kitties survived their tooth cleaning at the vet today, and I will somehow manage to survive the bill.

They came home understandably perturbed. Dream “I am not a lap-cat” Salo crawled into my lap as I put the finishing touches on a Python script and would not be budged for quite some time.

I think we will be forgiven in a day or two, however. Pity. I’m rather fond of lap-cats.

Python cures what ails you

Just ask Cleopatra.

My computer went on the fritz today at work (have I mentioned lately that Windows bites?), so my boss cast about for something I could do from someone else’s machine. Seems they want a catalog of the contents of particular database fields to feed to the guy writing the database-checking program.

Here’s how they were planning to do it. The database is a custom Access app. Its built-in export mechanism exports pipe-delimited (go figure) plain-text files. These files were to be imported into Excel spreadsheets so that the fields they weren’t interested in could be stripped out. Then they were to be saved back out as text, so that the database guy could import them back into Access to perform a couple of queries and get a pretty-printed (I assume) result. I was asked to do the database export, Excel import, column cleanup, and Excel-to-text export.

Ye gods and little fishes. All this, to catalog some stuff out of a clearly-delimited text file?

So I said with my best smile, “Tell you what. It’ll take me most of the rest of the day to export these databases.” (Which did not quite turn out to be true, but close enough; since there is no master database, all the data to date are spread across a huge mess of files. This was 3:30 pm or thereabouts.) “I’ll email myself the exported files at home and see what I can do about getting you the data you want, ’k?”

I just emailed them the result. Took me rather less than an hour (including the remaining time at work) to write the Python script (and only that long because I had to deal with both the Access dumps and some “cleaned up” Excel dumps). I hate to think how much time they wasted in unnecessary tussling with Excel.
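The script itself isn't reproduced here, but the general shape of such a job can be sketched as follows. Assumptions, all of them guesses: the function name, the choice of column positions rather than field names, and the sample rows, which are invented and are not census data.

```python
import csv
from collections import Counter

def catalog_fields(lines, columns, delimiter="|"):
    """Tally the distinct values found in the chosen columns of delimited text.

    `lines` is any iterable of strings (an open file works);
    `columns` holds zero-based column positions (hypothetical layout).
    """
    tallies = {col: Counter() for col in columns}
    for row in csv.reader(lines, delimiter=delimiter):
        for col in columns:
            if col < len(row):  # tolerate short or ragged rows
                tallies[col][row[col].strip()] += 1
    return tallies

# Invented sample rows: id | town | surname | sex
sample = [
    "1|Ponce|Rodriguez|M",
    "2|Ponce|Rodríguez|F",
    "3|Mayaguez|Portuguez|M",
]

result = catalog_fields(sample, columns=[1, 3])
for col, counts in result.items():
    for value, n in sorted(counts.items()):
        print(col, value, n)
```

Reading the pipe-delimited dump directly skips the Excel round trip entirely; a second delimiter argument would cover a differently delimited "cleaned up" dump.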

I figure I earned my keep today, even if my work computer did keep me from my assigned tasks.

More gadgetry

Someday I will scoop Jenny, but I’m not counting on it being any time soon.

Today she found a new screen by Samsung whose chief whiz-bang appears to be that it folds in half. Ebooks are the projected use.

Jenny worries about all the goofy-looking knobs around the screen frame. My own guess is that these are fripperies to make a mock-up pretty, since all Samsung has done is design the screen; the actual gadget to be built around the screen probably hasn’t been designed yet. (I do hope that nauseating green color is an artifact of photography, however.)

I would be curious to see who owns hiebook at this point… is Samsung wanting to scoop them? I’ve heard that ebooks made a rather bigger splash in East Asia than here. (Makes sense. Typesetting ideographic languages is a royal pain.)

22 Aprili 2002

What does it take?

Sharp and somebody I’ve never heard of called DoCoMo want to come up with yet another e-content format. But this one will have pretty moving pictures and stuff, they promise.

Guys? Been done already. Didn’t get noplace, because it was unique, not based on anything anybody already knew how to do.

Yeesh. Some people never bloody well learn.

Why ebooks suck…
even though they don’t have to

Jenny posts more dedicated-device doom-and-gloom today:

I would argue that the time has come for audio ebooks and even ebooks on PDAs, but dedicated devices should have been roundly trounced and were. Just like users are not flocking to Pressplay and other online services that don’t offer what they want, Gemstar, its predecessor, and its rivals just don’t understand the power of standards and ease-of-delivery. Consumers aren’t going to invest their time and money in a device that can only read one specific type of content with few choices available in that format.

When Gemstar decided to stop letting their customers (the few folks that did actually plunk down money for their devices) download web pages and documents onto their devices, I knew the game was over. It was a clear illustration of their lack of vision and mis-reading of the market, and it failed miserably.

The ebook content industry is choking itself because of a lack of standards and available titles, which is exactly what we’re seeing with digital music online from the record labels. Publishers should wake up to the fact that they are killing off their most potentially lucrative digital markets for fear of success.

Note the two different players Jenny mentions: Gemstar and publishers. I happen to know that it wasn’t Gemstar running the show here. It was the publishers. (By the way, if you get the impression from this blog that I am not terribly impressed with the savvy and vision of the typical large publisher—you are absolutely right.)

Gemstar cut the knees out from under web-page downloaders because publishers told them to. Security/DRM issues. Competition. That kind of thing. And the folks who would have stood up to the publishers, who had the vision, mostly left when Gemstar took over NuvoMedia and SoftBook.

Of course it was bloody suicidal for Gemstar. Some of them (the few holdovers, in my personal experience) even knew it. But the fundamental mistaken assumption here is that Gemstar’s customers were book buyers and book readers. Nope. Until the fabled “critical mass” of content appeared, Gemstar believed it had to woo publishers, not readers. Readers (thought publishers and Gemstar alike) would follow content like sheep. Who controls content? Not readers. Publishers. Publishers.

Nor was Gemstar the only thinker along these lines. Versaware and NetLibrary travelled the same road to the same destination. I think it’s useful to understand why, so that responsibility can be laid at the right door.

Is it coincidence that these three casualties were silent about their connections with the OEBF and OEBPS, whereas still-twitching Microsoft made its connection evident throughout its authoring guidelines and market appeals? One wonders, yes, one wonders… Ironic, that Microsoft of all entities should realize benefits from standards awareness. (Not that Microsoft is itself all that pleased about the progress of ebooks, but that’s another story.)

Because I think there’s an important strategic point to be made, I do want to question one part of Jenny’s otherwise well-argued diatribe:

Consumers aren’t going to invest their time and money in a device that can only read one specific type of content with few choices available in that format.

Which consumers, Jenny?

Back at the first OEBF annual meeting, it was astoundingly clear that the folks dancing in the aisles about ebooks for reasons other than dollar signs were the visually-impaired. Makes sense. Small choice combined with device-specificity is one hell of a lot better than the zero choice they have now. And standards-awareness promised (and still promises) quite a bit more than small choice.

In other words, build it, and you lock in a market. A small one, I grant you. Nevertheless, isn’t a small but reliable market better to start out with than a large but wary and fickle one? Bird in the hand, and all that.

That, I think, is the key to the failure of dedicated devices. The dedicated-device boys saw the potential for big dollar signs from the mass market, despite that market’s near-saturation with p-books, so they ignored the unique smaller markets crying out to be served.

Nor are the visually-impaired the only folks to be stupidly ignored. How long has academia been screaming and yelling about the ridiculous waste and inefficiency of the paper publishing process? Am I the only person to see a tremendous opportunity there? And I certainly don’t think it’s coincidence that niche markets (sci-fi, romance, mystery) are the ones making money for the indie e-pubbers; big publishers have trouble making money on these genres (best-sellers aside), so don’t treat them well.

I hate to bring up Clayton Christensen and his disruptive-technology idea, because it’s been done to death, but I can’t help thinking that ebooks should have been treated as a disruptive technology—go after the small underserved markets, not the big immediately-lucrative ones—and weren’t.

On a geekier level (because Mung knoweth I have no stature to be talking marketing strategy), I note that the best work done on the Open eBook Publication Structure has come from serving niche markets. There’s some unbelievably incredibly awesome navigation stuff coming in the next-release-but-one of the OEBPS that comes right straight out of the accessibility community. The pioneering work being done now on multiple-XML-namespace docs and publication packaging comes from a desire to make the science and tech folks happy with MathML.

I think that Jenny’s completely right that for the fiction-and-trade reader, dedicated devices are a pointless luxury. PDAs and gadgets like the OQO are where it’s at. (And if I implied otherwise in the email I sent to Jenny a while back that I suspect was overblunt, I apologize.)

Tell you what, though. I want my LinguistBook. Concordances on demand. Interlinear text. On-the-fly flipping between different editions of the same text. Expansion and contraction of scribal abbreviations at will. An etymological dictionary for lookups, fully cognizant of common spelling variations and abbreviations. Statistical toys for hard-core lexicographers. Put that in a dedicated device, and I am ever so there.

Hey, AKMA, what do you want in BiblicalCriticismBook? (Which is probably a specialized version of the more general LitCritBook, which itself probably overlaps LinguistBook considerably.) Don’t be shy. We specialized markets are the future of dedicated devices. We have to be.

Or, heck, why not RPGBook? That’s much simpler. Incorporate easy-access (queryable, preferably) tables, a dice roller (which is only a jumped-up random-number generator; heck, even I can program that!), a few character generators, an NPC name book, and a few basic forms (e.g. for adding the nifty new spell from the latest issue of Dragon Magazine), and there you are. Gamers would dig it. I guarantee. Lugging those books around gets to be a pain, especially in a LARP.
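The dice roller really is only a jumped-up random-number generator. A minimal Python sketch, assuming the usual tabletop "3d6+2" notation; the function name and the exact syntax accepted are choices made for this illustration, not part of any real RPGBook.

```python
import random
import re

# count (optional) + "d" + sides + optional +/- modifier, e.g. "3d6+2" or "d20"
DICE_RE = re.compile(r"^(\d*)d(\d+)([+-]\d+)?$")

def roll(spec, rng=random):
    """Roll dice described in tabletop notation and return the total.

    `rng` may be the random module or a random.Random instance,
    which makes seeded, repeatable rolls possible.
    """
    match = DICE_RE.match(spec.replace(" ", "").lower())
    if not match:
        raise ValueError(f"not a dice expression: {spec!r}")
    count = int(match.group(1) or 1)      # "d20" means one die
    sides = int(match.group(2))
    modifier = int(match.group(3) or 0)
    return sum(rng.randint(1, sides) for _ in range(count)) + modifier

print(roll("3d6"))
print(roll("d20 + 5"))
```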

Finally, a secondhand quote regarding Audible:

Audible is also a good model for how content publishing on the Internet ought to work–the music industry should pay attention to what Audible is doing. I like Audible’s take on digital rights management: The books I download are mine to listen to and keep forever. I can download them again if I need to. I tried that with some e-books I bought at Amazon without success, which feels like flushing money down the drain.

Audible is missing one important thing. Not a standard; MP3 is that, unfortunately. An open standard. MP3 is not that. Neither is MPEG-4, with which I believe Apple has already gone a few rounds.

These standards are proprietary, just as PDF is. They are at the mercy of the marketdroids and green eyeshades. Their owners can pull a Unisys at any time. Become too dependent on them. I dare you. That’s just what the marketdroids are waiting for.

Allow me to point in the direction of Ogg Vorbis, and suggest mildly that Audible go there. Open standards for content make a difference.