Caveat Lector » Ebooks

Dies Mercurii, 1 Decembri 2004

Partnering for production

Because I am a bad person, I spent yesterday’s cataloguing class listening with rather less than half a brain to my colleagues’ project presentations. (Except for the one about dinking around with MARC records in Perl. How come Python doesn’t have anything as keen as Perl’s MARC module? Because, wow. What even I could get done… I know about MARC21.py, but it doesn’t seem to be nearly as mature. Days I wish I didn’t hate and fear Perl.)

The rest of my brain occupied itself on some recent periodicals, specifically the latest issue of Searcher. Said issue contains a brief, circumspect, toe-in-the-water look at academic libraries becoming publishers, especially serials publishers.

Some especially good quotes near the end of the article pointed up the possibilities of partnerships with university presses. I hadn’t thought about that before, but it makes sense… and just now, something jogged loose in my head that makes it seem even more sensible.

That something being: Publishers, including university presses, do not know how to do electronic-text production. Libraries do not know how to do print-text production. A few exceptions in both directions, of course, but by and large, this has been and still is the case, which means that libraries and university presses are in a fantastic position to shore up each other’s shortcomings, because publishers obviously do know how to do print, and libraries do know about doing electronic (they just call it “digitization,” is all).

The folks I’m keying Greek for (who as far as I know won’t mind me mentioning them here) appear to be just such a collaboration, though for books rather than serials. (Figures that this project comes out of Michigan. Those people are so far ahead of the curve that they’re close to running out of curve altogether.) When I got started on the work, I let slip for all the obvious reasons that I was a soon-to-graduate student librarian. My correspondent also has a library degree, and asked if I was interested in jobs at university presses.

I didn’t think I was, at the time, but now I’m starting to wonder. Somebody’s got to do the production-liaison work if these partnerships are going to fly. Who’s hiring that somebody, then? Is it the libraries or the presses?

Dies Jovis, 7 Octobri 2004

Don’t do it!

I have three words for publishers thinking Google’s new Google Print gizmo the answer to all their digitization needs. They are “don’t,” followed by “do,” “it,” and a whole line of exclamation points.

What’s my issue? Ownership of the scans rests with Google. So the publisher gains only searchability, not real digitization. And the publisher may lose significant control over data. (Do you know how Google is going to digitize? Do you know they’re going to do it right? If they screw it up, can you get them to fix it, or heaven forbid, do it yourself?)

To its enduring shame, the ebook bubble offered publishers precisely this same scam. Give us your content, we’ll do it up pretty for free (except we own the result and you’ll never see a single byte). Once Versaware and NetLibrary hit the skids, publishers saw that they’d been had. Do not sign over your data! Not for the price of digitization! Trust me, it’s not worth it!

I suspect, also, that Google doesn’t know what it’s getting into here. The plans I’ve read indicate that Google isn’t scan-and-OCRing; they’re expecting to work from publisher electronic files or possibly PDF Normal.

Hollow laugh. Good luck, Google. I know what those files are going to look like. Do you?

It’s a case of invisible labor, if I may paraphrase Greg Downey. First the ebook techies and now Google, operating on zero knowledge of what book production is like and what book producers actually produce. Me and my fellow text artisans? Totally invisible.

I bet Google thinks the only reason Amazon doesn’t do Search Inside the Book with everybody is recalcitrant publishers. Ha. I’ll bet you everything I own in this world that part of the problem is incompetent production practices leading to impossible-to-use electronic files.

Eh, well. We shall see. I fully intend to gloat very loudly if I’m right, though.

Addendum: Okay, my bad. On further inspection, Google is going to scan-and-OCR. But if they think that solves production problems…

Dies Lunae, 21 Iunii 2004

Reading onscreen

I think Walt Crawford does a bit of rhetorical violence in his summary of one recent conference article (Paul Mercieca, E-book Acceptance, ’ware PDF, or do what I did and read the Google-to-HTML version).

The article is about reading class materials onscreen versus in print, that old chestnut that will never go away in my lifetime. I rarely see the print snobs conceding that familiarity with the medium is part of the problem here, that people won’t read extended texts onscreen simply because they’re not used to it. That, however, will take care of itself in a few decades, so I’m not terribly worried about it.

Crawford trots out the old etext-causes-eyestrain argument, barely noting that it relates to PDFs only. What he doesn’t say, though the article clearly does, is that students evinced much less eyestrain and general annoyance when presented with a Microsoft Reader text—a text, in other words, designed for onscreen reading.

I know this seems an obvious conclusion. Design for the medium, improve readability. Ever seen incunabula? They’re wretched, from a readability perspective, because cut type just doesn’t have the same affordances as pen-and-ink, and the first typefaces were slavish imitations of manuscript hands. Once printing got away from needing to look just like manuscripts, readability improved fairly rapidly.

The first onscreen-versus-print usability test I ever read about, though, utterly ignored questions of appropriateness of design to medium, pitting a color print copy of a popular newsmagazine against a grotty black-and-white (not even grayscale, if I recall correctly!) scan-to-PDF! They crowed mightily on the basis of that stupidly skewed test that onscreen reading would never, ever catch on. I’m deeply suspicious of print-versus-onscreen deathmatches now. I frankly don’t believe the speed difference Crawford cites; I want to know how those numbers were arrived at.

I myself cheerfully concede that I read PDFs slowly onscreen. The typical PDF—Cites and Insights no exception—isn’t designed for that! A well-designed web page, however, reads as quickly (in my admittedly subjective estimation) as print. An MS Reader ebook—well, I admit I get dumped out of immersion because of design flaws (both in MS Reader and books tailored to it); I know much too much about .lit, there’s no two ways about that. I used to read decently-designed .lit books on the planes home from Cleveland, however, and they felt pretty much printlike to me.

Nor do I completely buy Jakob Nielsen’s line on this subject, as Nielsen’s own site demonstrates that he wouldn’t know a readable onscreen design if it bit him in tender spots.

(And no, if you’re wondering, Crawford won’t do an HTML version of C&I. I asked. Not only did I ask, I offered to do the conversion and design work for him, being an opportunistic sort of wench who could make good use of the wide exposure such a task would give me. I’m not angry about it—even if I were, I’d have no particular right to be—just disappointed. Though I admit the print-on-demand book idea he’s playing with is probably better for him.)

Anyway… at the end of that snippet, Crawford asks peevishly why on earth anyone should make reluctant undergraduates read onscreen. Oh, boy, questions begged! Here are a few of my answers:

  • The material is not available in print, or can’t be got at except electronically owing to travel requirements or rarity or fragility or whatever. Libraries and archives haven’t been undertaking digitization projects for their health, after all. There honestly is stuff online that can’t be got at any other reasonable way. If it’s good, relevant stuff—I’d make them read it, sure.
  • If I knew in advance that a student of mine was blind or heavily visually disabled, I would intentionally skew my syllabus toward non-PDF electronic materials for accessibility’s sake. It’s just the right thing to do. Of course I’d also be on the horn to DAISY to see what my options were for print-only materials. But if the question is “would I force my sighted students to read onscreen so that their blind colleague would have an easier time?” the answer is an unequivocal yes.
  • The material was designed for onscreen perusal such that printing it is lossy. Heavily hyperlinked texts lose data when printed. If I expect my students to tool about a bit and click some links, I have no particular compunction about telling them so. I adduce the Cornell Digital Imaging Tutorial as something I’d make students read onscreen.
  • The material is interactive. I’m going to get whacked on this one, I know it, because interactivity is one of the buzzwords that the hypertext folks use, and (to tell the truth) I’ve not much more use for them than Crawford. (Though I did enjoy Hamlet on the Holodeck despite the horrid title. Admittedly, though, I read it in a roleplaying context rather than a purely literary one.) The truth is, though, simple little things like the HTML-form-based quizzes in the Cornell tutorial I just linked are interactive, and they’re worthwhile.
  • I am making a point about information literacy, online and off-. How the hell are we supposed to teach our students that they can’t believe everything they read anywhere, especially but not entirely online, if we never tell them to read anything online?

Because I am one of those evil e-text proponents, I would assign onscreen reading just to get students familiar with it. I doubt, however, that Crawford would back me on this one, and he’s quite within his rights not to.

Dies Solis, 20 Iunii 2004

Pop-up books?

If there’s such a thing as a friendly adversary, Walt Crawford is mine. I’ve told him privately and now I’ll say in public that even when I don’t agree with him, he makes me think hard about why I don’t, and I find that extremely valuable.

(He publishes the only ’zine-serial-webloggy thing in PDF that I actually read regularly. He’s that good. It’s not that I’m an anti-PDF zealot, though as we all know, I am; it’s that reading PDFs onscreen makes me growl and gives me headaches if I do it for too long, and I can’t afford to buy or store all the extra paper to print them. So I don’t read PDF ’zines. Except Cites and Insights.)

The latest issue of Cites and Insights (’ware 275K PDF) contains a brief mention of me, surprisingly unconnected to a quite long and impressive slagging of ebooks (with, it must be said, some grudging admissions of their use in certain areas; Walt Crawford’s no blind dogmatist).

It’s probably easier to point everyone to CavLec’s ebooks category archives than to explain again how I stand betwixt-and-between the worlds of print and electronic text. Suffice to say that anyone with the gall to imply (as Crawford never has, be it said) that I don’t understand or respect the print book is cruising for a knuckle sandwich.

(One of my SLIS professors Who Will Remain Nameless actually marked a paper of mine down for employing the ordinary print-typographic convention of not indenting the paragraph immediately following a heading. I think it fair to say I understand print better than that professor does!)

Suffice to say that despite my experience with and love for print, I have cast my own lot with electronic text; I will gladly and eagerly spend my life making e-texts, and making them better than they are now.

So I read Walt Crawford’s roundup with mixed feelings, and I expect I’ll be running off at the keyboard about it a considerable part of the upcoming week.

Starting off with the fish in barrels, then… I confess I don’t understand why Crawford recommends this little squib. I don’t quarrel with the point that e-texts are not print; I’ve been known to say myself that e-texters have a less-than-comprehensive notion of the fantastic complexity of print.

I do, however, have a bone to pick with the bizarre notion that we must cling desperately to print, practically in exclusion of e-text, because print has capabilities e-text doesn’t. Especially when the canonical example given is pop-up books.

Pop-up books?

I will admit to a certain jaundice; I didn’t like pop-up books as a kid, because there wasn’t enough text in ’em. Even so—the best this man can do to defend print is this edgiest of all conceivable edge cases? Extend that notion the tiniest bit, for heaven’s sake, and we are forced to turn up our noses at print books because they’re useless as gravestones! Those stonecarvers, they were really on to something…

Everything in its place, say I.

Dies Jovis, 17 Iunii 2004

On print hypertext

I started out not much liking this article about how print books are really hypertext. (Partly it’s the dropdown menus. Man, I hate those things. More on that in another post sometime; I think I finally know why I hate them.)

Take this, for instance:

A book as a technological artifact is highly interactive and non-linear. Grab one and you can flop it open in the middle, skip around, and thumb through its pages forwards or backwards.

Yeah, sure. It’s a matter of affordances, though. It’s dead easier to activate a See also: link than to flip through a book. It just is. E-texts have vastly better targeted skippability than print. So, like it or not, it’s fair to call print more linear than e-text.

I note in response to their point about the non-linearity of reference books that reference books have been the first genre to roar onto the e-scene and establish a major presence. Do I think easy skippability has a lot to do with that? I do indeed.

The article does get better, however:

Since printed text has not changed, users of on-line text face an annoying situation. To work with text on-line, they must either import the old conventions of print awkwardly into the new medium or they must struggle with new conventions of hypertext that are too often unpredictable and ineffective. Hypertext conventions simply do not intersect well with print conventions. The information that can lead a reader to a passage in print may not help get her to the same passage on-line, even if it exists in that form. Likewise, encountering a text on-line often leaves the reader with scant clues about how to find it in print. We need to change this situation.

Right on. I’m all about the changing of this situation, personally. Hop to that paragraph and read the rest of the paper, and you come across one of my most heartfelt howls: the total randomness of print page numbers, and the need for an electronic-citation system.

I think, in fact, I shall adopt their phrase for this: “device-independent referencing.” I don’t usually think about things this way, because I deal with texts on the production end, long before they hit reader gadgetry, but the phrase gets the idea across to people who aren’t production geeks like me.
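
For readers who like to see an idea in code, here is a minimal sketch of what a device-independent reference could look like. The `ch2.p3` address format, the function names, and the crude paragraphs-per-page model are all assumptions made up for illustration; none of this is the scheme the article proposes. The point is only that a page number shifts with the layout while a structural address does not.

```python
def paginate(paragraphs, per_page):
    """Map each paragraph index to a 1-based page number, for a layout
    that fits `per_page` paragraphs on a page (a deliberately crude model)."""
    return {i: i // per_page + 1 for i in range(len(paragraphs))}


def make_ref(chapter, paragraph):
    """Hypothetical device-independent reference: chapter and paragraph
    numbers counted from 1, e.g. 'ch2.p3'."""
    return f"ch{chapter}.p{paragraph}"


def resolve(book, ref):
    """Return the paragraph that a reference like 'ch2.p3' points to.
    `book` is a list of chapters, each a list of paragraph strings."""
    chapter_part, paragraph_part = ref.split(".")
    chapter = int(chapter_part[2:])      # strip the leading "ch"
    paragraph = int(paragraph_part[1:])  # strip the leading "p"
    return book[chapter - 1][paragraph - 1]


book = [
    ["First chapter, first paragraph.", "First chapter, second paragraph."],
    ["Second, one.", "Second, two.", "Second, three."],
]
ref = make_ref(2, 3)

# The same paragraph lands on different pages in different layouts...
small_screen = paginate(book[1], per_page=2)[2]
large_page = paginate(book[1], per_page=3)[2]

# ...but the structural reference finds it either way.
target = resolve(book, ref)
```

Here `small_screen` and `large_page` disagree about the page, while `resolve(book, ref)` returns the same paragraph regardless of how the text was paginated.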

Anyway, article worth reading. Give it a look.

Dies Veneris, 21 Maii 2004

Lookie what I found

Heh. Just for fun, after writing this entry I rooted through my hard drive for old work-related stuff—and actually managed to find what I wrote about that godawful Quark/XML spec.

Yup. I still have it. Turns out they sent sample files, too, which I went over with a fine-toothed comb and wrote a—um, well—rather scathing reaction to.

I am ever-so-tempted to forward these documents to the publisher in question. I don’t see how it could possibly get me in trouble at this point. And, years later, I’d just feel better. I would. I hate seeing people get away with rot.

Dies Mercurii, 19 Maii 2004

Once upon a time

I promised a story about the dangers of trusting vendors to create specifications. This is that story.

There once was, and still is, a Gigantic New York Publisher (which for lawsuit avoidance will remain nameless). This publisher wanted to do OEB ebooks, but found the back-end conversion process that led to OEB rather daunting, given the variability in typesetting workflows and resulting file quality they had to deal with. Quite understandable.

So they threw millions of bucks at a Gigantic Consultancy Outfit (also nameless, for similar reasons, but I still have the business card of the project manager on this job) to write them a house Quark stylesheet (not a house design, to be clear, just a set of style-naming conventions) and a related OEB XML spec. It so happened they asked me whether a house stylesheet and XML spec was a good idea. I told them yes, it was, because it is. But there were things about the project I didn’t know…

Amount of typesetting experience this Gigantic Consultancy Outfit boasted? None. Amount of markup experience? None. Amount of OEB experience? Big zero. Why was this outfit hired, especially for such incredible amounts of money, when there are plenty of people around with either typesetting or markup experience—and a few with both—who charge orders of magnitude less for something of this nature? Heck if I know. I suspect it was a nobody-goes-wrong-buying-Microsoft thing.

How much in-house typesetting experience did the publisher have? None. (At least, none that I ever saw or heard about.) They’d outsourced all their typesetting long since. How much in-house markup experience? None, though they were making incredibly feeble and inadequate baby steps toward developing some. (To the best of my knowledge, some five years later they still haven’t succeeded.) OEB experience? Well, they were involved with the OEBPS working group, if by “involved” you mean “a person with no technical expertise and nothing to contribute silently attending conference calls and meetings.” That was it.

Trainwreck waiting to happen.

I did not run into the resulting spec until a year or so after it went into the field. If you dig far enough into CavLec archives, you can find a complaint or two about it, and one outright rant after a day of utter frustration.

Because the spec was rot. Unusable garbage. It did not and could not work as a typesetting spec; the publisher complained in my hearing that “we can’t get our typesetters to adhere to it!” without once (as far as I ever heard) considering that the problem might not rest with the typesetters. As an XML spec, it was barely functional in some areas, but broke down completely several times, mostly because nobody at the publisher or the consultancy outfit grokked the concept of markup hierarchy. As I wrote then:

One very large, very important publisher, whose name I will not mention because I am allergic to litigation, tried to wish [the problem of typesetting multiple-paragraph structures such as lists] away by pretending that in the total absence of hierarchy, any given type of list only needs one list item tag. The lame pretense is enabled (in the sense that one enables another’s addiction) by specific tags for extra line spacing, which is an abomination unto the fair names of typesetting and markup alike.
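
To make the hierarchy problem concrete, here is a small sketch. The element names (`ul`, `li`, `p`) and the single `list-item` tag are stand-ins chosen for illustration; they are not taken from the publisher's actual spec. It shows that a flat, one-tag-per-paragraph scheme cannot distinguish one list item containing two paragraphs from two separate items, whereas nested markup can.

```python
import xml.etree.ElementTree as ET


def to_hierarchical(items):
    """Build <ul><li><p>...</p>...</li>...</ul> from a list of items,
    where each item is itself a list of paragraph strings."""
    ul = ET.Element("ul")
    for paragraphs in items:
        li = ET.SubElement(ul, "li")
        for text in paragraphs:
            ET.SubElement(li, "p").text = text
    return ET.tostring(ul, encoding="unicode")


def to_flat(items):
    """Mimic a hierarchy-free spec: every paragraph gets the same single
    'list-item' tag, whichever item it belongs to."""
    return [("list-item", text) for paragraphs in items for text in paragraphs]


# One item with two paragraphs, versus two one-paragraph items.
one_item = [["Check the galleys.", "Mark corrections in red."]]
two_items = [["Check the galleys."], ["Mark corrections in red."]]

# The flat tagging cannot tell the two structures apart...
flat_is_ambiguous = to_flat(one_item) == to_flat(two_items)

# ...while the hierarchical markup keeps them distinct.
nested_is_ambiguous = to_hierarchical(one_item) == to_hierarchical(two_items)

print(flat_is_ambiguous, nested_is_ambiguous)
```

Once the structure has been flattened away, no amount of extra-line-spacing tags can reliably bring it back; that information has to be captured as nesting in the first place.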

Being the kind and helpful person I am (stop laughing!), I wrote out the clearest explanation I could of where the spec had gone wrong and specifically how to fix it, and submitted it to my higher-ups so that they could give it to the publisher. For FREE, mind you, and unsolicited. I wasn’t interested in getting money for my work; if my higher-ups wanted to extort money from the publisher, that was fine, but I didn’t care. I just wanted the damnable spec fixed so that I could work with it without feeling dirty about the crap markup it forced me to eject.

My higher-ups quashed my report, all the way to the very top of the company I worked for. In fact, my coworkers and I were forbidden from giving the publisher any feedback at all on the spec. We don’t dare show up this publisher; they’re too important a client, I was told. We’ll look arrogant, saying we know more about markup and typesetting than they do.

Well, we did—at least, I did! And in fairness, so did one or two of my coworkers. The whole experience was a real eye-opener for me. I had been hired (I thought) for my markup expertise; I couldn’t fathom why it wasn’t being used to good purpose.

Eh, well. Be that as it may, the publisher never saw that report, though it did inform an article I wrote shortly thereafter. I doubt they ever got their spec fixed. All their millions of dollars bought them was a junk spec, a lot of useless markup tied to the junk spec, and a boatload of angry, frustrated typesetting contractors.

This is not to say that the consultancy outfit knowingly or wilfully delivered a garbage spec. I firmly believe they did their best with their limited knowledge. They just didn’t know what they were doing, and the publisher did not have and did not hire the expertise to catch their errors.

Know what you’re doing before you outsource. Please. Even your vendors—those that aren’t exploitative, clueless, or both, anyway—will thank you.

Dies Martis, 9 Martii 2004

A new wheel

I got a tip about Distributed Proofreaders and went over to take a look.

Sometimes the thing about being an ex-medievalist is that you don’t have other medievalists around to share the joke with.

See, what DP has done is reinvent the pecia system of manuscript-copying. Particularly at medieval universities where students had to copy their own textbooks, the one-copyist-one-book system simply became too slow. So popular books got broken up into chunks called peciae, and the chunks were passed around among student copyists.

DP discovered that the same system gets books proofed faster. (For complex books, I would wonder about accuracy—but for the kind of thing they’re doing, they’re probably all right.)
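
The pecia idea itself is simple enough to sketch in a few lines. This is only an illustration of the chunk-and-distribute principle under made-up assumptions (fixed-size chunks, round-robin assignment, hypothetical function names); it makes no claim about how Distributed Proofreaders actually hands out its work.

```python
def split_into_peciae(pages, size):
    """Break a list of pages into consecutive chunks of at most `size` pages."""
    return [pages[i:i + size] for i in range(0, len(pages), size)]


def assign(peciae, workers):
    """Hand the chunks out round-robin.

    Returns a dict mapping each worker to the indices of the chunks
    that worker is responsible for.
    """
    assignments = {worker: [] for worker in workers}
    for index in range(len(peciae)):
        assignments[workers[index % len(workers)]].append(index)
    return assignments


pages = list(range(1, 11))              # a ten-page book
peciae = split_into_peciae(pages, 4)    # three chunks: 4 + 4 + 2 pages
schedule = assign(peciae, ["scribe_a", "scribe_b"])
```

With two copyists and three chunks, the first copyist ends up with the first and third chunks and the second with the middle one, so nobody waits for the whole book.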

Those medievals. Knew more than they let on. Some of their wheels we’re still reinventing.

Dies Jovis, 26 Februarii 2004

I was right

I debated this post in my head for quite some time. Hereby I post it; the worst that can happen is I get sued.

Quite some time ago, I wrote this:

Once upon a time in eBookspace, there was a conversion house. “Ah! Open eBook Publication Structure!” said the people of this conversion house. “This is merely HTML in disguise.”

And they said, “Let us hire a great many robots at the lowest wages we can manage, build highly sophisticated production tools that even they cannot misuse (since robots, as we all know, are stupid creatures, prone to make mistakes), and turn them loose. We will not train our robots, since training is expensive and they need not understand what they are doing; they need only use the highly sophisticated tools in predictable fashion.

“We will not create high-quality OEB markup, moreover, since we will give our clients only the finished eBooks, not the markup that went into them—and in any case our clients are clueless about quality markup and we will do our best to see that they remain so. And yea, we will make a great deal of money and our days will be long on the earth.”

And they did as they had planned. And they failed miserably, and ended their days in great penury crying woefully unto the heavens about the injustice of their failure.

I got in heaping helpings of trouble (as in “a millimeter from being fired”) with my employer over this, because my employer thought—not without reason—that the company was being targeted. (And it was; I’d run out of internal places to try to push some kind of reform. But not uniquely targeted, not at all.)

I found out a bit ago that said employer ain’t in the conversion business no more. Now, while I’m not going to claim that the problems I pointed to are the only reason for their troubles in this arena, and I’m furthermore not going to claim that the company itself is imperiled (I don’t actually think it is)… I will claim that I was right about their mindset vis-à-vis conversion, and how much that mindset cost them in the long run.

My crystal ball is spotty, but every once in a while it turns up a winner.

Dies Jovis, 8 Ianuarii 2004

Checking checksums

The sharp and well-informed clew pointed out to me that my formulation of an ebook-integrity scheme is missing a major chunk: a reliable third party to keep track of the hashes, so that the hashes themselves aren’t spoofed.

She’s right. I goofed. But I’ll tell you how I’d do it, at least in the States. Make keeping track of hashes an ordinary part of cataloguing ebooks. That gives us responsible heavyweights like the Library of Congress and OCLC to serve as authorities, while not parceling out total power to any single entity.
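
A rough sketch of how the pieces fit together, for the code-minded. Everything here is hypothetical: the `HashRegistry` class merely stands in for a cataloguing authority's record of hashes, the choice of SHA-256 is an assumption rather than part of any proposed scheme, and a real registry would live with the authority, not in one program's memory.

```python
import hashlib


def ebook_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of an ebook file's bytes."""
    return hashlib.sha256(data).hexdigest()


class HashRegistry:
    """Hypothetical stand-in for a trusted third party that records
    ebook hashes so that the hashes themselves cannot be quietly swapped."""

    def __init__(self) -> None:
        self._records: dict[str, str] = {}

    def register(self, identifier: str, data: bytes) -> None:
        # Recorded once, at cataloguing time; re-registration is refused
        # so that a later party cannot overwrite the original hash.
        if identifier in self._records:
            raise ValueError(f"{identifier} is already registered")
        self._records[identifier] = ebook_hash(data)

    def verify(self, identifier: str, data: bytes) -> bool:
        # True only if the copy in hand matches the recorded hash.
        recorded = self._records.get(identifier)
        return recorded is not None and recorded == ebook_hash(data)


registry = HashRegistry()
registry.register("example-ebook-001", b"The original text.")

intact = registry.verify("example-ebook-001", b"The original text.")
tampered = registry.verify("example-ebook-001", b"The altered text.")
```

A reader checks a copy against the authority's record rather than against a hash shipped alongside the file, which is exactly the gap the third party closes. Spreading the records across more than one authority would mean running more than one such registry.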
