17 December 2006

Control your bits

Well, here’s a position I never thought I’d find myself in: disagreeing with Peter Suber. In his comments on the news that Google is offering to digitize journal backruns for free, he says that he doesn’t see any downside for publishers who don’t already have a digitized backrun.

I do. I see a ton of downside, so much downside that I don’t think any self-respecting journal should take this deal. I do agree with Suber that should Google’s offer be accepted by a lot of publishers, open access would benefit hugely, at least in the short term—and to be honest, knowledge of that immediate short-term benefit is making it very hard for me to write this post.

Still, I do try to be honest, not just expedient. I simply cannot agree that publishers benefit from a Google deal in proportion to their contribution and their risk.

My stubborn objection to the shape of this deal stems from my ebook days, and boils down to this: never, ever, EVER agree to a digitization deal that doesn’t leave you in control of a copy of the bits. If you agree with that, stop here: no need to read the exegesis.

For the rest of you… Back in the day, NetLibrary and Versaware and others offered free digitization deals to publishers, in return for exclusive rights to offer the resulting ebooks for sale. (I don’t remember what the terms of the sales splits were, I must say, so I can’t do a direct comparison to Google’s revenue-sharing offer.) A fair few publishers signed on. Not a one of ’em I ever talked to after the fact thought it had been a good idea. Here’s why:

  • Exclusivity. Hate your digitization partner? Think you could do better elsewhere? Too damn bad. You signed; you’re stuck.
  • Quality. What NetLibrary and Versaware were pumping out back in the day was crap; I know this firsthand, thank you. (Versaware’s long gone. NetLibrary is still crap, you ask me; OCLC didn’t school them nearly as much as they needed to be schooled.) Once the contract was signed, the said contract specifying precisely nothing in terms of quality standards, publishers had zero input into quality control. Now, higher-ups at publishers know absolutely nothing whatever about text artisanry, heaven knows—but eventually the smarter ones realized what they didn’t like, and how they’d been skunked.
  • Nothing to show for it in the end. Sure, you could stop sending your books to your digitization partner. That didn’t get you copies of the bits of the books you’d already sent, nor did it free you to send those books to a new partner. Essentially publishers had given up all kinds of hope of future rents (and future reach) from their intellectual property, and for what? Low-quality digitization whose results they didn’t have any control over anyway.
  • Preservation. The publisher didn’t have a copy of the bits. If the digitization partner went under, the bits were basically gone. Anybody who signed up with Versaware howled about this when Versaware died. It didn’t do the howlers any good; as far as I know, those bits, lousy as they admittedly were, vanished into the ether for good.

Is the Google deal any better? Well, according to Suber, Google isn’t demanding de jure exclusivity. You don’t like what Google does to your journal, you’re free to shop it around elsewhere for re-digitization. Looks good on the surface, but let’s be real here: what library or other digitization shop is going to work with a journal that’s already done a Google run, unless the journal coughs up a whale-load of cash? As a digital librarian, I wouldn’t touch it; there’s plenty of work to do with journal publishers who don’t go Google, never mind all the stuff we can digitize that isn’t journals. I can’t imagine a grant-funder touching it either. So a lack of de jure exclusivity really amounts to de facto exclusivity. Caveat publicater.

Quality? I scoff. We know from the book project that Google is doing crappy work. We’ve seen it. And that’s just the scanning! We also know they’re not going to proof their OCR results, much less mark them up. (Has Google even heard of the NLM DTD suite, I wonder?) Journal publishers can do better, and should if they consider themselves responsible agents of scholarly communication.

Something to show for it? Well… intangibles, maybe. I do like the usage reports, though if I were a publisher I’d insist on COUNTER-compliance and the ability to share that data (for example, with libraries). Open access does increase the impact of a given journal. In a competitive journal marketplace, that’s worth something. It’s a plus for (living and still-working) authors, too, and in the Social Journal age that’s not wholly to be sneered at.

I’m not impressed with the revenue-share argument, frankly; I don’t think it’s going to earn a small player more than pin money, and even the big players might be shocked at how little they get. (Keep in mind also that Google is famously shut-mouthed about its existing revenue-sharing arrangements. Do you trust them, when they offer no data? I wouldn’t.)

But, more importantly, Google is controlling the bits. As I read Suber’s summary, Google isn’t promising never to lock them up, and it is promising never to hand them over. There’s no way that I see for a publisher to withdraw its material from Google once the contract is signed; as an OA advocate, I love that, but if I were a publisher, I’d hate and fear it.

Want a use-case? Here’s a use-case. Google’s low-quality work, combined with its control of the resulting bits, casts de facto exclusivity in an even more sinister light: a Google deal may well prevent a publisher that wants better-quality bits from obtaining them. The publisher can’t improve Google’s bits, because it doesn’t have Google’s bits. The publisher has no leverage to force Google to improve the bits. And the publisher can’t start from zero, because of the expense, dubious return on investment, and dearth of willing partners and funders. Rock, meet hard place.

“Better-quality” is a polysemous term, too; I mean more than just raw data quality by it. Consider the Social Journal. What if Google’s article permalinks are terrible? What if Zotero takes off like a rocket, but Google journals don’t play nicely with it? What if Google’s article metadata rots, as is all too likely given their track record? What if the standardistas come up with a fantastic new annotation mechanism that doesn’t work with Google? What if PDF finally dies its long-deserved death, such that all the serious journals are in NLM? Lots of possibilities, all of them ugly.

Preservation? Google isn’t signing up with CLOCKSS or Portico that I’ve heard, nor is it allowing its publisher partners to do so. The deal as detailed by Suber doesn’t contain “trigger events” that would cause the bits to be turned over to the publisher, or to another responsible agent (such as, say, CLOCKSS or Portico!). Google doesn’t have a preservation plan, and I daresay they don’t care about creating one.

Sure, sure, journal publishers don’t care about preservation either, and researchers care even less (just ask Dr. Harnad). I know that; I struggle with it daily. But I care, librarian that I am, and my sense is that the Google deal actively gets in the way of a proper preservation plan for the digitized journals. If Google goes under—and however remote an eventuality that seems now, the mighty do fall—whither the bits? And how does a journal suddenly dumped at square one with regard to digitization recover?

Careful CavLec readers may have noted that I have never dumped on the Google book deals this badly, and be wondering why. Simple. The Google book deals leave participating libraries in control of a copy of the bits. This causes its own problems, to be sure (just storing that much data is a challenge for an academic library), but it reduces a hopelessly lousy deal to a mere long-term bet: that even if Google plays access hardball, Google will eventually go down, leaving the libraries free to do the right thing. I’ve seen worse bets.

This journal deal, though? Is smellier than a day-care’s diaper pail left in the afternoon sun. I strongly urge journal publishers to think hard and bargain harder before signing on, if they sign on at all.

In fact, the only way I can imagine signing on would be if I were to (cynically and very possibly illegally) keep a copy of the bits myself (after all, they’re OA! Google can’t exactly stop me!) without telling Google. Eventually some David will topple the Google Goliath, at which point I’d have the bits and be willing to risk a lawsuit by keeping access open, knowing that Goliath would have plenty of other issues to worry about and that right-thinking libraries everywhere would support me should Goliath attack. That’s pretty sharp practice, though. A journal publisher unwilling to compromise its ethics that far shouldn’t sign on with Google.

Always control your bits. Always. Even your open-access bits. If you don’t believe me—ask a publisher who signed on with Versaware back in the day. Or ask a library—we’ve been asking ourselves this question for several years now, and Portico and CLOCKSS are some of the results of our ponderings.

Always control your bits. Can I make it any simpler than that?