Caveat Lector » Librarians and error

Friday, 17 December 2004

Librarians and error

Roy Tennant has a column that should raise some hackles. His contention: library bibliographic data are simply not good enough for the uses the computer age would ideally like to make of them.

You know what? He’s absolutely right. I got an A+ on my paper talking about automated error-correction in cataloguing, but that’s not proof, just a demonstration. To prove it, one need only search OCLC WorldCat for a bit. I’ll show you the entry that Nichole kindly emailed me for my paper:

OCLC# 36592721
040 WR2 $c WR2 $d OCL $d OCLCQ
090 LD3635.5 $b .M456 M397 1956
090 $b
049 GZMA
100 1 Mclain, Howard A.
245 10 Spray drying of serratia marcescens, a bacteria (supplement to development of a high velocity sprat dryer) $c by Howard A. Mclian.
260 $c 1956.
300 98 leaves $b ill. $c 21 cm.
504 Thesis-(Ph. D.)–purdue Unoiversity.

Even the non-librarians can see the typos (boy, don’t you just hate those wet sprats?). Librarians will additionally cringe at the punctuation (or lack thereof) in the 300 field and a few other details. If it makes anyone feel any better, Nichole informs me the agency that submitted this record isn’t allowed to put records into WorldCat any more. Doesn’t stop this record from being a total disaster, however.

But errors are only part of the problem—a large part, but a part nonetheless. Tennant quotes Lorcan Dempsey, who nails another part: “we cannot entirely, unambiguously slice and dice the database because of historic data entry and cataloging practices that…were not oriented toward our new needs.”

ISBD punctuation makes perfect sense on catalogue cards. In a database? It makes zero sense. Where it isn’t redundant to MARC, it marks details that ought to be caught in the MARC data structure. (Or whatever data structure one is representing bibliographic records in. I’m certainly not wedded to MARC. A hundred-thousand-character record-size limit? Pathetic.)

Moreover, when AACR2 and related standards changed, old records more often than not weren’t updated. This means that, like it or not, our catalogues are dirty and inconsistent. Computers loathe inconsistency.

My suspicion is that the current sad state of affairs has a lot to do with the evolution of MARC from card-cataloguing, combined with librarian technophobia. When MARC was invented, relational databases didn’t exist, nor was the state of the art in human-input validation terribly impressive. (If MARC had been invented a decade later, things might indeed have been better. Ah, well.) Catalogues started dirty and only got dirtier. Librarians didn’t understand that computers could help them with accuracy, didn’t know to ask their vendors to build validation checks.

In 2004? Voyager’s cataloguing module (as used at the UW, at any rate) has fewer and stupider validation checks than the ones I programmed into the Access database we used for the Puerto Rico Census Project. I myself (during a live demo for cataloguing class) saw a Voyager record-validation check totally fly by a missing slash at the end of MARC field 245 subfield a. How sad is that?
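
For a sense of how low the bar is, here is a minimal sketch of a punctuation check of the kind that should have caught that missing slash. It is not how Voyager or any other ILS works. The rule table `REQUIRED_BEFORE`, the function `check_punctuation`, and the list-of-pairs representation of a field are all assumptions made up for illustration, and the three rules cover only a sliver of ISBD.

```python
# Hypothetical rule table: (tag, code of the NEXT subfield) -> the ISBD mark
# that should end the subfield before it. Deliberately incomplete.
REQUIRED_BEFORE = {
    ("245", "c"): "/",  # statement of responsibility
    ("300", "b"): ":",  # other physical details
    ("300", "c"): ";",  # dimensions
}


def check_punctuation(tag, subfields):
    """Return one complaint per subfield that lacks its closing ISBD mark.

    `subfields` is a list of (code, value) pairs in record order.
    """
    problems = []
    for (code, value), (next_code, _) in zip(subfields, subfields[1:]):
        mark = REQUIRED_BEFORE.get((tag, next_code))
        if mark and not value.rstrip().endswith(mark):
            problems.append(
                f"{tag} ${code} should end with '{mark}' before ${next_code}"
            )
    return problems


# The 245 and 300 fields from the record quoted above, typos and all.
title = [
    ("a", "Spray drying of serratia marcescens, a bacteria "
          "(supplement to development of a high velocity sprat dryer)"),
    ("c", "by Howard A. Mclian."),
]
physical = [("a", "98 leaves"), ("b", "ill."), ("c", "21 cm.")]

for tag, subfields in (("245", title), ("300", physical)):
    for problem in check_punctuation(tag, subfields):
        print(problem)
```

Run against the WorldCat record above, this flags the 245 once and the 300 twice. A real validator would need far more rules and a way for the cataloguer to override them, which is where the usability work comes in.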

(Now, Ex Libris’s module looks a great deal slicker, and various external programs can do some record validation. But we still don’t have a spellchecker—a spellchecker, for heaven’s sake!—designed to work well with MARC records or bibliographic databases. Librarians still find and correct typos by hand. Think about that, and shudder.)

We’ve got human problems to think about, too. Cataloguing budgets are being slashed all over the place. My cataloguing professor’s response to my in-class paper presentation amounted to, “There’s no time for error-correction. Quantity is what counts!” Not to mention that if we aren’t careful about how we implement validators, they’ll get in the way and cataloguers won’t use them. (Compare the situation of publishers trying to get authors to write in XML instead of Word.)

Worse still is cataloguer pride; the cataloguers I know aren’t going to take kindly to hearing either that they’re creating errors or that their errors are causing real problems. An article I cited in my paper said outright that accuracy is a pipe dream, and that the only errors that matter are the ones that damage patron access. (To which I say, well, computers are becoming patrons too, after a fashion. Doesn’t their access matter?)

So. Where do we go from here? I don’t know, but I can suggest some places to start:

  • Figure out what we can do by way of mechanical cleanup of existing catalogues, and do it. Cataloguers will scream. Let them. Do it anyway. OCLC has done this at least once (as I found for my cataloguing paper); I think it can and should lead the effort, since it’s gone and 0wnz0red union cataloguing.
  • Build better MARC validators. Open-source ILS developers, are you listening? The bar isn’t as high as you may think it is. But for heaven’s sake, pay close attention to usability. We have to get existing cataloguers used to the idea that the computer is looking over their shoulder.
  • Start from scratch on a relational model for bibliographic and authority records. (FRBR is nowhere near ready; it’s just not detailed enough yet, and it doesn’t like serials.) We can howl all we want about how MARC should go, but it won’t make a damned bit of difference until we demonstrate that we have something better. And yes, it has to handle Mighty Morphin’ Serial Ranges and new media and websites and datasets and images and things we haven’t even thought of yet. If it can fold in archival finding aids, so much the better.
  • Find money to fix stuff. Just that simple.
  • Stop letting garbage into our databases. OCLC, I’m looking at you. Why aren’t you running validation checks at record input, and rejecting records with obvious errors? Sure, your input agencies will scream. Let them. Do it anyway.
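
The last suggestion, checking records at input, might be sketched roughly as follows. This is only an illustration under stated assumptions: `submit`, `CHECKS`, and the two check functions are hypothetical names, the record is a plain list of (tag, subfields) pairs, and nothing here reflects how OCLC actually ingests records. The sample record is an abbreviated version of the one quoted earlier, keeping its empty 090 $b.

```python
def empty_subfields(record):
    """Flag subfields that were keyed with a code but no content."""
    return [
        f"{tag} ${code} is empty"
        for tag, subfields in record
        for code, value in subfields
        if not value.strip()
    ]


def missing_required(record, required=("245",)):
    """Flag required tags that do not appear in the record at all."""
    present = {tag for tag, _ in record}
    return [f"required field {tag} is missing"
            for tag in required if tag not in present]


# Hypothetical list of checks; more can be appended without touching submit().
CHECKS = (empty_subfields, missing_required)


def submit(record, database):
    """Add the record to `database` only if every check passes.

    Returns the list of error messages (empty when the record is accepted).
    """
    errors = [message for check in CHECKS for message in check(record)]
    if not errors:
        database.append(record)
    return errors


database = []
record = [
    ("090", [("a", "LD3635.5"), ("b", ".M456 M397 1956")]),
    ("090", [("b", "")]),  # the stray empty 090 from the quoted record
    ("245", [("a", "Spray drying of serratia marcescens")]),  # abbreviated
    ("260", [("c", "1956.")]),
]

errors = submit(record, database)
print(errors)         # the messages sent back to the input agency
print(len(database))  # the rejected record was never stored
```

Returning the messages instead of silently dropping the record matters: the submitting agency has to be told what to fix.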

I find that I am somewhat of two minds on the question of all the data-entry redundancy that would be eliminated with a proper relational model. Yes, faster entry, yes, fewer chances at error—but also yes to fewer chances to cross-check parts of records against each other. Seems a shame to lose that. But if that’s all we lose in the transition, I suppose I can live with it.
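
The record quoted earlier shows what that cross-checking buys: the 100 field says "Mclain" and the 245 $c says "Mclian". A rough sketch of a check that exploits the redundancy, using Python's standard `difflib`, might look like this. The function names and the 0.8 similarity cutoff are assumptions chosen for illustration, not anything taken from an existing tool.

```python
import difflib
import re


def tokens(text):
    """Lowercased runs of letters, with punctuation and spacing ignored."""
    return re.findall(r"[a-z]+", text.lower())


def name_mismatches(main_entry, responsibility, cutoff=0.8):
    """Pair each word of the 100 field with a near-miss from the 245 $c.

    A word that appears exactly in both fields is fine. A word that is
    absent but has a close lookalike is reported as a probable typo in
    one field or the other.
    """
    seen = tokens(responsibility)
    suspects = []
    for word in tokens(main_entry):
        if word in seen:
            continue
        close = difflib.get_close_matches(word, seen, n=1, cutoff=cutoff)
        if close:
            suspects.append((word, close[0]))
    return suspects


print(name_mismatches("Mclain, Howard A.", "by Howard A. Mclian."))
```

The check cannot say which spelling is right, only that the two fields disagree. That is still enough to put the record in front of a human, and it is exactly the kind of check that disappears once the name is stored in one place only.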
