I meant to mention yesterday (on the topic of converting MARC records) a concept I learned from the techies at the Open eBook Forum: the 80/20 point. In brief, it’s the idea that 80 percent of any job is easy, and the other 20 percent is invariably hard. Very hard.
Sometimes that means you give up once you hit the 80/20 point: the other 20 percent of functionality isn’t worth the extra effort. In this case, though, I just think it’s worth recognizing that we will have trouble converting some percentage of MARC records to something more usable and flexible. We need to accept that without repining. Sunk costs. The real issue is arriving at the best possible conversion target, so that we look back on the pain of conversion and say, “Well, that was worth it.”
My money’s on some permutation of FRBR plus AACR3, personally. I like XOBIS (which itself is very FRBRish) a lot too, but FRBR gets my nod because of its congruence with relational-database design and the hefty institutional might of the folks behind it. Doesn’t pay to bet against OCLC, methinks.
What? Not something XMLish like MODS? Well, no. I think something XMLish is a fine midpoint for this process, and we’ll always need and use XML (or something like it) for metadata interchange, but storing a large catalogue natively in XML is rank madness. Too much storage space, too little query speed; and the minor gain in human-comprehensibility doesn’t recoup those costs.
(The Evergreen people may yet prove me wrong… but we’ll see if what they’re doing scales to, say, a major research institution.)
There’s also a question of flexibility and future-proofing to consider. It’s not all that hard to add a new table to an existing database design. A lot of queries and middleware and whathaveyou will need rewriting to take advantage of the new information, but nothing should actually break merely because a table exists that didn’t before. (Turning a one-to-many relationship into a more complex many-to-many can indeed break things, I admit, but not badly and not for long; it happens a lot and what to do about it is pretty well-understood.)
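To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (works, map_extents) are ones I made up for illustration; they don't come from FRBR or any real catalogue schema. The point is only that a query written before the new table existed returns the same thing afterward.

```python
import sqlite3

# Throwaway in-memory database; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO works (title) VALUES ('Moby-Dick')")


def list_titles(conn):
    # An "old" query, written before the schema grew.
    return [row[0] for row in conn.execute("SELECT title FROM works ORDER BY id")]


before = list_titles(conn)

# Later, someone adds a table for map coverage data.
conn.execute(
    """CREATE TABLE map_extents (
        work_id INTEGER REFERENCES works(id),
        west REAL, east REAL, south REAL, north REAL)"""
)

# The old query neither knows nor cares that the new table exists.
after = list_titles(conn)
```

Nothing uses map_extents yet, of course. That's the rewriting work mentioned above, but it can happen at leisure.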
The same is not necessarily true of an XML language, as we found out to our cost in the OEBF days. Software designed to work with a well-understood flavor of XML will quite often choke and die outright when it runs into an element it doesn’t know up-front what to do with.
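A rough sketch of how that failure tends to happen, using Python's standard xml.etree.ElementTree. The element names (record, title, creator, extent) are invented for the example and are not real MODS or OEB markup. The strict reader looks up a handler for every element it meets, which is a common way to write such code; the tolerant one shows what has to be designed in up front to avoid the crash.

```python
import xml.etree.ElementTree as ET

# Handlers for the elements this (hypothetical) software was built to know.
HANDLERS = {
    "title": lambda el: ("title", el.text),
    "creator": lambda el: ("creator", el.text),
}


def read_record_strict(xml_text):
    """Assumes every child element is a known one; dies on anything else."""
    record = {}
    for child in ET.fromstring(xml_text):
        handler = HANDLERS[child.tag]  # KeyError on an unknown element
        key, value = handler(child)
        record[key] = value
    return record


def read_record_tolerant(xml_text):
    """Skips unknown elements and reports them instead of crashing."""
    record, skipped = {}, []
    for child in ET.fromstring(xml_text):
        handler = HANDLERS.get(child.tag)
        if handler is None:
            skipped.append(child.tag)
            continue
        key, value = handler(child)
        record[key] = value
    return record, skipped
```

Feed both a record containing a new extent element: the strict reader raises KeyError, while the tolerant one returns the title and notes that it skipped extent.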
So when the FRBR folks decide to turn their attention to solving the problems of map access (which I daresay will involve all kinds of fun GIS jiggery-pokery), they’ll be able to do that without having to shoehorn maps into the books table. That sad expedient in MARC is the reason we have map-access problems in the first place. MODS, however, is in trouble, and I say that with all possible affection for MODS: as an XML language, it can’t grow new elements without risking exactly the breakage described above.
If I may indulge in a mild non sequitur, the 80/20 point also applies to user interfaces. It really does pay to design one’s interface for the 80 percent of boring everyday queries, not the 20 percent (well, less, actually, but let that go) of edge-case queries that test the limits of a query interface’s capacities.
One thing this means when the rubber hits the road is that an interface must set sensible defaults, both for itself and for the information it returns when queried. A couple of weeks back in search class, we were introduced to the meta-files in DIALOG, which tell you which databases index a particular periodical (or a particular subject), and how extensively. We learned the RANK FILES command, which orders a list of databases by number of hits on a given search.
My question: Why on $DEITY’s green earth is there a command for that? It is pathetically, cryingly obvious that RANK FILES should be the default operation for any such search! Why would one not want results ranked by most hits? Even if there is a reason (and I surely can’t come up with one), I guarantee that reason obtains in far, far less than 20 percent of searches.
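In code terms, the difference is about one keyword argument. This is a toy sketch, not anything to do with how DIALOG actually works: the function name files_for_search, the rank parameter, and the file names are all hypothetical. Ranking is on unless the rare searcher who doesn't want it says so.

```python
def files_for_search(hit_counts, rank=True):
    """Return (file, hits) pairs for a search across several databases.

    hit_counts maps a database name to its number of hits. Ranking by
    hits, most first, is the default; pass rank=False to opt out.
    """
    pairs = list(hit_counts.items())
    if rank:
        # sorted() is stable, so files with equal hits keep their original order.
        pairs = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return pairs


# Made-up numbers for illustration only.
hits = {"FILE_A": 12, "FILE_B": 340, "FILE_C": 57}
ranked = files_for_search(hits)
unranked = files_for_search(hits, rank=False)
```

Here ranked puts FILE_B first, then FILE_C, then FILE_A, while unranked leaves the files in the order they were given. The edge case is still served; it just has to ask.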
Sensible defaults: one key to good user-interface design that librarianship has yet to master, in our search for 80/20 points.