Archive for March, 2005

23 Martii 2005

ACRL Blog People Dinner

Okay, it’s only two weeks to ACRL—who wants to do dinner with the Blog People?

I’m willing to do the heavy lifting as far as picking a venue out of choices sent to me, maybe finding folks rides, and so on. I need you folks to recommend vegetarian-friendly restaurants (I know nothing about the Twin Cities except the airport layout), pick a date (Thursday, Friday, or Saturday), and publicize the resulting event to more-popular library weblogs than mine (most of them, that is).

If you can help with any of these things, or have a strong feeling about the date, drop me a line, won’t you? If you only want to be kept in the loop, just watch this space for now; whether there’s interest or not, I’ll post again.

Not my day

This is obviously not my day to do an annotated bibliography.

I desperately want an article from volume 29 number 3 of a journal called Program. It is exactly on point, and better yet, it’s a summary article that could lead to more bibliography.

First of all, may I say that Program is a horrible, horrible journal name, impossible to find in a catalogue because of all the damned conference proceedings that use it as a title? I may say that? Oh, good. I say that. Because I had to wade through a metric ton of junk before I found the OPAC listing for the journal, and I almost missed it.

The SLIS library has bound volumes of the journal for quite a few years, but it is missing 29:3. Okay, okay, so we’ll try for online full-text. Wilson doesn’t have it. Emerald doesn’t have it. Nobody has it. Well, that’s just peachy, that is.

I’ll have to haul my rear over to Memorial to find it. And I will, because unfortunately it is exactly what I need. But as much as I love libraries, sometimes I just hate ’em.

ETA: Whew, that’s done. I did find the Program article, but I gave up on one almost as good because it’s in a journal that changed its name three times and is shelved in three different places, and who needs the tsuris, really? I expect I’m going to get gigged on currency because I don’t have much stuff newer than 2000, but so be it. I did find good stuff.

22 Martii 2005

Tomorrow’s plan

Heading down to campus tomorrow to work on this annotated-bibliography thing. I’ll be happiest if it’s done at the end of the day, but I’ll settle for whacking a sufficiently large hole in it. Did my initial DIALOG searches tonight; tomorrow is for chasing stuff down, reading it, annotating the good ones, and noting cites to do “who cited this?” checks on if I don’t find enough good articles from the first searches. (Social SciSearch. It’s the stuff, you bet.)

I note with no small amusement that I’m going to use the paper journals rather than full-text online whenever possible (yes, me, Ms. Electronic Text). I know where all the journals are in the SLIS library; they’re in a very small physical space. I can haul the Silver Surfer to campus, dig up and flip through a dozen print articles off my DIALOG list, and annotate the good ones in much less time than it typically takes to navigate umpteen bloody database screens to get to full-text.

What this says about database interface design will be left as an exercise for the librarians and vendors among us. (Somebody please kick H.W. Wilson in tender spots until they streamline that horrible web interface they’ve got. Hint one: Frames are bad. Hint two: I like to do a lot of searches at once and then save the articles I find in separate Firefox tabs, because it’s more time-efficient than the search-read-search-read grind. JavaScript links make this all but impossible. Don’t use them. Thanks.)

Should probably also start costing out materials for the LAN design project, because that’s the big hurdle to writing it up and getting it out of my life. It should also let me ballpark an estimate for the budget discussion I have to put the finishing touches on.

The Minnesota talk is looming large, too. I think that’ll be Thursday’s project. I know the kinds of things I want to say—I just need to back it up. (And everyone should be happy to know that there will be no PowerPoint. There will probably be a handout or quickie Web page with bibliography, however.)

I note with a slight tinge of hope, however, that I’ve got my assignments done through the end of next week (except for minor amounts of budget verbiage that are no big deal). If I finish the bibliography, that gets me safely through ACRL, and if I finish the LAN design project, that plus one more search problem-set gets me through the Montreal meeting.

I may survive this semester after all.

Stop me before I search again!

Signs that you’re becoming part of the librarian hivemind include telling your husband at dinner, “No, I don’t know who said that, but I can look it up for you. DIALOG has a Quotations Database.”

Really, it’s just as well that I’m losing access to DIALOG after the semester is over. DIALOG is incredibly addictive despite some poor design decisions, and like other addictive drugs, it’s bloody expensive once you’re hooked.

I mean, to heck with Google and the Web. DIALOG is the good stuff. Mmm, yeah.

Bibliographic map access

And less than a day after I opine stupidly about GIS making bibliographic map access better, there’s this.

Yeah. That’s what I’m talkin’ about.

21 Martii 2005

Just sayin’

This new version of DialogLink is slower than a balky camel trying to make headway in a violent sandstorm.

Just sayin’.

And yes, I got that problem set done. Very, very slowly.

Priorities

I just got back from an excellent and productive meeting with my search client that has me champing at the bit to get started on the search—but priorities, priorities. So after I write up the meeting for my final paper (best to do that quickly, so I don’t forget anything), I shall turn my attention to the networking presentation and the next search-class problem set.

If I get through all that, it’ll be time to send out some more résumés. Brief phone interview this afternoon, granted, but there’s nothing like getting shot down to inspire one to new heights of frantic paper-pushing.

I will cop to certain superstitions surrounding important competitive milestones like job interviews. One such is that I beat long odds but never short ones. So it’s no wonder I lost out on the Ruritania position, being one of only two candidates; I expect I’ve a better shot at Rohan, seeing as how they’ve got four or five.

Another is that when I do get shot down on a job I really want, the job I eventually take turns out to be a blessing (if sometimes disguised at first). I had to apply twice to get the job at Impressions that set my entire working life on a new and rosier path. I got shot down on running the Puerto Rico Census Project, so I swallowed my pride and accepted a data-entry position—one that ended up introducing me to database programming and paying for nearly all of library school.

So I’m hoping that the pattern holds this time.

80/20 point and sensible defaults

I meant to mention yesterday (on the topic of converting MARC records) a concept I learned from the techies at the Open eBook Forum: the 80/20 point. In brief, it’s the idea that 80 percent of any job is easy, and the other 20 percent is invariably hard. Very hard.

Sometimes that means you give up once you hit the 80/20 point—the other 20 percent of functionality isn’t worth the extra effort. In this case, though, I just think it’s worth recognizing that we will have trouble converting some percentage of MARC records to something more usable and flexible. We need to accept that without repining. Sunk costs. The real issue is arriving at the best possible conversion target, so that we look back on the pain of conversion and say, “Well, that was worth it.”

My money’s on some permutation of FRBR plus AACR3, personally. I like XOBIS (which itself is very FRBRish) a lot too, but FRBR gets my nod because of its congruence with relational-database design and the hefty institutional might of the folks behind it. Doesn’t pay to bet against OCLC, methinks.

What? Not something XMLish like MODS? Well, no. I think something XMLish is a fine midpoint for this process, and we’ll always need and use XML (or something like it) for metadata interchange, but storing a large catalogue natively in XML is rank madness. Too much storage space, too little query speed; and the minor gain in human-comprehensibility doesn’t recoup those costs.

(The Evergreen people may yet prove me wrong… but we’ll see if what they’re doing scales to, say, a major research institution.)

There’s also a question of flexibility and future-proofing to consider. It’s not all that hard to add a new table to an existing database design. A lot of queries and middleware and whathaveyou will need rewriting to take advantage of the new information, but nothing should actually break merely because a table exists that didn’t before. (Turning a one-to-many relationship into a more complex many-to-many can indeed break things, I admit, but not badly and not for long; it happens a lot and what to do about it is pretty well-understood.)
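To make the "nothing should actually break" claim concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (books, maps, scale and so on) are invented for illustration and are not taken from FRBR or from any real catalogue schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A deliberately tiny, hypothetical "books" table.
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.execute(
    "INSERT INTO books (title, author) VALUES (?, ?)",
    ("A Hypothetical Title", "Doe, Jane"),
)

# A query written before anyone thought about maps.
OLD_QUERY = "SELECT title FROM books WHERE author = ?"
before = conn.execute(OLD_QUERY, ("Doe, Jane",)).fetchall()

# Later: add a brand-new table for maps, with map-specific columns.
conn.execute(
    "CREATE TABLE maps (id INTEGER PRIMARY KEY, title TEXT, scale TEXT, "
    "west REAL, east REAL, north REAL, south REAL)"
)
conn.execute(
    "INSERT INTO maps (title, scale, west, east, north, south) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("A Hypothetical Map", "1:24000", -93.4, -93.0, 45.1, 44.8),
)

# The old query neither knows nor cares that the maps table exists.
after = conn.execute(OLD_QUERY, ("Doe, Jane",)).fetchall()
```

The old query returns the same rows before and after the schema change. Taking advantage of the maps table still means writing new queries, which is the rewriting cost mentioned above.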

The same is not necessarily true of an XML language, as we found out to our cost in the OEBF days. Software designed to work with a well-understood flavor of XML will quite often choke and die outright when it runs into an element it doesn’t know up-front what to do with.
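A minimal sketch of that failure mode, using Python's standard xml.etree.ElementTree. The element names and both reader functions are hypothetical; this is not how any particular OEBF-era software worked, only an illustration of a reader that assumes a closed vocabulary next to one that skips what it doesn't recognize.

```python
import xml.etree.ElementTree as ET

# The vocabulary the software was originally written against (hypothetical).
KNOWN = {"title", "creator"}

def strict_read(xml_text):
    """Reader that assumes it knows every element up front."""
    record = {}
    for child in ET.fromstring(xml_text):
        if child.tag not in KNOWN:
            # The "choke and die" behaviour: an unknown element is fatal.
            raise ValueError(f"unknown element: {child.tag}")
        record[child.tag] = child.text
    return record

def tolerant_read(xml_text):
    """Reader that quietly skips elements it does not understand."""
    record = {}
    for child in ET.fromstring(xml_text):
        if child.tag in KNOWN:
            record[child.tag] = child.text
    return record

OLD_DOC = "<record><title>T</title><creator>C</creator></record>"
# The same record after the language grows a new element.
NEW_DOC = "<record><title>T</title><creator>C</creator><scale>1:24000</scale></record>"

try:
    strict_read(NEW_DOC)
    strict_failed = False
except ValueError:
    strict_failed = True

tolerant_result = tolerant_read(NEW_DOC)
```

The tolerant reader survives, but it also silently drops the new information, so neither behaviour is free.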

So when the FRBR folks decide to turn their attention to solving the problems of map access (which I daresay will involve all kinds of fun GIS jiggery-pokery), they’ll be able to do that without having to shoehorn maps into the books table, which sad expedient in MARC is the reason we have map-access problems in the first place. MODS, however, is in trouble, and I say that with all possible affection for MODS.

If I may indulge in a mild non-sequitur, the 80/20 point also applies to user interfaces. It really does pay to design one’s interface for the 80 percent of boring everyday queries, not the 20 percent (well, less, actually, but let that go) of edge-case queries that test the limits of a query interface’s capacities.

One thing this means when the rubber hits the road is that an interface must set sensible defaults, both for itself and for the information it returns when queried. A couple of weeks back in search class, we were introduced to the meta-files in DIALOG, which tell you which databases index a particular periodical (or a particular subject), and how extensively. We learned the RANK FILES command, which orders a list of databases by number of hits on a given search.

My question: Why on $DEITY’s green earth is there a command for that? It is pathetically, cryingly obvious that RANK FILES should be the default operation for any such search! Why would one not want results ranked by most hits? Even if there is a reason (and I surely can’t come up with one), I guarantee that reason obtains in far, far less than 20 percent of searches.
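In code terms, the complaint comes down to which way a default argument points. The sketch below is purely illustrative: files_for_search, its ranked parameter, and the file names are all made up, and none of it reflects how DIALOG is actually implemented.

```python
def files_for_search(hits_by_file, ranked=True):
    """Return (file, hits) pairs for a search across several databases.

    ranked=True is the default because that is what the everyday case
    wants; the rare caller who needs the original order has to ask.
    """
    items = list(hits_by_file.items())
    if ranked:
        # Most hits first.
        items.sort(key=lambda pair: pair[1], reverse=True)
    return items

# Hypothetical hit counts for one search, in whatever order they came back.
hits = {"File A": 12, "File B": 340, "File C": 57}

default_view = files_for_search(hits)            # what most people want
raw_view = files_for_search(hits, ranked=False)  # the edge case, opt-in
```

The common case costs the caller nothing extra to type; the unusual case costs one keyword argument.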

Sensible defaults: one key to good user-interface design that librarianship has yet to master, in our search for 80/20 points.

20 Martii 2005

MARC, XML, and conversion

I’m not the first to have pointed this out, but it bears repeating: the normally quiet and rather tech-wonkish XML4LIB mailing list has had a terrific thread on the imminent (or not) death of MARC, and its replacement (or not) by XML. You can find the thread here if you scroll down a bit and look for the subject line “When will XML replace MARC?”

I won’t recap the whole thing (because doubtless I’d do a poor job), but I will say that Terry at Panlibus gets closest to my initial reaction: we need to figure out which level of MARC’s multiple functions we’re talking about replacing. I made this point loudly but fortunately not profanely in the talk I worked up for Ruritania, and one of the nicest things for me about reading the mailing list thread was seeing a lot of what I’d reasoned out for myself confirmed by people who know much more about MARC and AACR2 than I do.

There’s bits-on-disk—the actual storage model—and everybody knows that MARC’s binary format is a horrible dinosaur but nobody does anything about it, because the binary format is so intertwingled with everything else about MARC. The thing is, that decision is costly, all by itself. Anybody wanting to work with MARC has to decipher that format (or find a programming library that does, and my fave programming language only added such a library very, very recently). That’s a barrier to casual hacking right there—even we librarians can’t mess about with our own pet information format!
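For anyone curious what "deciphering that format" involves, here is a rough sketch of the ISO 2709 layout that MARC's binary format uses: a 24-byte leader, a directory of 12-byte entries (tag, field length, starting offset), then the fields themselves. The helper names build_record and parse_record are hypothetical, the sample record is invented, and the sketch ignores character encodings and most of the leader's meaning, so it is an illustration and not a substitute for a real MARC library.

```python
FIELD_END = b"\x1e"    # field terminator
RECORD_END = b"\x1d"   # record terminator
SUBFIELD = b"\x1f"     # subfield delimiter

def build_record(fields):
    """Pack (tag, body_bytes) pairs into one ISO 2709 record."""
    directory = b""
    data = b""
    for tag, body in fields:
        body = body + FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit offset.
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += FIELD_END
    base = 24 + len(directory)           # where the field data begins
    total = base + len(data) + 1         # +1 for the record terminator
    leader = f"{total:05d}nam  22{base:05d}   4500".encode("ascii")
    return leader + directory + data + RECORD_END

def parse_record(raw):
    """Unpack an ISO 2709 record back into (tag, body_bytes) pairs."""
    base = int(raw[12:17].decode("ascii"))   # leader positions 12-16
    directory = raw[24:base - 1]             # drop the trailing terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7].decode("ascii"))
        start = int(entry[7:12].decode("ascii"))
        # The stored length includes the field terminator; strip it.
        body = raw[base + start:base + start + length - 1]
        fields.append((tag, body))
    return fields

# An invented two-field record: a control number and a title statement
# (two indicator characters, then subfields a, b and c).
SAMPLE = [
    ("001", b"12345"),
    ("245", b"10" + SUBFIELD + b"aTitle :" + SUBFIELD + b"bsubtitle /"
            + SUBFIELD + b"cJane Doe."),
]

raw = build_record(SAMPLE)
parsed = parse_record(raw)
```

Nothing here is difficult, exactly, but all the byte-offset arithmetic has to be right before you can ask a single question of the data.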

Compare this to the situation of the relational database, which runs rather more than half the world these days. These suckers have been intensively worked on, polished to a high sheen, optimized for speed and comprehensibility and all sorts of things. Give me a database full of information, newbie database person that I am, and I can answer all sorts of complicated questions with nothing more than a little SQL.
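As a small illustration of "a little SQL," here is a sketch using Python's built-in sqlite3 module. The works and editions tables and their contents are invented; the point is only that a question nobody planned for ("which works exist in more than one language?") takes a handful of lines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE works (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
CREATE TABLE editions (
    id INTEGER PRIMARY KEY,
    work_id INTEGER REFERENCES works(id),
    year INTEGER,
    language TEXT
);
INSERT INTO works VALUES (1, 'Work One', 'Author A');
INSERT INTO works VALUES (2, 'Work Two', 'Author B');
INSERT INTO editions VALUES (1, 1, 1990, 'eng');
INSERT INTO editions VALUES (2, 1, 1995, 'fre');
INSERT INTO editions VALUES (3, 2, 2001, 'eng');
""")

# Which works have editions in more than one language, and how many?
rows = conn.execute("""
    SELECT w.title, COUNT(DISTINCT e.language) AS languages
    FROM works AS w
    JOIN editions AS e ON e.work_id = w.id
    GROUP BY w.id
    HAVING COUNT(DISTINCT e.language) > 1
    ORDER BY w.title
""").fetchall()
```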

What I can’t do is put together weird and wonderful heretofore unforeseen OPAC queries, because the bare OPAC interfaces won’t let me, and I can’t get far enough under the MARC hood to program such things on my own. Is this retarding OPAC and OPAC-interface development? You better believe it is.

So that’s one problem. Someone else brought up the belt-and-suspenders problem: MARC plus ISBD. Stripped to its essentials, this is a problem of delimiters. MARC has delimiters, ISBD has delimiters, and sometimes the twain don’t meet.

To my mind, that’s an argument for junking them both. Neither is adequate on its own, and ISBD in particular is causing all sorts of data-repurposing problems (which you can read about in the XML4LIB thread). So let’s figure out what we need to delimit, and delimit it already!

Another objection to junking MARC/ISBD is conversion issues. Can’t do it entirely automatically, because ISBD can be unpredictable and there are errors in the records anyhow.

Well, okay, so what? One, this is a wizard chance to, I don’t know, find and fix large swathes of the errors! Two, if we ask for a completely automated conversion experience, we’ll never leave MARC, and heaven help us! Yes, this is going to take work and pain and annoyance. Yes, the rewards on the other end are worth it, especially the bit about never having to do it again because we won’t make the mistake of rolling our own rigid and inflexible encoding and content-designation formats from scratch.

And then there’s the ever-popular “we’ve used it for 30 years; why would we stop now?” You know, SGML is just about as old as MARC, and has about the same number of die-hard adherents. But most of us SGMLers, even the old-school (which I’m not), have made the mental switch to XML. If we can do it, so can you.

Anyway, go read the thread. It’s of far more value than anything I’ve said about it.

19 Martii 2005

Made it home

I had no trouble getting back from Rohan today. I was a bit antsy waking up, as the weatherfolks were reporting another band of snow through Madison early, but it had been cleared away by the time my plane left Minneapolis.

While I’m not a panicky flier by any means, I am a mildly nervous one, especially when I start flying again after a long time on the ground. I notice, though, that today I’d gotten beyond the stage where every least jiggle of the plane sends “demise imminent!” signals to my hindbrain.