Archive for May, 2004

29 Maii 2004

Give and take

Fenster award check: $150.

Student fees due for summer school: $146.87.

The bursar giveth, and the bursar taketh away.

I’m not complaining, just amused. (I’d have to be a real jerk to complain, what with my project-assistant tuition waiver.)

Jobs in librarianship

I keep an eye on the LISjobs RSS feed, for all the obvious reasons. Most of the job titles are reasonable things like “Reference Librarian.”

One listing today read, in toto:

RMS III, c/o ASRC

To which the only possible response is, WTF?

(Okay, I clicked the link, and it’s some kind of records manager. Still haven’t got a clue what the S stands for, as the description claims “Duties do not include supervision.”)

28 Maii 2004

Prior art

Something I ought to have mentioned yesterday but didn’t: There’s no need for metadata aggregators to charge forward as if nobody had ever done this kind of thing before.

Journal-article aggregators such as HighWire Press and JSTOR cope with an awful lot of metadata. I can attest that some portion of what they get is malformed. I once had to re-engineer a process to generate issue- and article-level metadata for some education journals destined for... no, it wasn’t HighWire, who was it? I forget.

Anyway, a co-worker of mine had handed an SGML template to a typesetter with zero knowledge of SGML and told her to go to it. So every issue, she had to cut and paste stuff from Quark into the template. Of course she made mistakes. And since she knew nothing about SGML, she did not know that nsgmls could help her.

So she’d cut and paste and FTP the result, and an error report would come back. With ONE error on it. She would scrutinize the text laboriously, fix the error, FTP the result—and another error report. With ONE error. Lather. Rinse. Repeat. Poor woman finally came to me when she couldn’t figure out what an error message meant.

I threw up my hands in horror and automated the whole thing via Roustabout XT, my homegrown Roust-munger, and the usual raft of regular expressions. And nsgmls. Poof, no more runaround.

Well. That was a digression. Sorry.

Point being, not long afterward that particular article aggregator (whoever it was; I still don’t recall) changed the way they sent error reports. Lesson learned, apparently. And metadata aggregators could certainly leapfrog a whole lot of lessons if they just go to these folks and ask how things work—and more importantly, why they work that way.

Worth a try, at least.

27 Maii 2004

Aggregating metadata

One of the librarian communiques I got about the trials and tribulations of metadata aggregation asked the very simple question, “What do you think aggregators ought to do faced with bad metadata?”

Well, I’m not sure. Let’s take that a piece at a time and see where we end up.

The first question, I suppose, is whether bad metadata is always better than no metadata at all. If it is, then aggregator policy should be to fix whatever can possibly be fixed, and only reject what is so malformed as to be uninterpretable. Sort of the “version-3 browser” method of coping.

I suspect the current situation is very close to this, because metadata aggregators are staggering-out-of-the-eggshell new, and if they’re to have any impact (not to mention further funding) they need to grab as much material as they can get their spider-claws on. No room to be choosy. The chaotic result is not (she said mildly) a state that commends itself very highly to librarians, however.

Frankly, if I were in this situation I’d plan to throw one database away. Accept everything for now and call it a learning experience. Later, once best practices are clear and it’s possible to be draconian about what one accepts, junk the whole thing and start fresh. Really. I would do this. It’s easier and cheaper than trying to weed out the existing database.

To do it at all, though, there will have to be a substantial consciousness-raising about metadata quality. (I mean, really, non-valid XML? If it’s defined by a DTD and not an XML Schema — as EAD is — there just isn’t any excuse for that. Validate your stuff before it goes out, people!)
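For the curious, here is a minimal sketch of a pre-flight check a metadata provider might run before anything goes out. It assumes Python and only its standard library, and the function names are mine, purely for illustration. One loud caveat: the standard-library parser only checks well-formedness, not validity against a DTD. Real DTD validation needs a validating parser such as nsgmls. The sketch does show one thing worth copying, though: every file gets checked, and all the failures come back in one report.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, else the parser's error message.

    This checks well-formedness only; it does NOT validate against a DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


def preflight(records):
    """Check every record and collect all failures in a single report.

    `records` maps a (hypothetical) file name to its XML text.
    """
    report = {}
    for name, xml_text in records.items():
        problem = check_well_formed(xml_text)
        if problem is not None:
            report[name] = problem
    return report


if __name__ == "__main__":
    samples = {
        "good.xml": "<ead><eadheader/></ead>",
        "bad.xml": "<ead><eadheader></ead>",  # unclosed element
    }
    for name, problem in preflight(samples).items():
        print(name, "->", problem)
```

Note that the parser still stops at the first error inside any one file, so this only cures the one-error-per-round-trip problem across files, not within them.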

Two ways to do that. One is the good old non-threatening educational thing. Conferences, workshops, publications, sample files, and so on. (Catching up to these issues in library school would help. If UW-SLIS is any indication, though, there’s considerable distance to go there.) Unfortunately, I guarantee you that the worst offenders will pay zero attention to these. I absolutely guarantee it.

There is nothing to do with such people except embarrass them. What’s more, the sooner you do it, the better. This actually militates against an accept-everything approach. Bad markup creators are like puppies; if you don’t catch ’em in the act and whack their noses, they don’t clue in.

Fortunately, an aggregator has a few sticks available. The first one is obvious: don’t accept crap content. Now, “crap content” depends on one’s perspective, so it’ll be a bit of a chore setting the rejection/acceptance threshold to the right place. Even so, the principle is clear enough.

The second one is public humiliation. Again, the definition of “public” may vary — but we all like to be competent, so when an aggregator says “Podunk U’s metadata is garbage” louder than a whisper, Podunk U is likely to sit up and take notice.

Neither of these is any use, however, without accompanying availability of training, education, and consultation. That’s unpalatable, no question, but it’s the truth. If well-intentioned people can’t even learn to do things right owing to lack of resources, well… watch me nail up the OEBPS as a Stern Warning.

I wonder whether some of the aggregator builders thought their results would end up something like OCLC’s WorldCat (a vast and surprisingly accurate union catalogue contributed to by thousands of libraries and librarians). Not possible, I’m afraid. WorldCat is fact-checked by its users, themselves librarians. Librarians will indeed use metadata aggregators (and should be able and encouraged to leave feedback!), but there just aren’t going to be enough qualified eyes hitting each entry for that to suffice as an error-checking mechanism; finding aids get a lot less use than catalogue entries. Like it or not, the aggregator owners must stand up and take some responsibility.

So, all this said, what do I think I would do, if I ran a metadata aggregator?

I think I would build a three-tier database. Everything spidered lands in a staging area and gets some kind of once-over. Stuff from people with a history of good production may only be mechanically checked before hitting the top tier of the database. Problematic stuff may make it into the lower tier, may be rejected altogether — either way, notification is immediately sent to the source, along with an offer of help. Stuff from new providers gets checked pretty intensively. A top-tier metadata provider can be demoted if quality declines, and obviously lower-tier providers who clean up their act can be promoted.
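Since I’ve never built one of these, take the following as a rough sketch of the routing logic rather than a design. It assumes Python, and every name in it (`Provider`, `Aggregator`, `ingest`, the `mechanical_ok` and `review_ok` flags) is hypothetical. The exact thresholds are also my assumption: anything that fails the mechanical check is rejected, trusted providers skip the intensive review, and anything not landing in the top tier triggers a notice to the source.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    name: str
    trusted: bool = False  # True once there is a history of good production


@dataclass
class Aggregator:
    top: list = field(default_factory=list)      # what ordinary users search
    lower: list = field(default_factory=list)    # problematic but kept
    notices: list = field(default_factory=list)  # messages owed to sources

    def ingest(self, provider, record, mechanical_ok, review_ok=False):
        """Route one staged record to a tier and return the tier's name."""
        if not mechanical_ok:
            tier = "rejected"
        elif provider.trusted or review_ok:
            # Trusted providers get the mechanical check only.
            tier = "top"
        else:
            tier = "lower"

        if tier == "top":
            self.top.append(record)
        else:
            if tier == "lower":
                self.lower.append(record)
            # Either way, tell the source at once and offer help.
            self.notices.append((provider.name, record, tier))
        return tier

    def promote(self, provider):
        provider.trusted = True

    def demote(self, provider):
        provider.trusted = False


if __name__ == "__main__":
    agg = Aggregator()
    podunk = Provider("Podunk U")
    print(agg.ingest(podunk, "finding-aid-1", mechanical_ok=True))
    print(agg.notices)
```

Promotion and demotion here just flip a flag; deciding when to flip it is the hard, human part.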

How users see this is up to the aggregator, but if it were my aggregator, ordinary users would only see the top tier. If the entire corpus, including garbage, is available to users, that removes a major incentive for metadata providers to fix problems. I would be tempted to bury a whole-database search somewhere remote, however.

I would build a training program and lots of documentation, probably at least in part online. I would love these things, keep them up-to-date, and make sure lots of my staff knew how to write documentation and train.

But that’s me, and I’ve never, ever, ever done this or anything like it, so what do I know?

All this worry over metadata quality has implications for the Semantic Web too, incidentally. If an RDF spider accepts uncritically whatever it’s handed, muddying the waters becomes trivially easy. There’s lots of handwaving about reification creating trust metrics and authority, but to my eye it’s just handwaving: I can fake the reification dead easily (“Leigh Dodds said this! Really!”) and the only way to stop me is to peer outside the RDF. I think some of the trust-metric handwaving tries to address this issue, but again… show me something that isn’t handwaving and maybe I’ll play along.

26 Maii 2004

Kinesis come home!

Kinesis just emailed me an invoice for the keyboard fixes. They replaced the cord, it seems, and updated the firmware.

I tell you what, that thing can’t be home soon enough. I miss it horribly.

Little happy snippets

Just a few things making me happy:

  • My practicum director likes the ideas I threw at him. Whew.
  • Being six.
  • My baby niece apparently likes the mobile we got her. That’s cool. She’s a pretty good-looking kid, even if she did get stuck with the Salo schnozz.
  • IE need not suck forever. When I move to new hosting, this gizmo is definitely being added to the arsenal, alpha or no alpha. I am sick unto death of piecemeal IE hacks.
  • I got caught grooving to Blades/Colón Siembra this morning at work, but the person who caught me just grinned.
  • The Return of the King DVD is out.
  • Got my Fenster award check, and David got his reimbursement for the stuff he bought for Sindarin class.
  • A number of librarians (real librarians! not wannabes like me) have emailed me about my Newberry series, specifically my discussion of OAI and metadata quality. The gist is “oh heck yeah, it’s a problem all right, but it’s being worked on.” Cool. It occurs to me that working in a profession that lives and dies by standards will be a refreshing change.
  • I’m going to Indianapolis next week to visit friends.
  • I keep thinking of new things to add to this list!

25 Maii 2004

On failure and its contentments

A slight but important difference between a blog and a memoir is that the blog captures events as they are happening. That’s not to make a value judgment; hindsight and perspective lend value to memoirs. Immediacy, however, is also very valuable.

My graduate school story is a memoir, albeit one written fairly closely after the fact. Wolfangel’s is a blog. I don’t know if the two stories, read in parallel by a third party, feel as close in spirit as I happen to think they are. Close or not, I regularly wince in recognition at moments like this:

It’s hard to hear; I feel somewhat stupid, or — no, stupid works here. As does failure, or worthless, or any of the worse terms I use against myself, on bad days.

Yes, sure, it was the right decision for me. But part of me whispers that I did it because I was scared of failing because I’m just no good. Failing because you quit is different, after all. Part of me says that.

Mm. Yes. Part of me said that too, quite insistently. Curiously, though, my memory places this particular train of thought before I actually left. Afterwards, I was just too busy.

That preoccupation saved me a lot of angst, I think. I had to figure out how to earn my living. Sure, I was a failure, but even failures have to eat. By the time I had that one more or less figured out — I wasn’t a failure any more. I was keeping myself and my husband going, paying my taxes, keeping up with the mortgage. And by the time those small successes palled, I was doing well at my job… and by the time that was old hat, I was making a small name for myself in ebookdom.

Grad school had beaten me down, no question about it. I had failed. But I had survived, and I was even starting to prosper a little. Resilience is a sort of success, not unworthily exemplified by what my husband calls “Myrmekopolis” — an anthill in the back yard that’s been repeatedly, brutally pounded to nonexistence in the last week’s rainstorms, but is being rebuilt nonetheless.

Since that first big failure? I’ve succeeded at some things, failed at others, made mistakes aplenty and recovered from them. Lived, in other words, and done so reasonably well, if far from ideally.

People who have survived the absolute worst that life offers without being destroyed by it have a — I don’t know, a dignity, a larger-than-life distinction — about them. It’s unmistakable, at least to me, and I’ll lay odds most of my readers know what I’m talking about and can think of examples despite my inarticulacy. I’ll never match that singular calm, and I’m coward enough to hope I never have reason to, though I’ve tried to give it to one or two of my RPG characters. Just to be clear, though, what they have is akin to how I’m about to characterize the resilient people I know, but far more intense, far graver, rather less joyous.

Most of the people I know who have stumbled over something big and still managed to get up again are ex-academics, which I’m sure surprises no one. (Not that all of them stayed out of academia forever. Several went back. Some are still there, chasing tenure or whathaveyou.) Academia isn’t the only institution that lays people waste, of course; humanity seems to be damnably good at coming up with such institutions. But we’ll take resilient ex-academics as reasonably typical of the lot, in the absence of more comprehensive evidence, shall we?

They laugh, especially at themselves. They forgive themselves and others, perhaps their most salient good quality. They do not blindly accept authority; likewise, credentials or high position in a social hierarchy impress them less than skill, hard work, and potential. They don’t ruminate endlessly over their own or others’ errors; when they do recount them, it tends to be with compassion and humor.

And they lack, utterly, the fear that Wolfangel feels right now. They failed, and they survived. They know right down to their toes that failure is survivable. Not that they court it; it’s not fun. But they’ll take risks, fully aware that some won’t pan out.

I know a fair few people who haven’t ever failed at anything serious, too. Again, a lot of them are academics, just because of the social circles I’ve spent too much of my life in. There’s a strain of them that I alternate between pitying and finding utterly insufferable. Self-righteous, ruthless, blinkered men and women who brand every failure as a permanent and usually fatal flaw of character. Failure, like success, is always earned; no such thing as luck or inequality of circumstance. They’re Manichaeans or Calvinists, generally. The Good succeeds and the Bad fails, QED. Any outcry from the Bad is nothing more than contemptible bitterness from the irretrievably damned.

Why pity these people? Because they’re so terribly afraid. Oh, yes. They live on an inescapable precipice, just one misstep away from a profound abyss. If once they fall, they know they will shatter irreparably. They have abjured failure so forcefully and so long that their own minds transform it into a looming, inexorable enemy.

Anecdotes I’ve witnessed suggest that these people have a tendency to get stuck in ruts, too. An ordinary rock in the road becomes an impassable obstacle; every hint of opposition or difference of opinion they magnify into a targeted, intentional attack on them. Whenever someone else does well, it’s an affront to them, an affront and a worry, because it feels like a diminishment of the self. All of this, I believe, is intimately linked with fear of failure. Whatever goes wrong, it can’t possibly be me, because that would mean I was failing, and failing is unthinkable. Whatever goes wrong, it must be Them (whoever They happen to be).

I honestly don’t know what touched off my father’s career-long squabble with his department; I can’t have been older than eight or nine when it started. But after that, there was no resolution, ever, because my father wouldn’t let anything resolve. Anything that happened fed back into his unshakable belief that his department was starving and ignoring him, and he took any opportunity he could find to hit back (which, of course, cannot have endeared him). Take a look at his old vita and see if you can’t see a little of what I’m talking about.

Eh, well. An unpleasant subject, these people, a subject better let go. I could have been one of them; I’m glad I’m not. I’d rather go back to talking about the resilient.

As I’ve set up the category, you will notice, nobody can be born resilient, because nobody can be born failing. This constellation of character traits, given the initial impetus of a significant personal failure, develops over time, can even be cultivated. Insofar as I belong to this category — and I’m still working on it, heaven knows — it took me quite some time to get here.

I fear someone will think I am advocating hard knocks for everyone. Let’s recreate the learned helplessness experiments, whee! Well, I’m not. I don’t think we can engineer the kind of experience that kicks someone into a resilient mode of functioning, because I suspect the necessary experience differs rather widely. Not the kind of thing you want to get wrong, if you’re going to inflict it on someone. That doesn’t mean we shouldn’t value the phenomenon when it happens of itself.

This is the best answer I have for Wolfangel at present. I see her walking in my old footprints. I’m happy with where I ended up, and I see signs (little though Wolfangel may care to believe me) that Wolfangel will find her own peace, her own accommodation with her past.

Regret? Yes, happens; possibly always will. But my regret is such a little, little thing next to my newfound resilience.

24 Maii 2004

A Newberry afterthought

I forgot a Newberry tidbit that this Bob DuCharme post about metadata brought to mind suddenly.

One of the lookie-lookie-cool-project presenters showed off a gizmo that algorithmically picks out two- and three-word key phrases from an article as things that might be searchable database-wide. I assume it does this with a frequency count plus a bunch of stopwords, but I’m no programmer.

The results were reasonably meaty, I thought, though I only saw results for one article. Certainly some random phrases, but for the most part, what it spat out didn’t look too bad.

I didn’t have to ask the obvious question; someone beat me to it. “Are you cross-referencing your results with appropriate thesauri?”

“Er, um… no.”

Well, really. Why not? What better way to knock out the random phrases?

I think this combo is where metadata application almost has to head. Use a human-created controlled vocabulary, then run the algorithms against it, and let humans sanity-check the results and feed back anything interesting into the CV.
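Here is a toy sketch of that combo, under heavy assumptions. It is a guess at the general technique (a frequency count plus stopwords), not the presenter’s actual gizmo, which I never saw the inside of. It assumes Python; the stopword list, the function names, and the sample vocabulary are all made up for illustration. Phrases found in the controlled vocabulary are kept; the rest go to a human for a sanity check and possible addition to the CV.

```python
import re
from collections import Counter

# A tiny, made-up stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for", "on", "with"}


def candidate_phrases(text, sizes=(2, 3), min_count=2):
    """Count two- and three-word phrases, skipping stopword-edged ones.

    Sentence boundaries are ignored, so some phrases span two sentences.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in sizes:
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


def split_by_vocabulary(phrases, vocabulary):
    """Separate phrases the CV already knows from ones needing human review."""
    known = sorted(p for p in phrases if p in vocabulary)
    review = sorted(p for p in phrases if p not in vocabulary)
    return known, review


if __name__ == "__main__":
    article = (
        "Finding aids describe archival collections. "
        "Finding aids need metadata quality. "
        "Metadata quality matters for finding aids."
    )
    vocabulary = {"finding aids"}  # stand-in for a real thesaurus
    known, review = split_by_vocabulary(candidate_phrases(article), vocabulary)
    print("in vocabulary:", known)
    print("for human review:", review)
```

The review pile is where the random phrases end up, and also where anything genuinely new gets noticed.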

Easy facets in MT?

Help, MT gurus! I want to build a faceted classification system so easy a child could use it, but I’m not finding it an easy task.

I know about Pixelcharmer’s solution as well as tima’s plugin. They are not what I want. Unless I’m reading completely wrong (and please tell me if I am!), they force me to create a huge number of categories, even with just two facets.

(For example, if I have one WhatIsThisThingAnyway facet containing Link, Form, and LocalInfo — which is what I’m looking at doing — multiplied by a Subject facet containing, say, eight entries, that’s TWENTY-FOUR CATEGORIES, and it only gets worse from there. I cannot expect my target user to use a category list that long effectively. I simply can’t.)

Multiple categories? Well, yes, but since the MT interface can’t separate out the categories in one facet from the categories in another, I can’t guarantee that my user is going to Do The Right Thing. This needs to be EASY.

Subcategories? I may have to go this route, but I don’t like it, because it limits me to two facets. I really don’t like that. I at least want an Audience facet too.

And then I noticed that MT has a Keywords box. Bonanza! I thought. I can have my user type in the items as keywords. It’s kludgy (I’d rather have checkboxes for some of these facets, a la WordPress, which does category interface vastly better than MT anyway), but it’ll work; all I need is an MT tag that grabs entries that have a particular keyword or combination of keywords.

Guess what. No such tag. All you can do with keywords is list them for a particular entry. That’s real useful. Not.

The best I can find is a RelatedEntriesByKeyword plugin which isn’t what I want. I don’t want to grab entries based on their relatedness to another entry. I want to grab them based on a keyword I supply. I want entry listings by keyword the same way I can get entry listings by category.

Except I can’t have it.

If this were my personal weblog, I’d throw up my hands and move immediately to WordPress. But it isn’t; it’s for my practicum client, who’s heartily enamored of MT (and shouldn’t move to WordPress until it’s got native multi-blog support anyway).

I’m stuck. I can’t figure out how to make this work. If you have a bright idea, post it to your blog and ping this entry, or email me with it. Thanks.

Addendum: Okay, I misunderstood how the plugins work. I wouldn’t have 24 categories, but 11; all I have to do is prefix the category with its facet name, not cross-ref everything. That’s better, but it’s still not good enough; since all the categories are mushed in together, it’s still too hard to get the client to Do The Right Thing.
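The arithmetic behind the 24 versus 11, as a quick sketch in Python. The eight subject names are placeholders, and the `Facet:Value` prefix format is only my assumption about what such category names might look like, not what either plugin actually produces.

```python
from itertools import product

what = ["Link", "Form", "LocalInfo"]
subjects = [f"Subject{i}" for i in range(1, 9)]  # eight placeholder entries

# Cross-referencing: one category per combination of facet values.
crossed = [f"{w}/{s}" for w, s in product(what, subjects)]

# Prefixing: one category per facet value, tagged with its facet name.
prefixed = [f"What:{w}" for w in what] + [f"Subject:{s}" for s in subjects]

print(len(crossed))   # 3 * 8 = 24
print(len(prefixed))  # 3 + 8 = 11
```

Crossing multiplies and prefixing adds, which is why a third facet hurts so much more in the first scheme than in the second.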

Oh, well. I think it’s back to the drawing board for me.

23 Maii 2004

Now We Are Six

A. A. Milne, paraphrased:

But now we are Six, we’re as clever as clever.
So we think we’ll be six now for ever and ever.

You folks will have to excuse us today. We’re off being Six.