Archive for November, 2005

30 Novembris 2005

Tidbits

Too good not to note:

And I am pointedly not linking to the latest mauvais mot from ALA’s current president, because I’ve quite given up on thinking that anything short of a full book-cart to the head will make an impression on that man.

29 Novembris 2005

Ojax

Every repository-rat who reads this needs to check Ojax out now. It’s a federated repository search engine based on OAI. Alpha code, so install at your own risk. (Via OA News.)

I’ve been waiting for something like this. It had to happen. DSpace’s search is wonky as all get-out (try playing with parentheses or other punctuation—apostrophes are fun—to see what I mean), not to mention that each individual installation of DSpace can only search itself.

Consortia whose members run separate repositories (no, I don’t resemble this description at all, why would you think that?) can now search them all together. Anybody and everybody can consider running a mini-OAIster. (I kid. The mere thought of running OAIster gives any self-respecting library geek heartburn.)

When this baby matures a little, I suspect it’ll do something even more exciting—act as a subject-specific aggregator. What it would need to do (speaking from a DSpace-centric perspective) is learn to harvest specific communities and collections from a given repository, ignoring everything else. (I’ll have to refresh my OAI grammar to figure out how that would actually work—some perversion of ListSets, I would think—but it must work. Somehow.) This is the killer app, people. This is arXiv writ large. The sooner we can do it, the better.
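For the record, OAI-PMH does have selective harvesting built in: ListRecords and ListIdentifiers both accept a `set` argument, whose value is one of the setSpecs a repository advertises through ListSets. Here is a minimal Python sketch of building such requests. The base URL and the setSpecs are made up for illustration; a real harvester would read the actual values from the repository's ListSets response and would also need to follow resumptionTokens, which this sketch ignores.

```python
from urllib.parse import urlencode


def list_records_url(base_url, set_spec, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request restricted to a single set."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical repository and setSpecs, for illustration only.
BASE_URL = "https://repository.example.edu/oai/request"
WANTED_SETS = ["hdl_1234_56", "hdl_1234_78"]

# One request per collection of interest; everything else is never asked for.
urls = [list_records_url(BASE_URL, spec) for spec in WANTED_SETS]
for url in urls:
    print(url)
```

A subject-specific aggregator would then amount to a hand-picked list of (repository, setSpec) pairs.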

I’m not actually installing this one yet; I haven’t the patience for alpha Java code. I am watching it with considerable (not to say ravenous) interest, however, and I hope somebody works out a HOWTO for publishing it next to DSpace (two Tomcat apps together is a groan-worthy prospect, unfortunately) and bypassing DSpace’s wonky search tools in favor of it.

Might even take a crack at that myself, if I get super-ambitious.

(Hm. I played with OAI a bit on the repository I run, and it turns out that ListSets only gives you collections, not communities. That’s sort of a shame. Working as designed, I’m sure, since communities are collections of e-peeps and collections are collections of stuff. Even so, I wonder if OAI allows sets within sets? Let me check… yes, it seems to be allowed, but DSpace doesn’t return sets that way. Weird. I wonder why not.)
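For anyone wondering what sets within sets would look like: the OAI-PMH spec expresses hierarchy in the setSpec itself, as a colon-separated path from the root set down to the node. Below is a rough Python sketch that groups the sets in a ListSets response by parent. The sample response is invented (it is not what DSpace returns, which is rather the point), and the set names are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented ListSets response showing a two-level hierarchy.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>history</setSpec><setName>History Department</setName></set>
    <set><setSpec>history:theses</setSpec><setName>History Theses</setName></set>
    <set><setSpec>history:papers</setSpec><setName>History Working Papers</setName></set>
    <set><setSpec>biology</setSpec><setName>Biology Department</setName></set>
  </ListSets>
</OAI-PMH>"""


def sets_by_parent(xml_text):
    """Map each parent setSpec ('' for top level) to its child setSpecs."""
    root = ET.fromstring(xml_text)
    tree = {}
    for node in root.findall(".//oai:set", OAI_NS):
        spec = node.findtext("oai:setSpec", namespaces=OAI_NS)
        # Everything before the last colon is the parent's setSpec.
        parent, _, _ = spec.rpartition(":")
        tree.setdefault(parent, []).append(spec)
    return tree


tree = sets_by_parent(SAMPLE)
print(tree)
```

With a layout like this, harvesting `set=history` would be expected to pull in the records of both child sets as well, which is exactly the community-level harvest wished for above.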

28 Novembris 2005

Dodged a bullet

I almost had to go up for review next month. Yes, not even six months into the job, and I’d be putting together an argument that they ought to keep me.

Fortunately, I missed the cutoff date by two weeks. I am blessing those two weeks with all my little heart, because, yeesh, my chief accomplishments since I got here have been getting settled and figuring out what I’m doing! Somehow I don’t think puppy-dog eyes (even if I were good at puppy-dog eyes, which I’m not) would sway opinion terribly much.

A note for my proto-librarian and new-librarian readers of the academic-librarian persuasion: SAVE STUFF. SERIOUSLY. If you go to a conference, save the program. If you go to a lecture or seminar, save a flyer or your RSVP or receipt or something. If you take a class, save the grade report (oh, and incidentally, I got an A in Java), or at least the registration. If you give a presentation or a paper at a conference, save a copy of your slides, your handouts, or your paper. If you’re on a committee, save an official piece of paper that lists committee members.

And keep a (b)log of profession-related things you do.

Why? Because when you go up for contract renewal or promotion, there’s a distinctly non-zero chance your committee wants to see this stuff; around here, they’ll even accept stuff from before your current employment term as evidence of promotability. All the paper is proof you’re not fibbing, you see.

I know, I know; I’m rolling my eyes too. Even so, right this minute, make yourself a folder, and discipline yourself to stick stuff in it. I am, because next year I know I’m up for review, and I think I’m going to ask for a promotion, too; the worst they can do is say no. Fortunately, three years of experience isn’t a hard-and-fast rule for promotion around these parts, and I’m hopeful I’ll have enough other stuff to make up for the years I lack.

Word vs. OpenDoc XML smackdown

A blogger I read religiously helped write a smashing comparison of MS Word’s XML format with OpenDocument. Good stuff, though I admit I’m not sure XLink is a terribly impressive selling point.

The authors missed a detail, though, one I’m rather surprised they didn’t comment on. They mentioned, correctly, that OpenDocument’s mixed-content model looks very XHTML-ish and readable, whereas MSXML looks like a document pureed in a blender. (Okay, phrasing there is mine. I’ve had to cut down some MSXML documents for use in ordinary XHTML. Extremely not fun.)

What they don’t mention is that OpenDocument’s manner of handling inline markup (such as bold or italic formatting) leads easily to well-formed XHTML (or other XML) output. MSXML’s doesn’t, necessarily. I don’t know whether MS has actually fixed Word to make the case I am about to lay out impossible, but I do know that the underlying data model used to give my old VB-guru friend Damon fits, because of all the extra processing he had to do at paragraph marks to get anything even vaguely resembling well-formed output.

Anyway, I’m not going to try to write MSXML to make my point, because I loathe MSXML just that much. However, the basic idea is that MSXML will let you get away with this:

<p>Here’s some text in a paragraph, <start what="bold"/>and the end is boldfaced.</p>

<p>Whereas the beginning of this paragraph is boldfaced<end what="bold"/>, and the end is not.</p>

That’s well-formed XML, yes, but do you see what happens if you try to boil that down to XHTML? Your output won’t be well-formed, because of how MSXML treated that boldfaced text:

<p>Here’s some text in a paragraph, <b>and the end is boldfaced.</p>

<p>Whereas the beginning of this paragraph is boldfaced</b>, and the end is not.</p>

Don’t try this at home, kiddies, because your validator won’t like it.
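The extra processing at paragraph marks boils down to this: remember whether bold is switched on, close it at the end of every paragraph, and reopen it at the start of the next. A minimal Python sketch follows, under loud assumptions. The `start` and `end` elements are the made-up stand-ins from the example above (not real MSXML), the `doc` wrapper element is invented so the input has a single root, and only flat paragraphs with bold are handled.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap(text, bold):
    """Escape a text run and wrap it in <b> if bold is currently on."""
    if not text:
        return ""
    text = escape(text)
    return f"<b>{text}</b>" if bold else text


def milestones_to_xhtml(xml_text):
    """Convert start/end bold milestones into <b> runs that stay inside <p>."""
    doc = ET.fromstring(xml_text)
    bold = False  # formatting state carried across paragraph boundaries
    out = []
    for p in doc.iter("p"):
        parts = [wrap(p.text, bold)]
        for marker in p:
            if marker.tag in ("start", "end") and marker.get("what") == "bold":
                bold = marker.tag == "start"
            # Text after a marker belongs to the state the marker just set.
            parts.append(wrap(marker.tail, bold))
        out.append("<p>" + "".join(parts) + "</p>")
    return "\n".join(out)


# Hypothetical input mirroring the two paragraphs above.
SAMPLE = (
    "<doc>"
    '<p>Here is some text in a paragraph, <start what="bold"/>and the end is boldfaced.</p>'
    '<p>Whereas the beginning of this paragraph is boldfaced<end what="bold"/>, and the end is not.</p>'
    "</doc>"
)

result = milestones_to_xhtml(SAMPLE)
print(result)
```

Each paragraph comes out with its own balanced `<b>` element, so the result parses as XML. The cost is that one logical bold run has become two elements, which is the bookkeeping a well-formed data model does for you up front.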

I had a conversation once with a Very Smart Person who does things like write citation parsers for grotty author input. He told me that there used to be much friction in word-processor-space between applications that enforce some notion of well-formedness (like OpenOffice) and those that don’t (like MS Word). The additional programming burden of well-formedness, added to users’ nonexistent understanding of the concept, meant a significant enough annoyance load both for user and programmer that sloppy data models, in which inline formatting can overlap block formatting, won out.

I’d be thrilled to pieces to see that pernicious little trend halted. I’m aware well-formedness causes problems—annotations like comments and change-tracking are the big use-case; their targets, though logically inline, can legitimately span blocks. Still, OpenDocument seems to be managing all right. I hope they keep doing so.

27 Novembris 2005

Expectations

The other day I got a lovely email from the senior programmer on a marquee digital-library project, one I’ve known about for ages and drooled over as long as I’ve known about it. They do the kind of thing I was expecting to do when I got out of library school.

And am not doing, as a matter of fact. I’m doing something that resembles what I thought I’d be doing in, well, almost exactly no way at all.

I introduce myself these days as a “digital archivist,” because it makes people’s eyes light up with enough understanding to go on with. I don’t use my official job title because it makes people’s eyes light up with “WTF?” Either way, what I do is a long way from text wrangling. I’m a petit-bourgeois shopkeeper now, not a peasant artisan.

Funny thing is, I’m not complaining. I’m working on several entirely new bags of tricks. Who’d have thought the blunt text churl could smile and smile and be a villein? (Sorry. Irresistible.) I was weaned on SGMLish hierarchy, but I’m adapting to relational SQL just fine, thanks. I still find Java a deeply irritating language—verbose, redundant, and clunky—but I don’t automatically flee to Python, either. Aside from the annoyance of clearing rights, which is an annoyance that I have got to find better ways to handle, I like what I do.

This comes up because I’m trying to help a new librarian of my acquaintance (graduated when I did, though not from my institution) find a job. She has unfortunately decided that she doesn’t want the jobs that her training best fits her for, and she’s a tough sell (zero experience, crowded field) for the job she says she wants.

I think she ought to play to her strengths, even if she doesn’t want to be in that particular specialty all her life. It’s such a fluid world, once your foot’s in the door. I’m not doing what I thought I’d be doing, or what I was initially trained up to do. That’s just fine. Heck, if I’d stuck with my first post-college or post-grad-school job, I’d still be answering phones and typing memos. Your first pro job isn’t your destiny, not if you’ve got any gumption.

I’m not sure she’s hearing me, though. Discouraged—even though her abortive job search was maybe half the length mine was!—she’s working retail, and not sending out applications. I’m hoping I’ll be able to goose her back into the hunt after the new year.

As for me, the wind is blowing rumors of medium- to long-term plans that may make good use of this old artisan’s sinewy text-wrangling muscles. I’m content to wait and see.

24 Novembris 2005

Gladness

A lot to talk about, this year.

I’m glad I’m healthy, physically and mentally. I have hands that work, mostly without pain. I have a psyche strong enough to survive a lot of stress and come out the other side with an intact marriage (far from assured, at the beginning of this year) and an upbeat sense of the future.

I’m glad the four Salos and their appurtenances moved safely. I’m glad we’re settling in. I’m glad the Madison house sold, by all appearances to people who will appreciate it as it deserves.

I’m glad for UW-SLIS, which happily gave me enough rope to hang myself with, and a pretty good education on top of that. I’m glad to have gotten out feeling pride and appreciation; it genuinely does make up for a lot of what I suffered the previous time. I’m glad for a new job I enjoy, with people I like being around and a boss I couldn’t be happier with. I’m glad and grateful for all the help I received on the job hunt.

I think most of all I’m glad that I could be there for a few people who needed what I could offer. I made a difference, not just to how they felt, but to how they were able to live their lives. Chances to do that are a gift, and I’ve had more this year than I probably deserve.

I am grateful.

23 Novembris 2005

Trade me!

Okay, I had to get in on the librarian trading card game. As You Know Bob, I pretty much don’t let photos of myself get out into the wild (and the wild is most grateful for that, I’m sure), so I went for my Philip-Pullman daemon instead, the pika (or rock coney, or rock rabbit, or whatever your dialect’s term for it is). Click on the pic for a larger, text-legible version.

[Image: librarian trading card]

Not to violate Proper Librarian Decorum or anything, but I could see these being a marvelous outreach tool for a library with enough of a sense of humor to do them for public-service staff.

Oh, and create your own card!

21 Novembris 2005

Organizing repository items

I had an interaction today with one of the repository’s early-adopters that bears examination.

The faculty member in question will be submitting papers that he has already carefully organized into over a dozen categories on his own website. He wanted to mirror that organization on the repository—only the repository isn’t really set up to do that.

I mean, it can. I could have made him a DSpace sub-community in which each of his categories was represented by a collection. Administratively, though, that’s a nightmare; collections can’t be deleted without wiping out the items in them (said items can be reassigned to other collections first, but speaking of nightmares…), for example. The entire structure has an unacceptable rigidity. Categories change; DSpace collections are forever.

Not to mention that the overhead in setting up a collection is, while not onerous, not inconsiderable either. DSpace’s data model is designed to reflect the structure of the submitting organization, not so much any classification or categorization arising from the materials submitted.

I’d love to hack a faceted-browsing-and-search system into DSpace; it’d solve an immense lot of problems. The “how much of this stuff is peer-reviewed?” problem. The “I’m looking for a thesis (but not anything else)” problem. (Yes, I know that’s in the metadata already, but DSpace doesn’t let you browse on it!) The problem of communities organizing their collections orthogonally—some do it by type of resource, some by who’s submitting the resource (e.g. faculty vs. students), some by subject categorization, and some by combinations of the above. Faceted views would let communities organize collections and items the way they want to, while users browse the way they want to.
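To make the idea concrete, here is a toy Python sketch of faceting over item metadata held in memory. Everything in it is an assumption for illustration: the field names are invented rather than taken from DSpace's schema, and a real implementation would live in the database and search index, not in a list of dicts.

```python
from collections import Counter

# Hypothetical item metadata; field names are illustrative only.
ITEMS = [
    {"title": "A", "type": "thesis", "peer_reviewed": False, "submitter": "student"},
    {"title": "B", "type": "article", "peer_reviewed": True, "submitter": "faculty"},
    {"title": "C", "type": "article", "peer_reviewed": False, "submitter": "faculty"},
    {"title": "D", "type": "thesis", "peer_reviewed": False, "submitter": "student"},
]


def facet_counts(items, facet):
    """Count how many items carry each value of one facet."""
    return Counter(item[facet] for item in items)


def narrow(items, **selected):
    """Keep only the items matching every selected facet value."""
    return [
        item for item in items
        if all(item[key] == value for key, value in selected.items())
    ]


# "I'm looking for a thesis (but not anything else)."
theses = narrow(ITEMS, type="thesis")

# "How much of this stuff is peer-reviewed?"
reviewed = facet_counts(ITEMS, "peer_reviewed")

print(facet_counts(ITEMS, "type"))
print([item["title"] for item in narrow(ITEMS, type="article", peer_reviewed=True)])
```

The facets are independent of whatever community and collection an item happens to sit in, so the depositor's organization and the user's browse path no longer have to agree.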

Make no mistake, this is a major change, not overly amenable to my desperate-hacking approach to life. The database would have to change, the metadata would have to change, and I don’t even want to think about the user-interface changes. Not to be done on a whim. But what am I to do with depositors who have their organizational act together?

I compromised. This faculty member is getting a single collection, and I’m going to help him keyword his work from the appropriate controlled vocabulary so that it’ll be easily findable from the great wide world. He will maintain his own categorized browse-list outside the repository altogether, linking to the items in the repository via the famous unbreakable URLs.

This is not as bad a solution as it might sound. Most people won’t arrive at a given item in a repository via browse. They’ll have a citation already (in which case they can browse up from the item), or they’ll have searched for it (in which case the carefully-constructed category browse has done them exactly no good at all). Good keywording plus an outside browse list is as close as we can get right now to the best of both worlds.

It shouldn’t be this hard, though; truly it shouldn’t.

17 Novembris 2005

Who owns culture?

I just got back from a vigorous and invigorating panel discussion involving Siva Vaidhyanathan, anthropologist Shalini Venturelli, lawyer Karol Kepchar, and law professor Peter Jaszi on the topic of copyright and cultural ownership.

I won’t bother trying to summarize; it was a very high-bandwidth discussion, and I’d make a terrible mess of it. A few highlights, though, that lead to my small contribution: All the panelists made the point at some juncture or other that the gap between producers and consumers of information and ideas is less wide than often thought. I read weblogs; I also write one. I listen to music; I also sing.

Also prominent in the discussion was the power of the state to influence the creation and consumption of information and ideas: through funding, through putting muscle behind particular definitions of culture, through creating artificial scarcity and limited (or unlimited) monopolies. Dr. Venturelli went so far as to postulate a triangle: users (culture consumers), exploiters (creators and craftsmen, those who make money from cultural activity), and the state, and to say that if this triangle is not kept in balance, creativity and innovation do not reach optimal levels, society-wide.

One question posed late in the discussion involved the pernicious effects of the “intellectual property” metaphor on legislative activity: our congresscritters have yet to figure out, apparently, that you shouldn’t treat an idea like a chair. That raised the counter-examples we all know and love: open-source software, copyleft, open access scholarship (yes! it was mentioned! by name!).

Unfortunately, these phenomena were characterized in a fashion that makes them easy to dismiss. Congresscritters don’t want to hear about Raymondian gift economies, or centuries of free exchange of scientific thought. That’s hippie-dippie talk, that is. Perilous close to that lefty commie stuff.

I prefer to paint this behavior in terms of a hard-headed, pragmatic bargain. Since we are all now both creators and consumers of ideas, in some circumstances it makes sense for us to let go of some of our (state-granted, in some cases; in others, granted by custom) power to exploit culture in order to retain power to create culture. As for the third leg in Venturelli’s triangle—frankly, we’re subverting it, at least as it is currently constructed. We’re using enactments of the state for purposes the state actively distrusts.

Dr. Jaszi’s latest venture (to debut tomorrow, and please check this out—it’s fascinating!) illustrates this bargain beautifully. Documentary filmmakers, sick and tired of the “clearance culture” that is gumming up the works, are sorting out on their own initiative how to balance their roles as creators and re-users of culture.

And in another small victory for my job security, the panel was videoed, and I got hold of the event organizer to request a copy for the repository.

Victory!

I signed on an entire school to the repository today. I honestly thought about calling it a day and going home—what else could I possibly do to match that?

There are those who think that faculty will jump to deposit their work in the closest available repository just ’cuz. I won’t say what I think of these people, because the politest word I can manage to dredge up from my reasonably-extensive vocabulary is “deluded.” Faculty don’t do anything just ’cuz.

I have been tearing out my hair (well, not literally) for four solid months trying to raise awareness and snag some early adopters. Finally, finally, it pays off—and, curiously, not with a population I’d directly addressed. Which, if anything, makes me happier; it means word’s getting out.

The first one is the hardest. It should get easier from here.

(And no, I didn’t go home. I have an extensive collection of stuff from a teaching institute last summer that I had to get started clearing permissions for. O ye would-be repository-rats, beware: this is the most time-consuming bit of coping with existing multi-authored pieces of work!)