20 Octobris 2008

The repository-rat section of my brain collided with the walnut-sized brain piece currently devoted to data-curation issues. This post is the result.

At the Purdue e-science conference, I first ran into the DRIADE project, and was surprised and dubious to find out they were using DSpace as the back-end for it. I never did explain why that bothered me so, partly because I couldn’t quite articulate why. I think I can now: it’s a question of where the behaviors associated with a given set of files reside.

DSpace and EPrints make certain assumptions about the files they take in. Key for our purposes is that they assume that all they have to do to mediate between a file and its end-user is serve it up in response to a request. Ask, give, end of story.

For a preprint, postprint, article… that’s fine, no worries. For a thesis, it may suffice, but theses are becoming more complex these days. Music theses at MPOW usually involve digitally-recorded performances, which (intellectual-property rights permitting, which they often don’t) one would wish to stream. Computer-science theses often include software, as one would expect. My husband is accumulating a tidy collection of spreadsheets as he works on his linguistics dissertation.

For a dataset, this ask-and-give assumption is pure disaster. Hardly anybody wants a whole dataset boiled down into a single file. Hardly anybody creates a dataset that way. Sure, they’ll tell you they just have the one spreadsheet, but that doesn’t count the data dictionary and the lab notebooks and the field notes and the et cetera. What’s more, datasets don’t want to be treated as unitary objects; ask-and-give just doesn’t work. Query, slice-and-dice, facet, analyze, number-crunch, mash up—that’s what people want to do with a dataset. They want it to have an API.

And all DSpace and EPrints can do is say “durrr, here’s a file.”

Of course this is why all the interesting data-driven development is happening on Fedora, which lets you build as much introspection and as many connections as you care to. The facet of that I find both fascinating and troubling is that Fedora’s very flexibility leads to a balkanization of the data contained therein. My data repository may allow operations on certain kinds of data that yours doesn’t, and vice versa. It’s good and necessary to do that kind of experimentation, but it also makes me wonder how much good we’re doing the research enterprise if some data drives around in a Lexus while the bulk of it is stuck on a Vespa.

Anyway, I have asserted that IRs as currently architected are not the solution to the data deluge. Now I’ve articulated why not. Go forth and ponder.

17 Octobris 2008

I’ve been promising to post a picture of the Bibliomedusa for ages, only it took me ages to get it framed, and then it took me ages to have a digital camera in the same room with it. The photo below does not do it justice in the least, but it gives the basic idea:

Bibliomedusa

For most people, I need to explain the joke, so I will. It is the single language-geekiest library joke I have ever, ever seen.

The library-school honor society, to which I am privileged to belong, is Beta Phi Mu. Reportedly, the letters stand for a Greek phrase meaning “librarians are the guardians of knowledge.” My husband the classics geek turned that into βιβλιοφυλακες φιλοσοφια μεδοντες (my transcription may or may not be correct; I know the Greek alphabet and… nothing else about the language whatever). The funny bit is that μεδων is the specifically masculine word for guardian. A female guardian is a μεδουσα. A medusa. A bibliomedusa!

The Bibliomedusa is the custom work of my friend Matt Grana. I think I asked him to put “Elseviley Verlag” on the pedestal of the stoned warrior in back; it came out as “Owed Late Fees,” but oh well. The use of a bit of the Beta Phi Mu logo on the “clock” in back is inspired (and yes, that’s the logo again as the Bibliomedusa’s tattoo). I love the Bibliomedusa to bits and she will go with me wherever I go.

So now you can share the joke, at least.

16 Octobris 2008

So at work we’ve gone the DSpace-1.5-plus-Manakin route, rolling out two new themes along with the new release. If I’ve seemed crustier and more irritable than usual lately? Getting a release ready to go, and dealing with the inevitable post-release bug list, does that to me.

So, yes, I’ve been swearing at Manakin a lot lately. Not as much as I’ve been swearing at IE6, admittedly, but there have been some “wait, what? how could you do that? what were you thinking?” moments. (And just so the Manakin devs know: if y’all change the pattern of the browse URLs one more time, I’m agonna take out a contract on ya. I let it go into production with broken links!)

This morning, though, I had to add an eperson to a collection as submitter, collection admin, and review step performer. It took me many fewer clicks and scans to do that than in the past; the ability to search for an eperson instead of browsing an obtuse list is great! Thank you, thank you, thank you for cleaning up this little chore.

15 Octobris 2008

So, along with Greg Laden’s excellent flight of fancy, my own post, My Father the Anthropologist, is a winner of the Open Access Day blog contest.

Now I’m supposed to run around snubbing everyone, right? Isn’t that how these things work?

Seriously, though, I hope librarianship gets a little bit of a boost in the open access community from this “famous to fifteen people” moment. Libraries are more than wallets, Dr. Willinsky, and librarians are strategic assets of great importance to open access, Dr. Jacobs. If yesterday’s what-I-do laundry list opened some eyes, that will be a big win for libraries and for open access.

I’m quite looking forward to my prize from PLoS, and it was great of them to sponsor the synchroblogging (which had over forty entries!). My big win from Open Access Day was something a little bit different, though: In my email this morning was a note from a member of our campus community, asking whom he should talk to about open access.

Win. Big win.

It’s funny, what modeling a behavior in the classroom can do. News junkie that I am, I make a point of bringing in tidbits from the tech news and the biblioblogosphere that reinforce what’s going on in class and connect it to the real world. (Anybody who stalks my del.icio.us feed knows that “644,” which is my class number, is my most populous tag!) And darn if the first thing they asked me last night wasn’t “Are you going to talk about the new copyright czar?”

I wasn’t, because I had a full slate for last night (that I actually didn’t get all the way through, but it’s okay; I expected not to), but I sure will next week… and if I’m teaching them nothing else, clearly I’m teaching them to pay attention to the world around them.

I also found out that one of my final-project tasks was totally unreasonable. I still think it was feasible (I know how I’d do it!), but I’m an unreconstructed markup geek with lots of data-conversion experience; it was unfair of me to project that onto my poor students. The group that took on that job really went above and beyond to try to figure out how the pros do it. Unfortunately, however, the pros aren’t unreconstructed markup geeks. The task is being revised accordingly, and I hope the students aren’t too traumatized by the experience!

The biggest hassle with moving from eleven students to nearly forty is turning out to be calming their anxiety. I thought I wrote pretty clear instructions on my assignments, but apparently not! Ah, well, lessons for next time.

14 Octobris 2008

In 1980 or thereabouts—I was eight or nine—my father the anthropologist started yet another rant about serials cancellations at his university’s library while he drove the family somewhere in the family car. He thought the problem an artifact of library underfunding, I remember. I don’t recall that he ever did anything about it save rail bitterly on the subject to us, his captive, powerless, and resentful audience.


At the inaugural meeting of the Open eBook Forum in 2000, David Ornstein and Janina Sajka explained what they hoped electronic books would accomplish. Amid the faux-visionary fluff and the crass dollar signs, one hope they expressed made me vibrate: that for the first time, a visually-impaired person would be able to walk into Borders or Barnes & Noble and buy a book off the shelf just like anyone else.

Access to human knowledge and creativity. Access for the wrongly disenfranchised. Access. I loved markup, I loved text, I loved design, I loved standards work—but then and afterward, it was the access argument that kept me engaged with electronic books. My father the anthropologist, his own eyes not what they had been, understood and endorsed that argument at once.


I certainly know how reassuring accurate, authoritative medical information can be. When my father the anthropologist went to the hospital for bypass surgery, I looked for every scrap of reliable information I could find about what he’d have to go through, what his chances were, what would happen afterwards. Information is hope for helpless bystanders.

I know what information gaps mean to the efficacy of medical care, too. I started my quest to treat my repetitive stress injury when my hands and wrists hurt so badly I couldn’t sleep some nights, nor survive a day’s work without severe pain. The open web, obvious misinformation aside, contained little more than nonsensical and insulting condemnations of RSI sufferers as malingerers, as well as blatant advertising of invasive surgery on the websites of orthopedic surgeons.

My primary-care physician insisted on old-fashioned treatment modalities before she would refer me anywhere. I paid for and endured weeks of wrist braces that I knew would not relieve my pain because I had tried them, as well as a tennis-elbow strap that left me in such agony that I refused to put up with it longer than a day. I did achieve a referral at last, and physical therapy turned out to be the right treatment. As I healed, the new search skills I was acquiring in library school, along with the access that being a student entitled me to, helped me discover that the medical literature understood why my doctor’s initial recommendations had been wrong. Why did I waste time, money, and pain over my inability to produce reliable information to assist my medical provider in treating me appropriately?

I can only be glad I wasn’t suffering from anything life-threatening, like artery blockage.


I was slotted into an online course in “Virtual Collection Development,” taught with patient lucidity by Jane Pearlmutter, my first semester in library school. Among the readings was “The Librarians’ Dilemma: Contemplating the Costs of the ‘Big Deal’” by the University of Wisconsin’s own Ken Frazier. There it was again, this problem of serials cancellations, framed in terms so transparently sensible that I could only exult.

Later in the semester came a unit on open access. It would be nice to say that lightning struck and I knew that was what I wanted to do with my professional life, but it didn’t and I didn’t. Of course I was intrigued; I knew several for-profit journal publishers from the worm’s-eye view of an erstwhile lowly data-conversion peasant. I wove the complaints I remembered from my father the anthropologist, my own experience in scholarly publishing, and what I learned in class into a rich, detailed mental tapestry, and I felt real hope that open access was an answer I could take back to him that he would understand and appreciate. Discovering that I would shortly join the profession backing open access only confirmed that library school was the right choice for me, even should I not work in the open-access niche myself.


When I landed my first library position just after graduating, I called my father the anthropologist. His first question was “How much will you be paid?” I declined to answer. His second question was “What’s your title?”

“Digital Repository Services Librarian,” I said, with pride and no little amusement.

On the other end of the line, a lengthy silence.


My father the anthropologist used to buy lab equipment out of his own pocket, rather than struggle with byzantine university purchasing procedures and skeptical departmental scrutiny. Rightly or wrongly, he was convinced no one would understand or support him and his work, but he refused to knuckle under. He would do what it took, spend what he had to, to further the research he fervently believed in.

I have bought quite a bit out of my own pocket too, rather than charge it to the libraries that have employed me. I have bought color inkjet printers, various sorts of expensive paper for brochures and bookmarks and whatnot, and poster printing. I have bought software that I use for work-related purposes. Once I bought an expensive print run of a color brochure because a chance to distribute a lot of them at once came up so suddenly that I didn’t have time to print and fold them myself as I usually did. I bought a cross-country trip to an important repository conference when I was de facto between jobs. I bought a laptop on which I do repository-related work when the occasion warrants. I have bought buttons with images of Mars on them, because when you’re handed a golden acronym you might as well make the most of it. Like as not the libraries I have worked in would have paid for some or all of this—I never asked.

I have read, written, rewritten, commented, and debugged code in Java, Python, and XSLT. I have tweaked JSPs, murdered unnecessary HTML tables, and rewritten CSS designs from the ground up, swearing sulfurously at various versions of Internet Explorer. I have edited metadata in XML by hand. I have translated EndNote records into Dublin Core. I have screenscraped ugly HTML and cudgeled it into legible metadata. I have screenscraped yet more ugly HTML for transformation into preservation-worthy markup. I have built convoluted SQL queries slowly and carefully from the inside out, run them on production databases with fear and trepidation, and once or twice cleaned up after them when I’ve gotten them wrong. I have typed cargo-cult incantations at command lines to keep server software running and upgraded, and raked Google for answers when some incantations didn’t work as promised.

I have stared at lengthy CVs with a sigh, and then waded resolutely in to clear rights on as many of the publications as I could. I have searched SHERPA/RoMEO and Bowker’s Books in Print. I have hunted down agreements from publisher websites. I have asked faculty for their copyright-transfer-agreement files, and tried not to let my smile grow too pained when they told me they don’t keep such things. I have explained the difference between preprints, postprints, and publisher PDFs to politely incredulous auditors. I have read scads of legalese, and interpreted it as best I could. I have read and pondered the words of librarians and lawyers who understand the legal fine points much better than I. I have made some risky calls, likely some wrong ones. I haven’t been called on the carpet for them… yet.

I have held one-on-one meetings and demo sessions with faculty and librarians. I have designed and produced brochures, flyers, slideshows, posters, web pages, wiki pages, and one mini-movie. I have presented at innumerable campus expos, showcases, lectures, symposia, conferences, and workshops. I have called and written my elected representatives. I have blogged. I have written articles and self-archived them, sometimes after polite and fruitful discussions with publishers. I have run any number of failed efforts toward building a community of practice among repository managers, each new attempt the triumph of hope over experience. I have cold-called librarians, faculty, department chairs, deans, and administrators. I have been to more meetings than ought to fit in the three years I’ve been doing this.

You needn’t be obsessed like my father the anthropologist and me. Believe me, that’s the last thing I’d recommend to anyone. If you cannot find even one thing you can do in the above list, though, I wonder about you.


I once explained to a pleasant elderly faculty member that the repository didn’t easily allow changes. “It’s like a roach motel,” I said. “Files go in, but they don’t go out. Once they’re there, they’re stuck.” Suppressed chuckles from librarians in nearby cubicles greeted that statement, and I returned from ushering the faculty member out to find that my colleagues had good-humoredly dubbed me the Innkeeper at the Roach Motel.

I loved the sobriquet, despite the unhappy truth of its depiction of institutional repositories. I have never liked telling faculty members that my services couldn’t do what they needed, and I’ve had to tell them that often and often. Worst of all, I couldn’t envision my services as anything my father the anthropologist would find useful, compelling, or even comprehensible; the promise of green open access was fading fast in the unforgiving floodlights of faculty diffidence. I looked around the open-access community for understanding and a path forward, but I found little to help or reassure me.

My father the anthropologist and I are alike in one way at least: we don’t suffer fruitless systems in silence. In one way at least, we are different: I cannot content myself with complaining to the powerless and uninvolved.

I don’t think there’s a community I operate in that my gadfly ways haven’t irked or even alienated. My library school. My librarian colleagues. DSpace developers. Green open access. Library bloggers. The DSpace Foundation. Library coders. Repository managers. The open-access community in general. While I accept all this as the price gadflies pay for being pests, it is no source of pride, nor is it pleasant. I have feared for my job, and like as not I deserve to. I have feared that the career I find myself in will not exist in five years’ time, and I have wondered uneasily whether my own behavior has hastened rather than forestalled that eventuality. I have been cautioned, questioned, belittled, berated, cut down to size in public, stepped cautiously away from, set up as homo stramineus, misquoted, deliberately or carelessly ignored—and much of it I have richly earned.

I have also been heeded. I have also made change. Not much, perhaps; certainly not all the change I wanted to make, wanted to show my father the anthropologist, wanted to offer the world. Even so, change is my gift to them and to you: my gift I offer in my much-abused hands on this Open Access Day.



Rodin, La Cathédrale.
Photo by Wallace Grobetz, via Flickr and the Creative Commons.

13 Octobris 2008

The beginning of the end for the institutional repository? You tell me.

I have to prepare an internal study about the usage of the documents available into Archimer, our Institutional repository, to justify the work we do on it.

We all know what I think, right? I even brought that opinion up to date recently.

I confess I’m a little surprised that this request didn’t come from a US repository. I’m also surprised that they seem to be focusing on downloads rather than uploads, so to speak, although I have the uncomfortable feeling that downloads may be all the ground they feel they can defend, and it’s crumbling under them.

I’m not surprised at all to see an IR forced to justify its existence. I’ll be utterly astonished if Archimer is the only one in the next year or three.

Go ahead and click over. The money quote I gave above isn’t the meat of the email. I’m still trying to work through the implications in my own head.

The DSpace Foundation found enough interest in an ersatz user-group meeting after SPARC Digital Repositories that it’s going forward. I’ve agreed to talk about the equally ersatz grassroots requirements-gathering process I tried to start and, as best I can tell, failed abjectly to get off the ground, though it was an interesting failure.

In the course of putting together my thoughts on that subject, I stubbed my toe on something else, namely: Librarianship has a really bad habit of fracturing itself as a software user community along the fault-lines of which specific software brand a given library is running, and that hobbles our ability to light a fire under our vendors and open-source developers.

Consider the web for a moment. It balkanized for a while, as browser developers competed on “features” like blink tags and non-standard scripting environments. Then countervailing pressures arose, web developers with voice to spare and no commitment to a given browser who sustained efforts such as the Web Standards Project and the Acid test suite. Despite foot-dragging by a certain large software company based in the Pacific Northwest, these countervailing pressures produced results.

The web is trying to balkanize again, as Microsoft and Adobe fight over non-markup-based, non-standards-compliant interaction design. We’ll see what happens, but I don’t believe AJAX is out of the fight, and I do believe that resistance will coalesce in that community, aided and abetted by usual suspects such as the accessibility community.

Now consider the ILS market. It is completely, utterly balkanized. If you’re a Voyager shop, when did you last talk seriously to somebody running Aleph or whatever? Certainly not at your user-community conference, which is run by your vendor. Standards for ILSes? Don’t make me laugh. Mailing lists for ILS sysadmins? Vendor-specific and closed, not even archives available on the open Web. Conversation in the larger library community around ILS needs? With notable exceptions, takes place in the slow, pedantic, easily-ignored library literature.

So what happens? Vendor user groups have little power to make change at their vendor most times, because they’re contractually and pragmatically locked in and can be safely ignored. The only time they have power is when a contract is under negotiation, and funny thing about that, the contracts aren’t negotiated by the people who understand the software and user needs best! And no one can push vendors as a unit toward anything, because vendors don’t cooperate or exchange ideas, and librarians are too focused on their individual vendor to make them do so.

Most fatally, librarians don’t talk to librarians running different software, don’t present any kind of united front to the vendors. There is no librarian community analogous to the Web Standards Project. Heck, there are no ILS standards to hold their feet to the fire about, if you don’t count the morass of cataloguing standards (which, lest we forget, say next to nothing about how ILSes should interact with patrons and librarians, never mind the wider web). Sure, sometimes someone exceptional like Blyberg or NCSU can kick over the traces noisily enough to make a difference throughout both library and vendor communities—but that’s the software equivalent of lightning awakening Frankenstein’s monster. Unpredictable. Not to be relied upon as good practice.

Now consider institutional repositories. We’re completely balkanized based on software/service choice, too. I can count the times I’ve had interesting, important talks with repository managers not running DSpace without moving from my fingers to my toes, and I get out more than most of us do! This is partly why open-source repository software is so backward, to be sure, and that’s bad enough… but it’s not the sum total of the badness of this phenomenon, either.

I see a lot of questions on the dspace-general mailing list that simply don’t belong in a software-specific venue, because they’re equally relevant to repository managers on EPrints or Fedora or BePress or CONTENTdm or whatever. Questions about rights-clearance. Questions about IR policy. Questions about outreach and marketing. Questions that institutional-repository managers should be talking about as a whole, unfragmented community. We can’t wait for the library literature to catch up to us, if it ever does; mailing lists are unquestionably the right venue for these conversations. Software-specific mailing lists aren’t—and yet that’s where the conversations are taking place.

No wonder institutional repositories have no community of practice. Why, why, why do we librarians imprint on software like baby ducks? If I knew how to kick us out of this counterproductive, self-defeating behavior pattern, I would, believe me. I don’t know how. I’ve been howling about IR communities of practice for well over a year now, and I’ve made zero visible progress. Maybe someone else knows how to do this?

11 Octobris 2008

So we’re both getting over weeklong colds, and we’re both feeling pretty decent, so we go to our favorite Thai place for dinner.

And wake up with what appears to be mild food poisoning.

This is just cruel and unfair. Sure, if the G-7 can’t get its act together, a little food poisoning will be a drop in the bucket (so to speak). I get that. Leave me to my tiny individual misery, please.

10 Octobris 2008

It’s going to be an unseasonably glorious weekend here in the Frozen North, and in mid-October every proper Frozen Northian knows there won’t be too many more of those. So I decreed a rental-car weekend, took the day off (just as well; still coughing, though definitely better than I have been), and have been running errands.

First off was dropping my husband at work, which is nice for him because it cuts his commute time in half. Next, popping by the bird-paraphernalia store for forty pounds of sunflower seeds for the delectation of our local chickadees and finches, and the considerable amusement of one small gray cat on the other side of the window from the suction-cup window feeder.

After that I stopped at a coffeeshop for a chai latte and a peaceful hour of grading. Scheduling a quiz on the same day their position-description assignment was due was not the brightest thing I have ever done, but I shall make shift to manage. The quizzes are graded and recorded, and I’m chugging through the job descriptions. I’m amused by the number of them who have added two and two and set their fictional libraries in Middle-Earth. (“Helm’s Deep Library is an equal-opportunity employer. Orcs welcome!”)

Then it was the traditional stock-up trip to Woodman’s, which I do partly because some of what I buy I can get there for about half what it costs elsewhere, some of it I can’t find other places (what is the deal with dill weed in Madison? it’s out of stock almost everywhere I go!), and some of it is so hefty I’m much happier buying it when I have a car to haul it in. All told, I probably toted about a hundred seventy pounds of stuff in from the car when I got home: forty pounds of seed, two forty-pound boxes of cat litter, and fifty pounds (give or take) of other stuff.

I admit that I have a stockpile impulse, particularly when the outside world seems scarier and more uncertain than usual. Objectively, I know that I’m in amazingly good financial shape compared to most, that there’s a good bit of elbow-room in our finances (we don’t actually spend any of what my husband earns; we don’t even spend all of what I earn), and that I’m fairly frugal by nature. Subjectively… I stockpile. There’s no real harm in it, as reactions to uncertainty go.

I have a whacking lot more grading to do, but I think I may take a nap instead and see if I can’t shake the rest of this bug.