What repository software developers don’t know about libraries
This morning’s reading was the admirably honest and straightforward “Taking EPrints to the Next Level” report. In a field drowning in useless happytalk, it’s good to see an effort as important as EPrints stepping back and taking a good hard look at its missteps as well as its (considerable) successes.
(I am also personally chuffed because I came to one or two of their same conclusions completely independently in “Roach Motel.” I always welcome evidence that I am not in fact stupid or useless, when events conspire to paint quite a different picture. Also, I need to bug the Library Trends editors for permission to post a “Roach Motel” preprint, now that I’ve sent the draft in. Somebody kick me about that.)
There’s a lot of talk in the report about marketing, and failures thereof. Seems to me that what happened—and this insight could be extended to the entire institutional-repository premise—wasn’t so much a failure of marketing as a failure of market research, a serious and ongoing failure.
Simply put, just as libraries charged gaily into running repositories without understanding how the entrenched structures and reward systems of academia would hinder them, repository software developers charge gaily into development without understanding how libraries work, or how repositories work inside libraries. (Yes, yes. I know libraries are not the only venue for repository software. You can’t tell me the market research got done for those other venues either. If it had, I wouldn’t have to keep telling random people who call me on the phone that no, sorry, DSpace doesn’t have any paid vendor-support or consulting options.)
So here, free gratis and worth what you paid, are a few hints about libraries and repositories that ought to inform development and support decisions.
First, the usual open-source “scratch own itch” development model doesn’t work as well in libraries. The reason for this is that with a few exceptions, librarians are not programmers and do not think like them. Most librarians aren’t aware they have itches; we’ve been so beaten down by our software vendors that we won’t scratch our own noses without an RFP! And when we do know we itch, we typically outsource the scratching, not least because we’re not taught to program in library school.
I have personally experienced having my own programming skills wildly overestimated by a DSpace committer. And I’m more technical than most in my field! By a long shot! So repository software developers cannot count on a ready-made bunch of talented, trained itch-scratching developers. Folks, I’m as close as it gets, and I’m not half good enough for your needs. How you fix that I don’t know, but you won’t fix it if you don’t first acknowledge it.
Second, the community-based development models that are so fashionable just at present in the repository community are equally if not more precarious. This just isn’t how libraries are accustomed to acquiring their software and having their needs met! The EPrints report goes into the results of this disconnect in considerable sheepish detail, so I don’t need to; I will merely remark that I’m not bullish on Fedora Commons or the DSpace Foundation.
What are libraries accustomed to? RFPs. Vendors. Hosted services. Black boxes. Fee-for-service, not fee-for-input. Passive, reactive technology management involving minimal technical staff. Saying “no” to anything, from chat-reference services to blogs and wikis to open-source repository software and its associated pay-to-play community structures, that doesn’t fit into those categories. We libraries don’t get involved with standards development except in our own community (and that only rarely); we don’t even have that model to look at!
When communal software development happens in libraries, as it occasionally does, it’s worth noting that the genesis appears to be handshake agreements among individual libraries (or even librarians) who know and trust each other. That’s a very different model from “Hi. I’m the Community Federation and I’m here to help you (if you give me lots of money)!”
So talk about community-based development models is just so much “blah blah blah how much is this going to cost me and why should I pay it at all?” to a library administrator (who, I reiterate, typically knows as much about software development as my cat knows about Christmas). These federations and commonses and foundations and stuff had better get to work on some kind of fee-for-service funding model, because they’re fish in a barrel if they don’t.
Third, this is not a good time to be asking libraries for resources for repositories. Institutional repositories are in enough trouble as it is. MacKenzie Smith asked me rather peevishly on the DSpace tech list why I couldn’t just go get a developer assigned to the repository for a year. Trust me: if I could, I would. For reasons it would be unbelievably inappropriate of me to discuss on-blog or on-list, that’s about as likely as the foot-plus of snow on the ground in Madison melting to bare earth by tomorrow.
I’m not alone in this. “Repositories operate on limited budgets, if they have specific budgets at all,” says the EPrints report (page 10). This squares with my experience—and if open-access advocates want to know why this is, they need only examine screeds from some of our more vocal advocates proclaiming that repositories are cheap and easy and fill themselves like magic. That (false) ideology is now coming back to bite would-be repository communities hard.
Fourth, a good many library technologists hide themselves—from their administrations, from their fellows, from the world—because the risk is too great of being shut down abruptly if one is discovered doing this sort of work. I wish I were making this up, but I’m not; I’ve started and supported a couple-three skunkworks projects myself, because there just wasn’t an open way to get the work done. This reticence means first, that the oft-touted egoboo benefits of open-source development do not typically motivate work in the library context; second, that developers in libraries have a tremendous incentive not to share work with other libraries or developers.
Fifth, most libraries don’t have any library technologists. Any. At all. This means that if you put functionality in a server config file, or require a server restart to change it, most libraries won’t be able to tinker with it. At all. “Out of the box” means very different things to developers and librarians; I’ve yet to meet a developer who understands that it means “I don’t have to mess with it, ever, because even if I wanted to I couldn’t” to librarians.
What all this means for open-source software such as DSpace, EPrints, and Fedora is that many itches do not get scratched, and those that do are scratched only locally, with the resulting enhancements never fed back into the software. It certainly doesn’t help the situation that the average open-source developer regards user questions as an annoyance, rather than free requirements-gathering for the next dev iteration. I was roundly mocked for saying something along these lines on the DSpace technical list, but eppur si muove.
Take DSpace. (Please.) To my certain knowledge there are at least four DSpace hacks to create embargoing out there. One (the Minho hack, and for future reference, Yankee devs: that’s pronounced MEE-nyoh or MEE-nyoo, depending on whether you prefer to sound Portuguese or Brazilian) is available publicly in some kind of documented fashion. All were developed wholly independently of each other. None is on track to be put back into DSpace. And those are just the ones I know about; I’d be shocked if there weren’t several I haven’t heard of. There is clearly an itch here. DSpace refuses to scratch it—so we get a lot of useless flailing about, and multiply-redundant efforts that could unquestionably be better-employed. We don’t have so much development talent that we can waste it solving the same damn problems over and over again!
So. What do we do about this knot of snakes?
The strategic-minded development community will do whatever it can to protect skunkworks library developers. I find that a carrot-stick approach consisting of public accolades and the delicate threat of public embarrassment for library administrators is the best shield: sending kind letters of thanks and support on official Foundation letterhead to every single code contributor with a separate copy to his or her boss, coupled with public acknowledgement of all contributors on a website, is a cheap, easy aid. (Any library administrator who reams out the dev over this, or forbids him/her to continue developing, is risking opprobrium in the wider community, and believe me, they’ll get that right away and won’t do it. A project near and dear to my heart at MPOW has been protected from summary execution in more or less this fashion.)
The next avenue of attack is through library vendors, paid developers, and the campus IT units who do most local hacking (since librarians, as I may have mentioned before, are mostly not coders). The strategic-minded development community courts these people like royalty, and does everything possible to ensure that their code gets thrown over the wall instead of hoarded locally. All three open-source repository packages have severe problems with this, for somewhat differing reasons—although none of the three, it is worth noting, has good procedures in place for vetting and adopting third-party code. In DSpace’s case, the biggest problem has historically been MIT’s Not Invented Here-ism, which notably caused certain of them to brush off (in a fairly nasty manner, I might add) the developers of NSpace (PDF), which was a visionary proposal that the current DSpace developer group is only now starting to catch up to nearly three years later.
And where are BioMed Central’s Open Repository developers? I bet you they’ve got some damn interesting code.
Sure, a lot of the code that gets thrown over the wall will be the kind of crap that I produce. Here I invoke Open Source 101: even crap code is better than no code, because crap code can be improved, and it’s a significant hint where the scratch-needing itches are.
(I mean, DSpace knows what it needs, if it cares to look. All anybody has to do is go through the tech list archives. ETDs. File-less submissions. Embargoes. Real dark archiving, not the half-assed kind where your metadata is flapping in the breeze via OAI-PMH. Language-switching interfaces. Statistics that don’t suck. Deposit interfaces and workflows that don’t suck. None of this is news!)
Campus IT is a bit of a harder nut, because they don’t necessarily feel any solidarity with the software package in the way that the librarians do. But developers speak developers’ language. Contact can be made, and should be.
So here endeth Libraries for Developers 101. There’s more to the story, but just what I’ve said here could in my opinion make a substantial difference to the viability of open-source software in libraries if taken seriously.