8 March 2006

What is an IR for?

Arthur Sale’s risk assessment for institutional repositories is every bit as good as everyone says it is. Should be in every repository-rat’s documents drawer.

In it, however, we find repeated the assertion that an IR should limit itself strictly to the peer-reviewed research literature of its target population. I still think that’s deeply wrong, but it’s up to me to defend my belief.

The cited concern is cost. Further details are sketchy, but the general idea seems to be that doing “digital-library stuff,” whatever that is, requires a lot of technical jiggery-pokery that costs a lot of money, and loading that into an IR’s budget makes the IR look cost-ineffective, which creates the impression that OA is cost-ineffective.

I am completely in sympathy with the notion that IR technology is limited, as it happens. IRs in their current state of technological advance are simply wrong for a lot of digital-library purposes. The Center for History and New Media won’t touch the IR I run, because the IR is too passive about capturing data and too rigid about presenting it. (I’m still after some of their older web projects, though. Once I finish a project I just started, I’m going to bother them again, because by then I’ll know the ins and outs of capturing websites in DSpace.) They’re not wrong to ignore me, though I could wish they were.

To put it briefly: if what you want is Greenstone, don’t use DSpace. One consortium MPOW is a part of has hacked the living daylights out of a DSpace installation to use it as a preservation back end for much more patron-friendly digital library applications, and their agony is unparalleled.

Still, it does not follow that an IR is intrinsically poorly suited to every conceivable digital-library need beyond archiving peer-reviewed research. To be a good fit with an IR, a project should consist of individual, self-sufficient pieces of work that don’t really need to be seen next to each other or manipulated during viewing by the patron. Backfiles from a library-sponsored law review, good. A collection of interrelated photographs, bad: an IR simply won’t give patrons the helpful user interface or sophisticated searching that they need.

Also consider the functions of an IR: capture/workflow, preservation, and expanded dissemination via OAI-compliant metadata. For any project that doesn’t need all three of those, an IR is not appropriate technology. I got a call yesterday from a colleague asking if the IR I run could be used for organizational records. I very gently said no, because those records don’t need extra dissemination, just capture and preservation. They need a knowledge management system or CMS, something like the new Docebo, not DSpace.
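For the programmers in the audience, that rule of thumb fits in a few lines of Python. This is only a sketch: the class, the field names, and the function are hypothetical shorthand rather than anything DSpace provides, and scoring the law-review backfiles as needing all three functions is an assumption made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectNeeds:
    """What a proposed project needs from its platform (hypothetical field names)."""
    capture_workflow: bool
    preservation: bool
    oai_dissemination: bool


def ir_is_appropriate(needs: ProjectNeeds) -> bool:
    """Rule of thumb from the paragraph above: an IR fits only if all three functions are needed."""
    return needs.capture_workflow and needs.preservation and needs.oai_dissemination


# Assumed for illustration: the backfiles need capture, preservation, and wider dissemination.
law_review_backfiles = ProjectNeeds(capture_workflow=True, preservation=True, oai_dissemination=True)

# From the phone call: the records need capture and preservation, but no extra dissemination.
organizational_records = ProjectNeeds(capture_workflow=True, preservation=True, oai_dissemination=False)

for name, needs in [("law review backfiles", law_review_backfiles),
                    ("organizational records", organizational_records)]:
    verdict = "IR is a candidate" if ir_is_appropriate(needs) else "look elsewhere"
    print(f"{name}: {verdict}")
```

The check is deliberately all-or-nothing. A project that scores two out of three is the kind that tempts people to bend an IR into something it isn't.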

I will flatly assert that using an IR for records management is a strange and unwieldy idea, the more so because a substantial concern in records management is when to destroy outdated or sensitive materials, whereas DSpace the Roach Motel purely hates to let go of anything it ingests. Apparently it’s being done, though; I’m not sure why. The only records-management items I think are worth keeping in an IR are those related to the IR itself—sort of a self-documentation effort, if you will.

(I am probably persuadable that once the normal records winnowing has happened, some of the remaining items are of sufficient historical interest to archive. But we don’t have digital records of this type old enough to qualify, in my opinion.)

So that’s where I am on the library-projects angle. To me it seems absurd and arrogant to forbid a library that’s undertaken an IR project to use it for purposes that otherwise make sense but don’t consist of peer-reviewed literature. Some digitization projects. Some preservation projects. Some dark archives, if OAI is turned off. Whatever works—but the tech does need to work for the intended purpose more or less out-of-the-box, because DSpace is a right beast to customize after a certain point.

(Fedora is another story, yes, I know. But anyone who balks at customizing DSpace shouldn’t even be considering a Fedora installation.)

As for grey literature, I think any IR that forbids it is throwing a huge social-engineering opportunity away for scant reason. To a computer, a grey-lit PDF looks just like a peer-reviewed PDF, so there’s no technological barrier to stashing it in an IR. And when it’s already miserably difficult to get an IR noticed, the last thing a repository-rat wants is to say no to the first few items a faculty member asks to include. We want them to use the technology. We want them to find it useful. We want them to make it an ordinary part of the work they do, all the work they do. Without low barriers to entry and a generous acceptance policy, none of this will occur, barring mandates that we can’t create out of thin air.

(There are five universities in the entire world with mandates. FIVE. I like mandates, but the idea of a mandate does not help me right now. MPOW is, I estimate conservatively, five to ten years from a mandate if the outside world continues on its present course. Nor do I hold with those who want to limit IRs to peer-reviewed research for the snobbiness of it all. Anyone who holds back from an IR over snob factor probably wouldn’t deposit anyway, because OA typically increases snob—I mean, impact factor for a given article, irrespective of whatever other chaff is in the repository with that article.)

I’ve taken a few things into the IR I run that are frankly borderline. (Not so much the grey lit, which I welcome for reasons discussed in this article, but other things.) I call it a sacrifice in the name of outreach; with luck, those same people will come to me later with things I want. I’m not abandoning OA when I make this particular sacrifice; I’m just taking a slightly underhanded route toward it.

This doesn’t strike me as a huge sacrifice. It doesn’t hurt the server to hold onto a few borderline entries. It doesn’t hurt me to spend a minute or two on their metadata. (I can do a DSpace entry, soup to nuts, in under a minute if the files aren’t too terribly huge and I don’t have to think hard about keywords. Somebody should sponsor a race…) It doesn’t hurt the university to showcase other things as well as peer-reviewed research—far from it!

Because the alternative, to speak frankly, is an empty repository. It’s dead simple to set up an empty repository. A lot of people have. An empty repository strikes me as far more likely to be accused of misallocating resources, to fold, and to threaten OA by folding than a repository that has made itself useful in ways besides holding on to peer-reviewed research.

What I hope to see, frankly, is a meet-in-the-middle between IR software and some digital-library software. The diglib software people have tricks our faculty would love (as well as tricks I would love!), and IRs have preservation talents that diglib software needs. Right now, all the experiments are Frankenstein’s-monster grafts like the one I mentioned above, but I do believe that in five to ten years, we will see more convergence. Interesting times—but the only way we get there is by enduring the current grim times long enough. Which means we can’t—absolutely cannot—sit around with our IR doors barred to everything but peer-reviewed research while we wait for mandates that may never come.

I’ve said before that I expect some IRs to fold in the next few years. If that happens, it should provide a test of my hypotheses. I’ll be watching; whether I turn out to be right or wrong, there’s a star D-Lib article in it.