I’ve gotten a couple of interesting responses to my note on fixing repository ingest processes, which I will pass on without attribution (though I’d rather give credit, and I hope people will email again to give me permission).
Sarah Shreeves, repository manager at UIUC, reported to me that they are thinking along very similar lines, which is both a pleasant validation and a hopeful sign for the future.
Another DSpace repository manager let me know that he had implemented a system very similar to my second bucket. Faculty are presented with a web form consisting of an upload button and a place to paste a citation. At this point, control goes to the librarian, who evaluates the submission for appropriateness, fixes the metadata, and sends everything to the DSpace batch importer. The developer of this system looked at incorporating some kind of citation parser to further automate metadata creation, but (as I can attest!) writing one of those to handle a wide variety of citation styles is a lot of work.
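The last step of that workflow, handing cleaned-up items to the batch importer, can be sketched roughly as follows. DSpace’s batch importer reads items in its Simple Archive Format: one directory per item, holding a `dublin_core.xml` metadata file, a `contents` file listing the bitstreams, and the files themselves. This is a minimal Python sketch assuming that format; the function name `write_saf_item` and the `(element, qualifier, value)` metadata layout are hypothetical, this is not my correspondent’s code, and actually running the import is left out.

```python
"""Hypothetical sketch: package one librarian-approved deposit in DSpace's
Simple Archive Format (one directory per item: dublin_core.xml, a `contents`
file, and the bitstream files themselves)."""
import shutil
import tempfile
from pathlib import Path
from xml.etree import ElementTree as ET


def write_saf_item(archive_dir, item_name, metadata, files):
    """Write one item directory and return its path.

    metadata: list of (element, qualifier, value) tuples for the dc schema;
              a qualifier of None is written as "none", as SAF expects.
    files:    paths of the uploaded files to copy in as bitstreams.
    """
    item_dir = Path(archive_dir) / item_name
    item_dir.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an item

    # Build <dublin_core><dcvalue element=".." qualifier="..">value</dcvalue>...
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(
            root, "dcvalue", {"element": element, "qualifier": qualifier or "none"}
        )
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Copy each uploaded file in and list it, one per line, in `contents`.
    names = []
    for f in files:
        src = Path(f)
        shutil.copy(src, item_dir / src.name)
        names.append(src.name)
    (item_dir / "contents").write_text("\n".join(names) + "\n", encoding="utf-8")
    return item_dir


if __name__ == "__main__":
    # Demo with a throwaway file standing in for a faculty upload.
    with tempfile.TemporaryDirectory() as tmp:
        pdf = Path(tmp) / "paper.pdf"
        pdf.write_bytes(b"%PDF-1.4 stub")
        item = write_saf_item(
            Path(tmp) / "archive",
            "item_000",
            [("title", None, "A Paper"), ("contributor", "author", "Doe, Jane")],
            [pdf],
        )
        print(sorted(p.name for p in item.iterdir()))
```

The librarian’s judgment stays where it belongs: the code only packages metadata someone has already checked, which is the part worth automating.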
(Like MARC, citation styles are an example of analog habits that sit poorly with the needs of digital environments. Somebody emailed me about APA style, which insists on initials rather than full names. Right. That’s the print-culture need for shorter citations colliding with the digital world’s need for better authority control. I can’t fix that. APA needs to.)
Two things interest me about this system. First, DSpace is so brain-dead rigid that the developer had to go outside the system altogether to innovate. I hope with all my heart that I am not the only person who sees this rigidity as a significant threat to repository platforms in general (because how much better are EPrints and Fedora?) and to DSpace in particular.
I’ll make it simple: either repository platforms do what repository managers need them to, or they die. There are days I’m ready to spork DSpace to death myself. Nota bene: repository managers, not repository developers. “You can hack it in!” is not an acceptable response to this problem; nor is “you can do it as long as you have administrator access to the server and you change this arcane configuration file buried three layers deep, then run this little script from the bin folder, then restart the system.”
The other thing that interests me, and is somewhat more hopeful, is that I think the above scenario will be very nearly implementable in DSpace 1.5, which will integrate Tim Donohue’s Configurable Submission. Cut the submission forms down to one (or one plus licensing, if you must) and dump the submission into the workflow process for a librarian to clean up. The big hurdle I foresee is that DSpace expects certain pieces of metadata to be in place before it will consider a submission complete enough to put into workflow. I don’t know what (if anything) Configurable Submission does about that… but hacking DSpace to accept on-the-cheap submissions seems like a soluble problem.
Another correspondent asked me about searching (for example) ISI Web of Science for author affiliations, pulling down citations, and feeding appropriate content into the repository that way. This is part of what my esteemed colleague Eric Larson is doing with his BibApp project, together with collaborators from the University of Illinois at Urbana-Champaign. With some reservations, to be discussed in a moment, I think the BibApp represents one of the few viable futures for institutional-repository content recruitment from the peer-reviewed literature and from discipline-based collections of gray literature (such as disciplinary repositories).
Not all is rosy, however. The biggest stumbling block is rights. Just now I happen to have BibApp-collected content from IEEE and APS, because they allow archiving of the final PDF. For publishers who do not allow this, which is most of them, the most the BibApp can do is try to semi-automate the process of nudging faculty to provide preprints and postprints. (In fact, the BibApp isn’t there yet, and my sense is that there would have to be a lot of high-level discussions with department and school administrators before it could reasonably go there. As yet, I don’t think the political will exists at MPOW to do this work.)
The rights issue does not stop at publishers; some disciplinary repositories that would otherwise present tempting harvest targets look to be off-limits. I am particularly concerned about SSRN, which is aggressively expanding into the humanities; Peter Suber briefly mentions some of the problems there. I hope that some of the leading lights of open access will take this on as a policy issue. There isn’t much point in moving from greedy publishers that embargo access to greedy repositories that prohibit data replication (which is, let us all remember, a key element of any responsible preservation strategy).
Another stumbling block, as with many repository initiatives, is the sheer amount of work it takes to build the appropriate searches in the appropriate databases, weed through the resulting citation lists, match them up with PDFs, and get it all ready to import; this is especially a problem in the context of backfiles. (Ongoing work, we have learned in our experimentation with the BibApp, is actually fairly manageable, especially when semi-automated via RSS feeds of particular searches.) Since so many library administrators are stuck in the abundantly discredited “build it and they will come” mode of thought around institutional repositories, acquiring staff and resources for such a content-recruitment initiative is a Sisyphean struggle.
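The feed-driven ongoing work is the manageable half, and it boils down to something like this: poll the feed for a saved search and keep only the citations nobody has looked at yet. This is a minimal Python sketch assuming the database exposes the saved search as a plain RSS 2.0 feed; the function name `new_citations` and the choice of `<guid>` (falling back to `<link>`) as the de-duplication key are assumptions for illustration, not the BibApp’s actual code. Fetching the feed and matching citations to PDFs are left out.

```python
"""Hypothetical sketch: pick out new citations from an RSS 2.0 feed of a
saved database search, skipping ones already queued for the repository."""
from xml.etree import ElementTree as ET


def new_citations(feed_xml, seen_ids):
    """Return (id, title, link) for feed items whose id is not in seen_ids.

    The id is the item's <guid> if present, otherwise its <link>. seen_ids
    (a set) is updated in place so the next poll skips these items.
    """
    channel = ET.fromstring(feed_xml).find("channel")
    fresh = []
    if channel is None:  # not an RSS 2.0 document; nothing to report
        return fresh
    for item in channel.findall("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        item_id = (item.findtext("guid") or link).strip()
        if not item_id or item_id in seen_ids:
            continue  # no usable key, or already handled on an earlier poll
        seen_ids.add(item_id)
        fresh.append((item_id, title, link))
    return fresh
```

Keeping `seen_ids` across polls is what turns a weekly pile of search results into a short list of genuinely new items for a human to vet.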
At MPOW, aside from our BibApp pilot initiatives, the demarcation line is stark and simple: If I can accomplish something by myself, or with the occasional and strictly time-limited assistance of our most excellent repository sysadmin (who wears several other sysadminly and developer hats), it can be done. If I need any kind of help, whether it be dedicated staff time, specialized equipment or software, or financial resources, it can’t. Reread this post with that in mind, and you begin to understand the perennial repository content-recruitment problem. There’s only so much I have the skills, time, and political influence to do!
Moreover, I am not alone in this. The situation at MfPOW was essentially the same, and honestly, many repository managers have even less to work with than I do: they only work on the repository part-time, or they are constrained by IT restrictions, or they don’t have even the basic hacking and web-design skills that I have. Again, active rather than passive content recruitment for institutional repositories is a library-policy issue that leading open-access advocates could make inroads on if they chose to, and I wish they would.
The last phenomenon I want to call everyone’s attention to is the hiddenness of the conversations I am recounting. I didn’t know that someone had put together a bucket implementation around DSpace. I didn’t know that other people shared my thinking about streamlined deposit procedures. My correspondent who spoke of ISI didn’t know about the BibApp. And none of the people who wrote to me know about each other!
This is wrong. This is pernicious. This is pluralistic ignorance at its worst. Repository managers need a community of practice and we need it badly, because as sui generis workers, we do not get appropriate support and knowledge-sharing opportunities, either from the institutions in which we work or from our professional organizations.
Why is there no such community already? Partly, we are fractured across software lines; since our first needs as repository managers are typically technical needs, we naturally gravitate toward software-specific mailing lists. I know a fair few fellow DSpace managers, because those are the lists I live on. Les Carr and I email each other occasionally; that’s all the contact I have with the EPrints landscape, and I don’t know anybody running a Fedora repository (now that Leslie’s moved on)!
Partly it’s that not all of us are dedicated wholly to this work. I don’t know what fraction of the repository community represents full-timers like me as opposed to those who have repository duties loaded onto an existing full-time workload, but my sense is that most of us are part-timers, by a long shot. It’s hard to build a community of practice out of part-timers. Many part-timers, in my experience, don’t keep up closely enough to know about the conferences, journal issues, and blogs that already exist!
Partly, it’s that our presence in the normal knowledge-sharing environment of librarianship is fractured and scattershot. We don’t have a journal; we only have “special issues” of umpteen different journals. We don’t have conferences we can reliably attend. DASER is dead, as best I can tell. Open Repositories travels all over the world, which is more than I can afford to do. Although ASIST contained a lot of repository-relevant content this year (did anybody else notice that both the dissertation proposal and paper of the year were open-access–related?), it’s a research conference, not a practitioners’ conference.
And partly it’s that we repository managers (with help from the dismissive and the just plain clueless) have internalized the general failure of the “build it and they will come” ideology. We think it’s our fault our repositories languish empty and we haven’t changed the world. So we aren’t looking to ourselves or to each other to move the discussion forward.
We’re looking to the research literature and to our professional organizations, and both of them are letting us down. The research literature is full of useless quantitative investigations that do not tell us what we as repository managers can do to improve matters. Such qualitative investigations as exist interrogate the experience of faculty, not repository managers—again, one way we might talk to each other (if a rather indirect and roundabout way) has been foreclosed on.
Our professional organizations are not helping us talk to each other either. Instead, they’re getting Big Picture Thinkers to talk at us rather than to us, and heaven forbid said Thinkers should listen for five minutes! Frankly, the Big Picture Thinkers (including, I regret to say, the otherwise excellent Cliff “institutional repositories are essential infrastructure” Lynch) are responsible for “build it and they will come” in the first place. They need to sit down, shut up, and let those of us who have lived out the consequences of the failure of their Big Picture Thinking do the talking. Mind you, our internalized sense of failure keeps us from insisting on that; we still believe the Big Picture Thinkers can haul us out of the morass, so we listen quietly and hope.
I don’t have a good answer to this problem, and not for lack of stabbing in the dark. I helped try to start a journal (though I didn’t help enough, not by a long shot). It didn’t fly (and I own a lot of the blame for that). I tried going the Library 1.0/2.0 route with a listserv and bulletin board. I mercy-slew them a couple of months ago. The chief blog around which a repository-manager community might coalesce, Open Access News, is a link-and-commentary blog that doesn’t allow reader comments. I think NISO/PALINET inviting me to speak at their workshop is a good step, but remember for a moment that I’ve been doing this for two and a half years and this is the first invitation I’ve gotten to speak directly about what I do. Then think about all the insightful, pragmatic, innovative repository managers who aren’t getting invited anywhere because they’re not loud and obnoxious the way I am.
The best we may be able to do at this juncture is to have one of our sadly few conferences try to jumpstart an online community. Open Repositories, are you listening? Because the current sad situation has got to change.