‘Open Access’ Archive

14 Octobris 2008

My Father the Anthropologist; or, What I Offer Open Access and Why

In 1980 or thereabouts—I was eight or nine—my father the anthropologist started yet another rant about serials cancellations at his university’s library while he drove the family somewhere in the family car. He thought the problem an artifact of library underfunding, I remember. I don’t recall that he ever did anything about it save rail bitterly on the subject to us, his captive, powerless, and resentful audience.


At the inaugural meeting of the Open eBook Forum in 2000, David Ornstein and Janina Sajka explained what they hoped electronic books would accomplish. Amid the faux-visionary fluff and the crass dollar signs, one hope they expressed made me vibrate: that for the first time, a visually-impaired person would be able to walk into Borders or Barnes & Noble and buy a book off the shelf just like anyone else.

Access to human knowledge and creativity. Access for the wrongly disenfranchised. Access. I loved markup, I loved text, I loved design, I loved standards work—but then and afterward, it was the access argument that kept me engaged with electronic books. My father the anthropologist, his own eyes not what they had been, understood and endorsed that argument at once.


I certainly know how reassuring accurate, authoritative medical information can be. When my father the anthropologist went to the hospital for bypass surgery, I looked for every scrap of reliable information I could find about what he’d have to go through, what his chances were, what would happen afterwards. Information is hope for helpless bystanders.

I know what information gaps mean to the efficacy of medical care, too. I started my quest to treat my repetitive stress injury when my hands and wrists hurt so badly I couldn’t sleep some nights, nor survive a day’s work without severe pain. The open web, obvious misinformation aside, contained little more than nonsensical and insulting condemnations of RSI sufferers as malingerers, as well as blatant advertising of invasive surgery on the websites of orthopedic surgeons.

My primary-care physician insisted on old-fashioned treatment modalities before she would refer me anywhere. I paid for and endured weeks of wrist braces that I knew would not relieve my pain because I had tried them, as well as a tennis-elbow strap that left me in such agony that I refused to put up with it longer than a day. I did achieve a referral at last, and physical therapy turned out to be the right treatment. As I healed, the new search skills I was acquiring in library school, along with the access that being a student entitled me to, helped me discover that the medical literature understood why my doctor’s initial recommendations had been wrong. Why did I waste time, money, and pain over my inability to produce reliable information to assist my medical provider in treating me appropriately?

I can only be glad I wasn’t suffering from anything life-threatening, like artery blockage.


I was slotted into an online course in “Virtual Collection Development,” taught with patient lucidity by Jane Pearlmutter, my first semester in library school. Among the readings was “The Librarians’ Dilemma: Contemplating the Costs of the ‘Big Deal’” by the University of Wisconsin’s own Ken Frazier. There it was again, this problem of serials cancellations, framed in terms so transparently sensible that I could only exult.

Later in the semester came a unit on open access. It would be nice to say that lightning struck and I knew that was what I wanted to do with my professional life, but it didn’t and I didn’t. Of course I was intrigued; I knew several for-profit journal publishers from the worm’s-eye view of an erstwhile lowly data-conversion peasant. I wove the complaints I remembered from my father the anthropologist, my own experience in scholarly publishing, and what I learned in class into a rich, detailed mental tapestry, and I felt real hope that open access was an answer I could take back to him that he would understand and appreciate. Discovering that I would shortly join the profession backing open access only confirmed that library school was the right choice for me, even should I not work in the open-access niche myself.


When I landed my first library position just after graduating, I called my father the anthropologist. His first question was “How much will you be paid?” I declined answering. His second question was “What’s your title?”

“Digital Repository Services Librarian,” I said, with pride and no little amusement.

On the other end of the line, a lengthy silence.


My father the anthropologist used to buy lab equipment out of his own pocket, rather than struggle with byzantine university purchasing procedures and skeptical departmental scrutiny. Rightly or wrongly, he was convinced no one would understand or support him and his work, but he refused to knuckle under. He would do what it took, spend what he had to, to further the research he fervently believed in.

I have bought quite a bit out of my own pocket too, rather than charge it to the libraries that have employed me. I have bought color inkjet printers, various sorts of expensive paper for brochures and bookmarks and whatnot, and poster printing. I have bought software that I use for work-related purposes. Once I bought an expensive print run of a color brochure because an opportunity came up to distribute a lot at once so suddenly that I didn’t have time to print and fold them myself as I usually did. I bought a cross-country trip to an important repository conference when I was de facto between jobs. I bought a laptop on which I do repository-related work when the occasion warrants. I have bought buttons with images of Mars on them, because when you’re handed a golden acronym you might as well make the most of it. Like as not the libraries I have worked in would have paid for some or all of this—I never asked.

I have read, written, rewritten, commented, and debugged code in Java, Python, and XSLT. I have tweaked JSPs, murdered unnecessary HTML tables, and rewritten CSS designs from the ground up, swearing sulfurously at various versions of Internet Explorer. I have edited metadata in XML by hand. I have translated EndNote records into Dublin Core. I have screenscraped ugly HTML and cudgeled it into legible metadata. I have screenscraped yet more ugly HTML for transformation into preservation-worthy markup. I have built convoluted SQL queries slowly and carefully from the inside out, run them on production databases with fear and trepidation, and once or twice cleaned up after them when I’ve gotten them wrong. I have typed cargo-cult incantations at command lines to keep server software running and upgraded, and raked Google for answers when some incantations didn’t work as promised.

I have stared at lengthy CVs with a sigh, and then waded resolutely in to clear rights on as many of the publications as I could. I have searched SHERPA/RoMEO and Bowker’s Books in Print. I have hunted down agreements from publisher websites. I have asked faculty for their copyright-transfer-agreement files, and tried not to let my smile grow too pained when they told me they don’t keep such things. I have explained the difference between preprints, postprints, and publisher PDFs to politely incredulous auditors. I have read scads of legalese, and interpreted it as best I could. I have read and pondered the words of librarians and lawyers who understand the legal fine points much better than I. I have made some risky calls, likely some wrong ones. I haven’t been called on the carpet for them… yet.

I have held one-on-one meetings and demo sessions with faculty and librarians. I have designed and produced brochures, flyers, slideshows, posters, web pages, wiki pages, and one mini-movie. I have presented at innumerable campus expos, showcases, lectures, symposia, conferences, and workshops. I have called and written my elected representatives. I have blogged. I have written articles and self-archived them, sometimes after polite and fruitful discussions with publishers. I have run any number of failed efforts toward building a community of practice among repository managers, each new attempt the triumph of hope over experience. I have cold-called librarians, faculty, department chairs, deans, and administrators. I have been to more meetings than ought to fit in the three years I’ve been doing this.

You needn’t be obsessed like my father the anthropologist and me. Believe me, that’s the last thing I’d recommend to anyone. If you cannot find even one thing you can do in the above list, though, I wonder about you.


I once explained to a pleasant elderly faculty member that the repository didn’t easily allow changes. “It’s like a roach motel,” I said. “Files go in, but they don’t go out. Once they’re there, they’re stuck.” Suppressed chuckles from librarians in nearby cubicles greeted that statement, and I returned from ushering the faculty member out to find that my colleagues had good-humoredly dubbed me the Innkeeper at the Roach Motel.

I loved the sobriquet, despite the unhappy truth of its depiction of institutional repositories. I have never liked telling faculty members that my services couldn’t do what they needed, and I’ve had to tell them that often and often. Worst of all, I couldn’t envision my services as anything my father the anthropologist would find useful, compelling, or even comprehensible; the promise of green open access was fading fast in the unforgiving floodlights of faculty diffidence. I looked around the open-access community for understanding and a path forward, but I found little to help or reassure me.

My father the anthropologist and I are alike in one way at least: we don’t suffer fruitless systems in silence. In one way at least, we are different: I cannot content myself with complaining to the powerless and uninvolved.

I don’t think there’s a community I operate in that my gadfly ways haven’t irked or even alienated. My library school. My librarian colleagues. DSpace developers. Green open access. Library bloggers. The DSpace Foundation. Library coders. Repository managers. The open-access community in general. While I accept all this as the price gadflies pay for being pests, it is no source of pride, nor is it pleasant. I have feared for my job, and like as not I deserve to. I have feared that the career I find myself in will not exist in five years’ time, and I have wondered uneasily whether my own behavior has hastened rather than forestalled that eventuality. I have been cautioned, questioned, belittled, berated, cut down to size in public, stepped cautiously away from, set up as homo stramineus, misquoted, deliberately or carelessly ignored—and much of it I have richly earned.

I have also been heeded. I have also made change. Not much, perhaps; certainly not all the change I wanted to make, wanted to show my father the anthropologist, wanted to offer the world. Even so, change is my gift to them and to you: my gift I offer in my much-abused hands on this Open Access Day.


Rodin, La Cathedrale.
Photo by Wallace Grobetz, via Flickr and the Creative Commons.

13 Octobris 2008

Bell making ready to toll?

The beginning of the end for the institutional repository? You tell me.

I have to prepare an internal study about the usage of the documents available into Archimer, our Institutional repository, to justify the work we do on it.

We all know what I think, right? I even brought that opinion up to date recently.

I confess I’m a little surprised that this request didn’t come from a US repository. I’m also surprised that they seem to be focusing on downloads rather than uploads, so to speak, although I have the uncomfortable feeling that downloads may be all the ground they feel they can defend, and it’s crumbling under them.

I’m not surprised at all to see an IR forced to justify its existence. I’ll be utterly astonished if Archimer is the only one in the next year or three.

Go ahead and click over. The money quote I gave above isn’t the meat of the email. I’m still trying to work through the implications in my own head.

22 Septembris 2008

A, B, and C

Required reading for repository-rats and all who love them: Palmer et al.’s investigation into institutional-repository methods and results. Given how rarely I praise research in this area, not to mention how often I complain bitterly about it, I hope my unalloyed praise for this report carries weight. It’s well-written, it’s well-supported, and it’s right in all the important ways. Like Margaret Henty’s article, which I have also had occasion to praise, it’s useful; I learned from it things I hadn’t known but have no trouble believing, and I’m an old dog as this field goes.

If you’re in the business, you can figure out pretty quickly who at least two of the three studied institutions are. (I’m still a little fuzzy on A, though I have a strong suspicion, but I know beyond a doubt who B and C are.) None of them, in case anyone is wondering, is MPOW, so I’m not feathering my own nest here.

Money quotes:

In general, the basic aims of universities in investing in IRs—to collect, preserve, and provide access to their research output—seem misleadingly simplistic compared to what IRs are actually attempting to accomplish, and what they will need to do to identify and successfully implement functions that are not redundant or risky and of high value to faculty.

This is exceedingly well-phrased, and it gives me to ponder somewhat about how I characterized the tension between repository-rats and other librarians (including but not limited to library administrators) in Roach Motel. Faced with a “basic aim” that is impossible to accomplish, repository-rats naturally nose about for other problems to solve (and the report makes that strategy quite clear, addressing its benefits and drawbacks even-handedly). I think I have traduced my ratly colleagues and myself in Roach Motel by expressing this process purely in terms of nervous rats seeking job security and self-justification, and I’m sorry for that.

The truth is, I want to be useful. We all do, all of us rats, even if not everyone shares my sense of usefulness as a fundamental work drive, the thing that gets me out of bed in the morning. If we can’t be useful in IRs’ “basic aim,” and often we can’t for reasons well outside our control (this being a major theme of Roach Motel), we actively look for other problems, do our best to make ourselves useful in other ways. These problems fall almost exclusively outside IRs’ supposed “basic aim,” which naturally confuses other librarians.

The intellectual property (IP) obstacles involved in populating IRs consumed significant amounts of time and resources and can be a drain on other core development activities.

No argument here. IP is a swamp, and it’s not a swamp that most IR planning processes anticipated. The report’s discussion of how faculty and IR staff build boardwalks through the swamp is trenchant and well worth reading.

Unlike other aspects of repository building, liaison networks with faculty were already a functioning part of library operations and are now serving as essential human infrastructure in IR development.

While the subject orientation of liaisons is being exploited in IR development, there seems to be much less application of their experience in collection development, management, and evaluation—areas of expertise that are highly relevant but need to be revised for the IR collection model.

Liaison librarians are essential to a well-functioning IR, and their essential-ness is most of why the maverick-manager and no-accountability staffing models are often anti-patterns. I didn’t make this clear in Roach Motel, and I now think that was another goof-up on my part. The key, as I hope I did make clear, is library administrators setting clear and realistic goals related to the IR for all their staff: repository-rats, liaisons, cataloguers, and others alike.

I tend to be a little bit more of the traditional librarian, because I don’t know TEI, and I don’t know SHTML. [I suspect that should have been 'XHTML,' and that the error was in transcription rather than originating from the librarian interviewed.] I don’t know XML. But, it’s pushed me to try to understand that a little bit better. … But what I see happening is … and actually over at the library itself, is this beautiful combination of understanding the structure of information, and understanding the code that goes behind it, and how to make it usable to the people who want to access it.

Liaison 15, whoever you are, I salute you as a valued and respected colleague! I will be quoting you to my LIS 644 students, because you are an exemplary librarian. If we ever turn up at the same conference, please introduce yourself; the drinks are on me.

Perhaps most important to the viability of IRs, however, were those [faculty] who found that the IR solved a particular information problem they faced in the everyday practice of scholarship.

I said something quite like this pretty bluntly in Roach Motel. I’m pleased to see it supported, because I could only assert it, not back it up.

Digitization was seen as a productive correlate service.

I said that, too, and I stand by it. The analog-digital divide is not something I made up. The tension comes in, I think, because digital librarianship’s usual careful, meticulous digitization and description methods cannot function here; there’s just too much material. Archivists’ “more product less process” epiphany may well be the way forward.

Depositors and liaisons alike commented on how many faculty members could not differentiate between open access scholarship and scholarship that was available through the library.

Open-access movement, this is to your address, I think. You haven’t made that nearly clear enough, and it’s a problem. What did I say once? Oh yes, this, in the context of e-reserves quarrelling: “We have to draw a thick black line connecting what faculty do and what they have access to, because right now they don’t see it.”

I can’t pull quotes from the faculty members, because everything the report quotes from them and about them is so good and so right and so real. I’ve had all those conversations before, every last one of them.

Policy and criteria-based selection and evaluation are not typical. Instead, developers have been quick to capture collections not encumbered by copyright constraints, offering access to a growing base of local technical reports, grey literature, and theses and dissertations.

This squares with my experience, and is a logical outgrowth of “basic aim” failure combined with the IP swamp. The only thing I can add is that I believe it would take a heavy load off many repository-rats’ minds if realistic selection criteria and priorities could be made explicit, such that in pulling together local tech reports, grey-lit, and ETDs (not to mention datasets), we’re confidently fulfilling our mandate instead of cautiously creeping outside it wondering what will happen to us when we get caught. Another positive outcome would be a realistic reassessment of just how much work it takes to capture peer-reviewed material legally, and resource provision to match.

By the way, any resemblance of the title of this post to an excellent episode of The Prisoner is purely intentional, ’cuz I’m just too much of a geek for that not to tickle my funny-bone (… connected to the…).

19 Septembris 2008

Adopt a publisher

I am not talking like a pirate at you today. In return for this courtesy, I would like a small favor.

There is language rattling around in Congress that would destroy the NIH Public Access Policy. The actual bill introduced by Conyers is probably moribund if not dead. The concern now, as I understand matters, is that the anti-NIH language could be snuck into another bill.

The Open Access Directory is doing its part in a way that will help us all, no matter what it accomplishes in the Congressional wrangle. OAD has a page of publisher policies vis-a-vis the NIH Public Access Policy, and they are asking us all to investigate a publisher (one with “No known policy as of…”) and update the page.

Robin Peek asked me to publicize this effort, which I am most happy to do.

16 Septembris 2008

Personas and boxes

A friend of mine dropped an email to say that I should have been cited in this examination of IR-related Cooperesque personas. Oh, please, who cites blogs in stuffy old librarianship? I’m cool. Call it great (or at least thoughtful) minds thinking alike.

The money quote from that article is this:

It was assumed that the users desired an open-access archive of primarily published research materials generated by the faculty and graduate students, but the users actually desired a network where teaching and learning materials are shared, potential collaborators are identified, and participants’ research is promoted to institutional colleagues.

It was assumed. “Mistakes were made.” Mm-hm. They didn’t need to cite my personas. It wouldn’t have hurt them to cite Roach Motel on the subject of faulty ideology, or faculty not using something that has no value to them, however. They get a pass, though, because Roach Motel is still only out in preprint.

The article is worth reading in its entirety. They did the work I didn’t and couldn’t, pulling together enough user interviews to base their personas on something other than instinct and anecdote, and to their everlasting credit, they didn’t flinch away from conclusions that are not encouraging for IRs as they are designed and run today. The chief problem with the article is that none of their personas is a librarian. It’s impossible to understand the situation of IRs without the librarians who authorize, plan, build, and run them. Omitting them leaves you with “it was assumed.” Assumed by whom, pray, and why? And to put a Harnadian spin on the matter, if we build faculty a whizbang collaboration space that doesn’t actually make any literature open access, is what we’re building really an IR? Will it achieve what we (we librarians, remember us?) wanted to achieve in the first place?

Anyway. Read, ponder, learn.

Also click over to Mark Leggott’s Repository in a Box. Built atop Fedora (a point I will return to shortly), this is a mashup with Drupal that faces head-on the reality that them that has (content) gets (content). Leggott has built a system that gathers citation data, freeing faculty of the need to enter it themselves and giving them incentive to correct and augment it.

I’m dubious about the strength of that incentive, personally, given the English experience with the Research Assessment Exercise. Les Carr can opine more fruitfully than I on that subject. However, any incentive is better than none!

The technical underpinnings of this work read as pretty solid to me. The one link I’m mildly dubious about is going straight to FOXML from RefWorks; on principle, I would want to go through SWORD, but sometimes pragmatism trumps principle—SWORD isn’t completely baked yet. I look forward to the release of this software, because I’m enough of a Drupalista to be able to get along with it, and I’m just starting to learn a bit about Fedora.

I get the sense sometimes that the decision to run an IR on DSpace is, in the United States at least, a variant on “Nobody ever got fired for choosing Microsoft.” Of the three open-source repository packages, it is the most demanding on hardware and (ironically) the hardest to install and get running. (I got Fedora running on my desktop Mac at work in fifteen minutes. Seriously. Try that with DSpace, I double-dog dare you.) Compared with Fedora, DSpace is rigid and all but impossible to stack other technologies on, as Mark Leggott has done with Drupal. Compared with EPrints, DSpace is an unusable mess, particularly on the back end.

(If you sit Chris Gutteridge down with a beer, as I was able to do in Edinburgh, he will happily tell you that he revamped the EPrints deposit system for usability after trying to deposit something in an EPrints repository and being appalled at the number of clicks and keystrokes it took. He did a good job of it, too. On my more evilminded days, I have wild daydreams of forcing the entire DSpace development inner circle to screenscrape back issues of a journal or newsletter and then deposit every single last article through the DSpace web UI, one… by… one. Much would be learned, I believe.)

And if you’re even thinking about building the system that would satisfy the personas in the Maness et al. article—forget about building it over DSpace. Just forget it. Sheer madness. Fedora is the right choice, the only possible choice.

For the last month, I’ve been running an ad-hoc requirements-gathering process on the DSpace mailing lists and IRC channel. I’ve learned a few things from it. One is that getting librarians to speak up in a discussion even faintly technical is like pulling all your teeth at once. I am quite unhappy about this; never mind that it doesn’t speak well at all for my profession, it’s ludicrous to ask a passel of developers to read our minds. No wonder the ILS is in such a sad state (open-source aside). Relying on, or even hoping for, librarian input can be just plain deadly.

The other thing I’ve learned is that the DSpace development process is significantly underresourced given the state of the codebase and the needs of the stakeholders. I don’t have a quick fix for this (and as I must, I have mythical man-months in the back of my mind) or even a useful suggestion. I can only observe that it’s standing in the way of progress. I can gather all the requirements I want—and despite my grousing, I think I have gathered quite a bit of useful input in the last month—but it don’t mean a thing if none of it can get built, and I’m currently hearing a lot of “we can’t build this; we’re volunteers” from the developers.

As always, caveat lector. Caveat emptor as well. If I were Harvard especially, I’d be looking really really hard at Mark Leggott’s mashup, because it goes a long way toward nipping a potentially damaging faculty backlash (against extra work) in the bud. Try that on top of DSpace. Even with SWORD, which at least makes something like that possible, it’s a tall order.

In all honesty—I’m having a much easier time of it learning how Fedora ticks than I ever had learning DSpace. Partly that’s because I’m an old unreconstructed markup geek, so little XML files hold few terrors (and FOXML is actually pretty elegant, as these things go; it’s definitely nicer than METS), but partly it’s the effect of a sanely-designed system.

Anyway. That’s what’s caught my eye the last few days. Read, ponder, learn.

11 Septembris 2008

Contrast

I’m pretty open in my belief that Europe in general and the UK in particular are a goodly distance ahead of the US in taking repositories (and repository-rats) seriously and moving them forward. Two things that came across the transom today confirmed that impression.

The first was the JISC-sponsored Rights and Repositories workshop. I want to go to something like this. I’ve always had to deal with rights issues (beyond the ordinary rote stuff) ad-hoc and mostly unsupported. With ad-hoc problems, that mostly works, but I feel as though I’m tiptoeing through minefields. Just the validation would be nice! Note also that half the morning speakers are real repository-rats dealing with real problems in real repositories, and that the entire afternoon was repository-rats talking amongst themselves rather than Talking Heads (or worse, Big Thinkers) talking at them.

The second thing was the announcement of the SPARC IR meeting program. I will be going to this, because I can’t very well not, but I must confess I haven’t been entirely enthusiastic about it… and I’m still not. Except for “oh, hey, they got a speaker from DRIVER!” DRIVER, of course, is a European initiative.

They don’t have talk topics up yet, so maybe I’ll be blown out of the water by their brilliance and relevance. I hope so. As I look at the Program Committee, though, I see one name I recognize as belonging to a repository-rat. One. Even if the real number is double or triple that—that’s insufficient, and it’s disturbing, and it’s frankly insulting. Come on, SPARC, I expect better than this from you. It’s past time you figured out that some of us on the ground might actually know our needs better than your Big Thinkers.

What really makes me roll my eyes, though, is the “marketing practicum for repository advocates.” With, forsooth, not a single repository-rat on it. That’s silly, because how can you discourse learnedly about marketing something you don’t know anything about, much less have ever marketed? I fully expect this session to be a mishmash of thunderingly useless generalities and condescending head-patting.

The real problem, though, is that if this benighted country still hasn’t managed to figure out that marketing repositories is completely useless in the absence of a value proposition, well… we’re behind, that’s all there is to it. Behind the times and wallowing in it.

I wish JISC could cross the pond wielding a great big clue-by-four, I really do. I pledge that I will do my level best to be a nice, pleasant, politic repository-rat when I go to SPARC-IR, and I’m saying so publicly as an added incentive to keep my word. But I have a terrible, terrible feeling I’ll find it a strain.

10 Septembris 2008

What do we want from IRs, and what are we doing to repository rats?

Earlier this year I predicted that we would see an institutional repository shut down this year, or change so much as to be unrecognizable. It hasn’t happened, to the best of my knowledge, and on the whole, I think it won’t; not this year, at any rate. Harvard has a lot to do with that, of course, but that’s not the whole story either.

One thing I’ve seen—anecdotally, but enough anecdotes as to at least suggest data—is small, non-research-focused institutions talking seriously about starting IRs. When I inquire about content recruitment, I find that the people in charge of planning the IR have drunk the self-archiving Kool-Aid. They want their faculty’s peer-reviewed literature and they’re quite convinced, despite all the evidence (not to mention my blunt warnings; one such institution had Roach Motel shoved under their noses, and is apparently still going ahead), that faculty are going to flock to this thing to give it to them.

I don’t know what to do about this that I haven’t already done. I can only bury my face in my hands and hope that somebody these poor souls will listen to (since clearly that’s not me!) buys them a clue.

These anecdotes point to the real set of questions every single institution with an IR needs to be asking itself: What content do I want from this initiative, and what am I willing to do to get it? Spoiler for this post: if the full answer to the second question is “I’m willing to run and market an IR!” please don’t start one, because that is not enough to get whatever it is that you want, and you will waste precious library resources, your people not least.

We must refocus our planning away from IRs per se and toward specific content types and the resources we’re willing to throw at acquiring, presenting, and preserving them. (Bias alert: we’re in the process of doing exactly this at MPOW, and it’s been excellently healthy for us thus far.) Doing so puts a cold hard stop to many of the problems plaguing existing IR programs: the grab-all-you-can problem caused by nervous repository-rats struggling for repository growth however they can make it happen, the unfocused-effort and unfocused-marketing problems, the “what is this thing anyway?” problem, the lonely abandoned repo-rat problem, the problem of “The IR is the solution! Now find me a problem! Uh, not that problem; I can’t actually do anything about that problem…”

Take the open-accessing of peer-reviewed literature. (Please!) Let’s say that’s our goal. It often is, although that goal has been too often hidden behind the non-goal “let’s open an IR!” The idea that making a space to put this literature in is sufficient to ensure its acquisition is dadaist absurdity. We can open an empty library building, and we can market its existence all over creation, but the mere act of doing so won’t fill the shelves! (Or worse, it will fill them with the sort of junk that people feel like donating. Come on, librarians, we all know that most donated stuff is junk!) So with IRs. “Empty attics,” anyone?

(Now, that’s actually an interesting idea. Has the “content donation” paradigm inadvertently coupled the IR with Goodwill in faculty minds and hearts—a place for stuff you don’t actually care enough about to look after yourself? Ouch, that hurts. But it seems… not entirely implausible. Ouch again. Or is it ouch? Maybe it’s opportunity? I don’t know.)

Once we focus on the stuff we want instead of the place we’re going to put it, we open up the questions we should have been asking all along. How does this stuff get produced, and how could we help produce it in a way that keeps it available to us? What happens to it when it’s done? What incentives can we offer to have it given to us, and are those sufficient to counter any opposing incentives combined with natural inertia and the actual difficulty of the task? Failing that, how do we find out about the existence of the stuff we want, and how can we then get our hands on it in the form in which we need it?

And then, at last, we can ask ourselves the elephant-in-the-room question: given the effort we’ll have to put into getting what we’ve decided we want, do we still want to go after it? “No, it’s not worth it,” is a perfectly acceptable answer to that question, to my mind; every library has to work within resource constraints. What I don’t want to see any more of, and I still see it everywhere from the small institutions I mentioned above to immense Research I institutions and major consortia, is ignorance of and unwillingness to engage with the elephant in the room.

I hesitate to bring this up because it cuts closer to home than I usually like to go in CavLec, but it’s important enough and I’ve gone close enough to it already that I’ll take the risk: ignoring the elephant in the room damages repository-rats most. It puts us in the impossible situation of running an unsuccessful program whose nominal goals are unclear or unreachable, with poorly-targeted resources (if any) and limited freedom of action. As time passes, the problem only snowballs, because the program’s struggles reflect poorly on the rat, making it that much harder for her to argue successfully for change or for help.

I won’t discuss actual career damage, though I suspect it may be or become a serious problem, especially for maverick managers: think a bit, and then tell me the natural career trajectory of the Innkeeper at the Roach Motel. What’s objectively becoming evident, at least, is that the self-archiving movement has been murder on repository-rats. I pointed out in Roach Motel how hard it is to find and keep them. This is one of the few situations in which I think a research survey might do some good: who are the rats, where did they come from, and where have they gone?

I’ll tell a story that mostly isn’t about me, just by way of context. I was at a repository-related meeting once where a librarian from another institution (which was on the “no-accountability” IR staffing model, for those who have read Roach Motel) suddenly began to shed unwilling tears, because as much as she believed in and worked for the repository, she just wasn’t making any headway, and nobody at her institution seemed to understand why. She was frustrated and scared, because where, after all, was the low-uptake buck going to stop if not at her desk, since she was repository booster-in-chief? She needed attention, help, understanding, and support that she obviously wasn’t getting, and in that moment, it all became too much and she broke down.

I couldn’t do much for her. I think she wanted me to tell her that if her institution went to the maverick manager model (which is the staffing model both of the places I’ve worked settled on), everything would be hunkydory. I have never been able to say that and mean it—goodness knows my own inadequate performance as a maverick manager can’t justify it—and I’m no good at that kind of fib. All I had was cold, cold comfort: this is slow and frustrating, and all you can do is your best.

I’d like to say that was the only time I’ve personally witnessed such a breakdown, but it’s not… and if I were to pile up all the frustration I’ve seen that didn’t result in tears, it would be a veritable mountain. This phenomenon feeds my strident calls for a more cohesive and effective community of practice for repository managers—if our libraries and our movement aren’t going to hear and support us, we need to find that support somewhere, or we’ll burn out. I suspect without proof that many of us have already (and I am avoiding the obvious cliché with all my strength here; the ship may be listing, but it ain’t sunk yet).

If there’s a reason that I’ve become a one-woman institutional-repository harbinger of doom over the last year or two, that is it. It’s not even the situation IRs find themselves in, as difficult as that is; existing practice demonstrates that there are ways to solve IR problems, make IRs a valued institutional property, if the will exists. The real reason is that while the problems with IRs are finally being acknowledged and openly discussed (and that is a great link that you should all click over and read), what’s still missing from the discourse is the impact that IRs’ difficult situation has had on talented, dedicated people, such as the librarian who broke down at that meeting.

Follow the labor, I said once. I still believe it. Until the open-access movement turns honest about the labor required to accomplish its real goals—notably, we’re fairly honest about gold OA and mealymouthed still about green—and acknowledges the damage it has done to the labor it’s already mustered, it can’t make a better start.

9 Septembris 2008

Feeding Mr. Blue

I saw Mr. Blue the Invisible Heron catch and eat a fish yesterday morning. Seems a good omen! (Although this morning he was hunched up with his neck all pulled in because it is chilly out.) So let’s ask ourselves how our repositories can find and consume more fish—er, peer-reviewed literature.

If peer-reviewed literature isn’t the point of your repository, great! Chances are you’ll have a much easier time meeting your goals for it (although all bets are off if what you’re after is learning objects; faculty are even weirder about those in my experience than they are about their research). Right now, however, I’m not talking to you. Go do your thing, and good luck to you.

I say “voluntary unmediated self-archiving is not a viable model for institutional-repository population” and for some reason people hear “IRs are not viable.” No, folks. No. Look at those adjectives again. They aren’t immutable. However, an IR whose existence is predicated on voluntary unmediated self-archiving of the peer-reviewed literature is indeed not viable. It will fail. (It may succeed at other goals, of course, in which case, see the paragraph above this one.) If you don’t believe me, believe Cat McDowell.

So let’s attack those adjectives. The classical way to attack “voluntary” is to come up with a mandate. If you are a repository manager, forget about it. You can’t do this. If you are a library dean, university librarian, whatever your institution calls it, well, I’m sorry, but you probably can’t do it either. If you find a Stuart Shieber among your faculty, by all means put all your clout and your persuasive ability behind him, because he might be able to pull it off. Without him, you’re stuck; even if you have faculty status and a Ph.D., faculty do not think of you as one of them. (There are exceptions. If you happen to be one of them, go for it. Still, you’d better institute a mandate inside your own library first. If you can’t even do that, what are you doing chasing faculty?)

The one mandate that can be successfully imposed over time is an ETD mandate. Frankly, I think that’s the place to start for most self-respecting IRs at institutions with master’s and doctoral programs. Don’t pull an Iowa, and it’s probably wise not to start it off as a mandate, but otherwise, go to town. Part of the reason to do this is that it brings the repository (and its rat) to faculty attention in a context that won’t make most faculty uncomfortable or suspicious. (A few faculty are suspicious of anything digital still. You can’t do anything with these people except route around them as best you can.) In the long run, that pianissimo approach is likely to pay dividends.

Otherwise, forget about mandates, O University Librarian. Try bribery. Money you got, right? It doesn’t take much, I suspect, to create a bribe program faculty will pay attention to (though, again, I wish Minho had been more forthcoming about that in their article). Bribery probably takes less money than hiring a programmer to bash DSpace into some semblance of usability. So try it!

Attacking “unmediated” takes staff and resources; there is no way around this. That is an unpalatable reality, which is no doubt why libraries have clung so long to the wistful dream of shiny happy faculty self-archiving their shiny happy peer-reviewed articles in the shiny happy self-running repository. However, it may not take as much staff and resources as you think to make respectable inroads, if you have the right staff and if you’re willing to be their faceman and otherwise turn them loose.

Faculty stick their stuff on their own websites. They stick it in disciplinary repositories. They stick it all over the place. Sometimes they publish it in a journal permitting archiving of the final typeset PDF (and again, thank you so much, SHERPA/RoMEO, for this list). It’s out there, enough stuff to make a repository look great. The barriers to going out and getting it for the repository are permission from faculty and automation.

Now, it’s nominally reasonable to turn a repository-rat loose to get faculty permission to canvass their websites for archivable material. The problem is that it doesn’t usually work, partly because it doesn’t scale (one repository-rat, legions of faculty), but largely because faculty don’t know the repository-rat from a hole in the wall and so will not respond well to her request. There are three ways past this:

  1. have the university librarian approach department chairs and deans for blanket permission,
  2. have liaison librarians approach the departments they work with (which, again, will require action from management; liaison librarians won’t just spontaneously do this, and they may not even do it as a favor to their repository-rat colleague without a nudge from above), or
  3. most radically—don’t bother with permission, just give your repository-rat leave to go out and do it.

Faculty put this stuff on the public web. If they don’t mind Google and the Internet Archive picking it up, why are they going to mind you? Sure, sure, you want a backstop policy of pulling down something faculty have problems with until problems are resolved, and you may want a notification system as well, but how hard is that? I’ll tell you, it’s much less hard than getting permission! (And yes, I’ve tried getting permission. Even granting that I am far from the most persuasive person in the universe—throwing the most persuasive repository-rat in the universe at this problem still doesn’t scale, and still won’t produce impressive results.)

Combining mediated deposit with a bribe program strikes me as promising. “We’ll toss your department a small grant if you let us turn our repository-rat loose on your material available on the public Internet” sounds like a winner—why would a department say no? They get money (or a money-equivalent, maybe a small bonus to their department’s collection-development funds?) for something they’re investing no effort in. Can’t argue with that!

Automation—this is where you need to pick the right staff, because the IR software industry has not yet responded to the need to mediate article-gathering. At MPOW we have a project going in the right direction, but even lacking that, there is much that someone with modest technical skills can do. Crawling a faculty member’s web CV, screenscraping the HTML for metadata, creating packages for batch import, and then correcting the metadata is moderately tedious work, but I can do it, fairly expeditiously at that. It doesn’t take a genius, just a patient repository-rat, a handy webcrawler (I use SiteOrbiter, with an occasional assist from wget), and the scripting language of your rat’s choice.
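For the curious, here is roughly what the scrape-and-package step can look like. This is a minimal sketch, not my actual scripts: it assumes the CV page lists each publication in its own `<li>` with a direct link to a PDF, which plenty of real CVs don't. The names `CVLinkParser`, `extract_candidates`, and `dublin_core_xml` are made up for illustration, and the XML shape is my recollection of the metadata file in DSpace's Simple Archive Format; check it (and the choice of `source`/`uri` for the link) against your own repository's import documentation before relying on it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class CVLinkParser(HTMLParser):
    """Collect (citation text, PDF URL) pairs from <li> entries in a CV page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.records = []
        self._in_li = False
        self._text = []
        self._pdf = None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # Start a fresh candidate record for each list item.
            self._in_li, self._text, self._pdf = True, [], None
        elif tag == "a" and self._in_li:
            href = dict(attrs).get("href") or ""
            if href.lower().endswith(".pdf"):
                # Resolve relative links against the CV's own URL.
                self._pdf = urljoin(self.base_url, href)

    def handle_data(self, data):
        if self._in_li:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._in_li:
            # Collapse whitespace; the raw citation still needs human cleanup.
            citation = " ".join("".join(self._text).split())
            if self._pdf:
                self.records.append({"citation": citation, "pdf_url": self._pdf})
            self._in_li = False


def extract_candidates(html, base_url):
    """Return a list of dicts, one per list item that links to a PDF."""
    parser = CVLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.records


def dublin_core_xml(record):
    """Build a bare-bones metadata stub (assumed layout) for one record."""
    root = ET.Element("dublin_core")
    title = ET.SubElement(root, "dcvalue", element="title", qualifier="none")
    title.text = record["citation"]  # placeholder: the whole citation, to be corrected by hand
    source = ET.SubElement(root, "dcvalue", element="source", qualifier="uri")
    source.text = record["pdf_url"]
    return ET.tostring(root, encoding="unicode")
```

Note what the sketch leaves out: fetching the page and the PDFs, splitting the citation into real title, author, and date fields, and copyright clearance. The metadata correction in particular stays manual, which is the tedious part I mentioned.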

Your rat also needs to handle copyright clearance, but that isn’t rocket science, just tedious rote work; the chief requirement is common sense. If you have support staff to throw at this problem, so much the better. Also, it is probably worthwhile to set up an on-the-cheap digitization service: papers only, and only papers destined for the IR. The reasoning here is that I have yet to meet a faculty member—even one with the ink on their diploma still damp—with all their papers in digital form. Many have pre- or postprints in their file cabinets that they’d happily hand over for digitization, but they won’t lift a finger to push a scanner button themselves. Offer to complete their digital record, though, and many will likely bite.

Thinking more broadly may inspire an institution or its library to pull back from the institutional repository as the centerpiece of scholarly-communication or digital-preservation effort. That’s fine; in fact, it’s probably healthy. The data-curation people have a saying: “process, not product.” In our context—the earlier in the research and publication process we can intervene, the more likely we are to be able to get our sticky little fingers on the product we want. IRs are lousy at this; they just weren’t designed for it. What you’ll do about that will vary depending on your environment. Maybe getting in on campus large-scale storage is your answer. Maybe it’s a copyright-consulting service. Maybe it’s offering OJS and/or OCS and/or Omeka alongside or even instead of your IR. Maybe it’s helping faculty with publication tracking. Whatever. Once again, it’s a matter of needs awareness, political will, and largely undemanding technical acumen. (Large-scale storage and its associated networking is a horrible technical problem. OJS, OCS, and Omeka aren’t.)

Mr. Blue spends a lot of his time staring into the water waiting for a fish to come by. It’s a crying shame to let your repository-rat do the same. It’s also unnecessary; there’s plenty of fish they can go after, as long as they’re not shackled hand and foot to the IR and to plentifully-discredited ideas about self-archiving.

5 Septembris 2008

Mr. Blue the Invisible Heron

Mr. Blue was back being his invisible-except-to-me self on one of the dock platforms on the bay this morning. A bicyclist rode by while I was watching Mr. Blue. I am clearly not invisible, judging from the bicyclist’s bemused glance in my direction. Mr. Blue just as clearly is. Mr. Blue the Invisible Heron. The thing is, though, it’s not that he’s invisible; it’s that nobody’s looking.

Peter Suber has made a thoughtful and thorough response to my critique of the “two-thirds” claim. I feel justly stupid about mixing up publishers and journals, and I appreciate the full expansion of the claim, as it clarifies much. (Librarians are fond of pilpul. In this I am wholly a librarian.)

However. I don’t think we should be highlighting this claim (much less the derived claim “Green OA is compatible with TA journals”) regardless of its truth value, and my Achaea U parable was intended to demonstrate why not: publisher permission to self-archive is not filling repositories or otherwise furthering green open access, and that being the case, we have no reason to celebrate it, or to continue pursuing such permissions from other publishers. If we go on doing that, we’ve passed by Mr. Blue once again.

The front side of the sandwich-board hanging from Mr. Blue’s elegantly curved neck reads “The only real win is literature hitting the Web.” That’s what we persistently fail to see, what we walk and cycle and drive recklessly past. We keep chest-beating about supposed victories that don’t wind up creating real wins, and then we don’t think about why not and adjust our strategies accordingly. I’m sorry, I must believe that this undermines our credibility: with the faculty we are trying to reach, with the libraries who have valiantly and steadfastly supported us, and with the publishers on the other side of the chessboard.

Imagine for a moment Elseviley Verlag strategist Hector Palamedes. (I admit to picking that name with malice aforethought. I’m sorry. I’m human. It could be worse; I considered and rejected “Antinous” the foul-tempered wastrel.) “Sure,” Hector says expansively, “let the repository-rats have their preprint and postprint permissions, why not? It’s hard enough to pry a usable manuscript out of faculty when they want it published. They think some poor schmuck librarian can do it? Hah. We haven’t conceded one damned thing—nothing whatever of substance will become open access because of this—but we smell like roses anyway. Win!”

It’s worth asking how Hector Palamedes knew he could safely make that concession. Here’s what I think he knew. He knew that faculty have abysmal management practices for their own data and writing, and he knew that these practices are even worse when computers enter the mix (because practices are reasonably well-solidified in the analog world but still wildly variant in the digital). It’s a short hop from there to the realization that most preprints and postprints are lost forever. Hector also knew that many faculty are creatures of habit, routine, and disciplinary culture. They’ll sign over their rights to a publisher without a second thought because it’s what they’ve always done, but they’ll scrutinize an institutional repository from every conceivable angle—from an initial instinct of distrust, to boot—and most will refuse to bite.

Hector may not have known this at the time, but both he and Ulysses know it now: the “two-thirds” claim creates an unrealistic expectation in faculty (as well as many librarians) about how much faculty work is self-archivable in the form in which most faculty expect to archive it (the typeset PDF, of course). Dr. Troia walked into Ulysses’s office fully expecting that most of her CV would go into the Achaea U repository easily and without fuss. When that didn’t turn out to be the case, she walked out of Ulysses’s office thinking that Ulysses misled her. Sure, sure, it wasn’t entirely Ulysses, it was also the green-OA movement trumpeting the “two-thirds” line, and nobody meant any harm by it—but all she sees is Ulysses and his perfidy. Poor Ulysses, his own movement has hung him out to dry.

We need not feel shame that we didn’t realize all this from the start; nobody catches sight of every Mr. Blue out there. And sure, we could keep claiming the concession of self-archiving permission as a win. We could, if we want Hector Palamedes and his cronies to keep laughing up their sleeves at us because we keep cycling obliviously past Mr. Blue. We could keep claiming that, if we want research libraries to keep futilely wasting resources on IRs that cannot gather the peer-reviewed literature given the cultural and operational practices in which they are embedded. We could keep claiming that, no doubt, if we want scholarly-communications librarians and repository-rats to keep losing credibility with faculty and their fellow librarians by peddling a message most faculty don’t want to hear and a service that’s no practical use to them.

I do not, cannot believe this is what we want. It’s what I believe we’ve got, where our commonplace-book has landed us. I do not, cannot believe we should be happy with it. I could almost wish that SHERPA/RoMEO were to go solid snow-white tomorrow. De facto, outside the usual-suspect disciplines, it might as well be! Making it so de jure would start some usefully pragmatic discussions.

For example… The back side of Mr. Blue’s sandwich board reads “We have misallocated resources.” This may sound like more gloom-and-doom, but I actually believe it’s quite hopeful. After all, resources can be redeployed. The same bully pulpit we’re using to proclaim “two-thirds” can encourage that redeployment, and I strongly believe it should. We’ll have to let go of some of our cherished beliefs and messages first, though, “two-thirds” and “everybody wants OA!” assuredly among them.

What resources have we? Well, we have a troop of trained repository-rats and scholarly-communication librarians, underlying which is a pot of money (because repo-rats, unlike repository software, don’t come free). We have whatever technical and support-staff resources have been placed at those folks’ disposal, which perhaps isn’t as much as we’d like, but it’s definitely greater than zero; in a few places, it’s quite substantial. We have quite a few open-access electronic thesis and dissertation programs, and the infrastructure underlying them. We have pots of money inside libraries supporting Public Library of Science, Hindawi, et cetera. In a few libraries, we have small pots of money to defray author fees. In at least one library, we have money for green-open-access bribes (an idea I still love with an unholy fervor). We have intellectual awareness and agreement on the part of quite a few research faculty that they ought to control more of the intellectual-property rights in their written output. We have a lot of experiments to learn from, including some interesting failures. We have several e-science and open-data movements to which we might be able to hitch our wagon.

So what could we do with all that, keeping Mr. Blue in our mind’s eye? Put another way, how do we deploy the resources we have such that the maximum possible percentage of university research becomes open-access, given the academic culture we live in and the constraints and expectations it imposes?

I have some ideas on that; many of them are implicit in the way I’ve characterized faculty and libraries in this post. I’m sure you have ideas too. And as long as we stay focused on that question, we’ll be fine. I see signs of some smart, grounded, openminded explorations starting to happen domestically (many are already happening abroad), and I’m all for that; that’s where hope lies. If we don’t stay thus focused—here is my stern warning—we will start losing resources (if we haven’t already!) because it makes no sense to throw good money and effort after bad.

Mr. Blue the Invisible Heron is standing right there. Please let’s not pass him by again.

3 Septembris 2008

Two-thirds full?

Peter Suber’s SPARC Open Access Newsletter contains a typically well-presented examination of journal prestige and how it plays out in toll-access/open-access faculty mindshare.

This kind of investigative ethnography is exactly what the movement needs, honestly. We started out with boisterous, bombastic proclamations of faith, and then we looked stupid because those proclamations were just wrong… and if we’re climbing out of that now and settling into some realistic appraisal of academic culture, more power to us.

So in that spirit, I’m going to dissect an often-rehearsed green-OA refrain and how I’ve seen it play out in practice. “About two-thirds of TA journals already give blanket permission for [self-archiving] and many of the others will give permission on request,” Suber says.

This just isn’t true, not unadorned, and I wish we’d stop waving it around. For it to be true, SHERPA/RoMEO (from whose database of publisher policies this datum is derived) would have to cover the entire toll-access journal universe. It doesn’t. It doesn’t even come close. Sure, it covers the behemoth toll-access publishers, but there are two problems with extrapolating from a set of data weighted heavily toward them: first, disciplinary coverage on SHERPA is extra-spotty in areas the behemoths can’t profit from (notably the humanities); and second, I have to date seen zero evidence presented that the behemoths’ policies are typical of non-behemoth publishers. I don’t think they are, myself, though I’m willing to be wrong.

So let’s see how this plays out with Dr. Troia and the repository librarian Ulysses Acqua. Dr. Troia comes to Ulysses with her CV and says “Let’s put all this in the Achaea U repository!” Ulysses is of course thrilled, but because he is a responsible librarian, he explains carefully that someone has to check whether her publishers permit it. Chances are, he says honestly, that most do and a few won’t.

Stepping out of frame for a moment—I’ve had faculty disengage right then and there, when they realize that they probably won’t be able to get their complete publication record in. This is what they typically want—to use the repository as a publication-list proxy—and we often can’t give it to them. Stubborn resistance on the part of certain repository-software packages to creating metadata records that do not include full-text files does not help this problem one little bit.

Let us say that Dr. Troia, though understandably disappointed, gives Ulysses the go-ahead to do the checking for her. She naturally assumes that this is a service Ulysses offers; it doesn’t even occur to her that she ought to do this herself. Ulysses asks diffidently if she will let him see the publication agreements she has signed. “I needed to keep those?” she asks.

This being the answer Ulysses expected (he is not so naïve as he was when he started his job), he smiles, reassures her he can manage without them, and makes a list of the journals she has published in. Let us suppose, somewhat unrealistically, that of those journals present in SHERPA/RoMEO, the distribution of permissive and forbidding publishers exactly mirrors the distribution of SHERPA/RoMEO as a whole: that is, two-thirds allow self-archiving in some form, while one-third forbid it.

Dr. Troia has published in some mildly obscure and boutique basketology journals, especially early in her career. Chasing down as much information as he can about them costs Ulysses about half a day’s work. A couple have shut down entirely; being a pragmatic soul, Ulysses figures they’re not terribly likely to sue him, and puts those articles on the “yes” pile. One more has transferred ownership twice since Dr. Troia published there. Ulysses groans, researches the old owner and the new (neither of whom is in SHERPA), and decides that on balance, this journal should probably be a “no.” On one journal, Ulysses can find no information at all. His policy for such cases (discussed with his administration) is not to archive, though he is aware that some maverick repository managers do otherwise.

Let us not even discuss conference proceedings, because that is a painful topic…

When he is done, Ulysses has about five-eighths of Dr. Troia’s published articles in his “yes” pile. This isn’t a bad or atypical result; depending on discipline, it can be anywhere from less than half to around 85%. He might be able to up the percentage a little bit if he were to negotiate with individual publishers, but he’s tried that before, and the few positive results don’t justify the time expenditure. He suspects the publishers would be more amenable if Dr. Troia were to contact them rather than a mere librarian, but he knows better than to ask it of her.

Now comes the fun part. Ulysses returns to Dr. Troia with a list of the articles that he needs her to provide preprints for, and another list of articles that he needs postprints for. At first Dr. Troia is confused; she can just download the publisher’s version online, won’t that do for him? Ulysses, his heart sinking, shows her a SHERPA page for one of the journals on the preprint-only list. He’s fairly sure he knows what’s coming, and sure enough, she says, “I couldn’t find half these articles in draft if I tried.” Because she’s a nice person and likes Ulysses, she doesn’t add, “and whyever would I bother trying? What a waste of time.”

This leaves Ulysses with just the dead-journal articles and the articles archivable in publisher-PDF form (and I link to this list because it is wonderful and I’m so grateful to SHERPA for compiling it!). Let’s say one-quarter of Dr. Troia’s CV, and that’s fairly generous.
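To make the shrinkage concrete, here is the arithmetic as a tiny sketch. The counts are hypothetical, chosen only so the fractions match the parable (five-eighths cleared, one-quarter actually depositable); Dr. Troia's CV length is not given anywhere above, and `deposit_funnel` is an illustrative name.

```python
from fractions import Fraction


def deposit_funnel(total, cleared, depositable):
    """Return the share of a CV surviving each stage, as exact fractions."""
    if not 0 <= depositable <= cleared <= total or total == 0:
        raise ValueError("expected 0 <= depositable <= cleared <= total, total > 0")
    return {
        # Articles whose publishers permit some form of self-archiving.
        "cleared": Fraction(cleared, total),
        # Articles for which an archivable file can actually be found.
        "depositable": Fraction(depositable, total),
        # Permitted on paper, but no usable preprint or postprint exists.
        "lost_after_clearance": Fraction(cleared - depositable, total),
    }


# Hypothetical CV of 40 articles: 25 clear permissions, 10 have a usable file.
funnel = deposit_funnel(total=40, cleared=25, depositable=10)
```

With those made-up numbers, three-eighths of the CV is permitted in principle and lost in practice, which is the gap the "two-thirds" line hides.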

Ulysses isn’t thrilled with that result. More importantly, neither is Dr. Troia. You tell me how likely it is that she’ll be darkening Ulysses’s door after that.

The glass is not two-thirds full, folks. For the faculty I’ve dealt with, it’s often more than half-empty.