16 May 2006

That’s the stuff

So a couple months after the kerfuffle about how to explain citation advantages for open-access articles, a new study comes out saying “no, no, it really is the open-access advantage.”

This study went flat-out to deal with other explanatory possibilities. They stuck with articles in one journal (PNAS) to cancel out journal-prestige issues. They stuck with newly-published articles, to cancel out author-vanity effects. They multiply-regressed their data until the spreadsheets cried for mercy to account for career length and similar author-prestige measures (though I would like to see more tests on OA and non-OA articles by the same authors, just for fun).

And guess what. Even taking all that into account, there’s still a significant and measurable advantage for open access. Ha bloody ha with knobs on, as Bertie Wooster would say.

There’s one joker in the abstract for repository-rats, though (added emphasis mine): “Articles published as an immediate OA article on the journal site have higher impact than self-archived or otherwise openly accessible OA articles.”

I believe that, actually, especially for newly-published articles. It’s just plain easier to find an article via a publisher’s website than on the open Web. After all, how does your typical researcher hear about a new article, and how does she find it once she’s heard about it? Either she watches the publisher’s website in the course of her daily work, or she sees a citation and hunts it down via her library, which in the case of PNAS will send her right to the publisher’s website.

I also believe, though, that this advantage is likely to thin out over time; articles that have been out longer get found any number of ways, Google not least. And obviously the publisher’s website is less salient for journals that are widely available in article databases (which PNAS isn’t, I don’t think) or for disciplines that have one or two well-known and well-used disciplinary repositories.

Even so, we repository-rats have got to get busy on better open-access linking and discovery mechanisms. We’re cursed, er, blessed with highly heterogeneous repository content, which makes us dubious search destinations; the chance that a given repository has an item of interest to a specific person is really pretty small. Metadata sharing, harvesting and metasearch perforce have to be the answer, but the current state of the art in harvesters marketh not disciplinary boundaries, which makes the search engines frustrating to use for real researchers trying to find work within their disciplines.

And we have to mark peer-reviewed, published-elsewhere articles better. The way we do it now (we mostly don’t) is just plain bloody broken.

I can dimly envision some ways to make discovery better, but most of my thoughts involve data-mining techniques that I am entirely unsuited to discuss, never mind implement—so as usual, the repository-rat sits quietly with folded hands waiting for smarter people to climb on the bandwagon.