(My live notes from the John Wilbanks keynote.)
John Wilbanks
i.a.n.a.r.e. “I am not a repository expert.” “I almost said that I am not a repository rat.”
gets his information from Mellon and JISC reports, and links seen on blogs “especially Caveat Lector.”
“why is there a disconnect between planning to share and actual sharing?”
Disruptive processes can’t be planned in advance; planned innovation is slow.
Digital publishing is just “a bigger earhorn” because we’re still thinking that the way to communicate is through writing papers. We’ve made that better, faster, and cheaper, but the process is basically the same; boil it down to 8.5×11 pages.
Process change comes more slowly than product change.
So why is it hard to get this content? Why don’t faculty see the light?
- stable systems are resistant to change on multiple levels. No one thing will make people wake up; there are interlocking barriers to change. One such barrier is copyright, which locks up the container of the facts, not the facts… but it really locks up that container!
So we’ve moved to leasing materials, not owning them; and licensing makes it harder to unlock those facts from their containers. No indexing allowed; no adding hyperlinks.
Rights clearance is a pain! It is a block preventing process disruption. We haven’t provided enough incentive, enough “universal solvents,” to remove these blocks. Now we’re even seeing copyright applied to databases, often through confusion rather than malice (e.g. ChemSpider). What do the ideas behind CC mean, as they propagate into the scholarly realm? The rights problems are going to get MORE, not LESS complex; forcing IRs to focus exclusively on the peer-reviewed research ignores the library’s role as repository for lots of stuff. This complexifies rights issues.
- Faculty prefer carrots to sticks. So what do incentives need to do for them?
The minimum incentive needs to get faculty to go through the metadata/upload process, or to let somebody else do it.
Cartoon: “Behind one door is tenure; behind the other is flipping burgers.” If it doesn’t help them get tenure or another grant, it competes with activities that WILL help them toward tenure and grants. This is another way the system resists change! We wouldn’t NEED mandates if this weren’t so.
Let’s assume we fix the copyright and rights-clearance and incentives problems. This will create a flood of work!
Not easy to install IR software! Very powerful systems, but they need PEOPLE to run them. Can’t just go to register.com. Too frequently these people are not part of the conversation. Can be hard! (CavLec: miniature disasters post. An hour to change one link!) This, too, is another change-resister.
This is the complexity of the system we’re up against. Multiple levels of barriers, with multiple fail-safes.
reports from the front lines: building a commons is really, really hard! Takes passionate people with a clear point of view who are not willing to compromise on that view.
Everything at Science Commons is based on “running code”: legal and software code. >1000 journals now under CC license, which is pretty good! Scholar’s addendum engine, done in a single line of HTML code that can be dropped into a page. But they can’t keep data on who uses it (privacy), so it’s hard to follow up to assess.
Goal: using this to negotiate with publishers. But unless there’s a funder or institution behind this (NIH or Harvard), faculty won’t use it, because the power is on the side of the publisher (remember tenure!). “When institutions copy Harvard, we hope this will help.” (But people aren’t copying Harvard! -me)
Databases: copyright status isn’t clear, how/when to integrate isn’t clear, storage is a technical challenge. Ties lawyers in KNOTS! Default stances different in US and UK (no protection in US, some in UK). This makes Science Commons’s charge harder — but the only way to combine datasets is to eliminate the rights barriers. Solutions: CC0, and a set of “Science Commons norms” (citation, plagiarism control, etc). Dangerous to use law for that; better to use norms (yes! -me).
However, this conflicts with the protection instinct faculty have, and corporate funders have even more strongly. Still, the protection instinct is frequently (? I would say “sometimes”) an instinct to protect freedom.
OA solves the legal problem, but the other problem is the “container problem” — the paper as a container for facts, the standalone database as a container for facts, are bad ways to go. Solution: Semantic Web. How do we make Google work better for science? Google finds things based on inbound links, but Google doesn’t search databases and doesn’t notice “links” to them. Goal: e pluribus unum (discovery tools that work usefully across different datastores). Get money to bribe database owners to do the right thing to make this possible.
Can put queries in URLs and then remix them by changing URLs. “Corpus of queries as links” and let people hack them. Not planning to share, but actually sharing, and throwing the result open, and it creates a commons! <$500K to make this possible! Have to do horrible screenscrapes and stuff just to create proof-of-concept to show people what’s possible when you open up.
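(A sketch of what “queries as links” could look like, added after the fact. The talk did not show code; the endpoint, the `query` parameter name, the SPARQL-style query text, and the `ex:` terms below are all hypothetical, chosen only to illustrate the idea that a query stored in a URL can be remixed by editing the URL.)

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Hypothetical endpoint, for illustration only; not a real Science Commons URL.
ENDPOINT = "https://example.org/sparql"


def query_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode a query into a URL that can be shared as an ordinary link."""
    return endpoint + "?" + urlencode({"query": query})


def remix(url: str, old: str, new: str) -> str:
    """Return a new link whose embedded query has `old` replaced by `new`."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    # Swap one term inside the stored query; everything else stays the same.
    params["query"] = [q.replace(old, new) for q in params["query"]]
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))


# Made-up query text: "which genes are expressed in the liver?"
original = query_url("SELECT ?gene WHERE { ?gene ex:expressedIn ex:Liver }")

# Someone else "hacks" the link to ask the same question about the kidney.
remixed = remix(original, "ex:Liver", "ex:Kidney")

print(original)
print(remixed)
```

The point of the sketch is that the link itself is the shareable artifact: anyone who can edit a URL can derive a new query from an existing one without touching the datastore behind it.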
NOTHING replaces hacking and releasing! Using trademark to protect the quality of their work, not copyright.
“Don’t plan to hack, hack! That’s the only way around the incentive problems.”
2 futures for repositories.
Note that repositories are points on a map, with no edges! No links between them! No networks! But networks create better incentives than points. Doesn’t mean “get complicated,” just the opposite: simple systems win the network game! (“OAI-PMH helps, but not enough,” with which I completely agree.) AOL and Prodigy were points, and they did cool stuff, but only “their people” could do anything to improve the system. The WWW was bloody ugly in comparison, but it was OPEN, and so huge numbers of people improved it. Three layers of openness (Benkler): physical layer, code layer, content layer. Fourth layer: knowledge layer, which means we have to deal with IP problems, so we have to engage the copyright problems.
If we do this right, we create gears instead of locks. This is the opportunity! “Open copyright, balanced incentives, and distributed workloads.”
We have to do this by solving an information problem faculty actually have. That’s the road in. (HALLELUJAH. -me) What questions can only a network of IRs solve? So that people who use IRs outcompete people who don’t.
“How does the IR keep me from flipping burgers at McD’s?”
Individual brain capacity is not scaling, but COLLECTIVE brain capacity is, so how do we make our stuff work on a collective level?
Conclusion: don’t wait. Lots of things need to happen before all this becomes real! If we wait until all the problems are solved, the commons won’t have what it needs to explode. But people aren’t watching IR space, which makes this the best time to create an open, disruptive system! Use existing ontologies. Work around problems rather than tackling them head-on.
Create new ways to measure things. Tenure vs. McD’s is a matter of citations; that’s the only thing we know how to measure! But what about data? Downloads? TrackBack?
We need a thousand flowers blooming, not the slow process of consensus.
Invest in your repository staff! Hard to do when facing real financial crises like the serials crisis, but “there’s nothing so expensive as cheap people.” (I am ready to cry. That’s exactly what libraries are NOT DOING.)