Now that I have personas, I can actually talk about repository design.
Well, sort of. I do want to talk out-of-band about a couple of the personas first. Les Carr pointed out to me in email that Ulysses is a bit of a luxury item; a lot of libraries combine him with Menelaus. This is absolutely true, and I was considering building a persona on a satellite campus to reflect that reality. However, the “maverick manager” model exists too (I’m one, and the model applies broadly to consortial repositories), and my suspicion is that the more complex solutions that work for the Ulysses/Menelaus division will also work for institutions where Ulysses and Menelaus are the same person. So for now I’m going to stick with what I’ve got, reserving the right to revisit that decision later.
Something else to notice about Ulysses is that he is also a stand-in for libraries that outsource the repository IT to vendors. They’ll face many of the same functionality and responsiveness problems that poor Ulysses does.
If you’ll recall, Cassandra Athens wants to use Achaea’s repository for two things: to export the problem of potentially-copyright-violating faculty postings to Ulysses, and to form automatically-generated CVs for faculty websites. Let’s talk about the former problem first.
Since Cassandra isn’t stupid, she has probably set up her CMS to have an upload area, disallowing any other way for faculty to put random files up on the Basketology website. (And Dr. Troia and others probably howled about it, but so it goes.) Faculty have two options: they can use Cassandra’s uploader, which Cassandra would vastly prefer they do, or they can put their files someplace else and link to them from the Basketology pages they control, which lands Cassandra right back in the quagmire she’s trying to escape from.
Some social engineering will be required here; Cassandra may well have to go to the chair of Basketology and make a stink about copyright liability. She’ll be much happier about doing that, though, and the chair much happier about dealing with the problem, if Cassandra has a workable alternative to propose.
The design goal, therefore, is to have Cassandra’s CMS talk to Ulysses’s repository such that faculty find it easier and safer to use Cassandra’s upload mechanism than to bypass it—without adding significantly to Cassandra’s ongoing workload. (Ulysses has time to throw at this; except for a few up-front development and testing cycles, Cassandra doesn’t.)
“SWORD!” you may be yelling at me at this point. No. SWORD won’t work, because SWORD more or less assumes you have a nice, tidy, well-described object to swap around. At most, Cassandra can squeeze a file and a badly formatted, unpredictable plain-text citation out of faculty; sometimes they won’t even bother pasting in the citation. Moreover, SWORD is a difficult target for Cassandra to program against, and her CMS doesn’t support it natively.
She needs to be able to tell her CMS to email or FTP the file somewhere, along with the CMS’s identifier for the file and what little information it has about it. In return, her CMS needs to receive the item’s handle or other permanent URL, along with the CMS’s identifier, so that she and/or faculty can link to the item in the repository easily.
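To make the round trip concrete, here is a minimal Python sketch of the two messages, assuming a small JSON manifest travels alongside the file (as a second email attachment, or a second file in the FTP drop). The field names (`cms_id`, `filename`, `citation`, `item_url`, `citation_html`) are hypothetical. The only real requirement is that the CMS’s identifier goes out with the file and comes back with the permanent URL, so the CMS can match each reply to its upload.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Submission:
    """What Cassandra's CMS sends: the file plus whatever scraps it has."""
    cms_id: str                     # the CMS's own identifier for the upload
    filename: str                   # name of the file sent alongside this manifest
    citation: Optional[str] = None  # free-text citation, if faculty bothered

    def to_manifest(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


@dataclass
class Reply:
    """What the repository sends back once the item goes live."""
    cms_id: str                          # echoed back so the CMS can match reply to upload
    item_url: str                        # handle or other permanent URL
    citation_html: Optional[str] = None  # the nice-to-have formatted citation

    @classmethod
    def from_manifest(cls, text: str) -> "Reply":
        data = json.loads(text)
        return cls(
            cms_id=data["cms_id"],
            item_url=data["item_url"],
            citation_html=data.get("citation_html"),
        )
```

`citation` is optional on purpose: sometimes faculty won’t paste one in, and the submission still has to go through.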
This problem can be solved in several ways. Many people reading this will doubtless come up with better solutions than I could. If you do—no fair changing the design constraints, do you understand me? The design constraints are the whole point of this exercise. If your solution doesn’t work for Cassandra, it doesn’t work at all. (You think I’m draconian about this? Read Alan Cooper.)
Ulysses needs the repository to receive the CMS’s email or watch the FTP folder it has been told to watch, and to notify him that Cassandra’s CMS has fired a file at him. He then rights-checks the submission, applies metadata liberally, arranges for licensing, and (assuming that all checks out) sends the item live, whereupon the repository knows to notify Cassandra’s CMS. Ideally, the repository would even construct and shoot back a pretty, properly-formatted HTML citation! None of this can involve Ulysses interacting directly with the repository server, because Ulysses isn’t allowed to do that.
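The sequence Ulysses works through can be modeled as a small state machine. This Python sketch is one way to do it, under assumptions of my own: the stage names and the `notify_cms` callback are hypothetical, and each step is triggered from whatever client-side interface the repository gives Ulysses, never from the server itself. The behavior that matters is that the CMS hears back only when the item goes live, and hears back with its own identifier attached.

```python
from enum import Enum, auto
from typing import Callable, Optional


class Stage(Enum):
    RECEIVED = auto()        # the CMS has fired a file at Ulysses
    RIGHTS_CHECKED = auto()  # rights check passed
    DESCRIBED = auto()       # metadata applied
    LICENSED = auto()        # licensing arranged
    LIVE = auto()            # public, with a permanent URL


# The only legal forward move from each stage; LIVE has none.
NEXT = {
    Stage.RECEIVED: Stage.RIGHTS_CHECKED,
    Stage.RIGHTS_CHECKED: Stage.DESCRIBED,
    Stage.DESCRIBED: Stage.LICENSED,
    Stage.LICENSED: Stage.LIVE,
}


class Item:
    def __init__(self, cms_id: str, notify_cms: Callable[[str, str], None]):
        self.cms_id = cms_id
        self.stage = Stage.RECEIVED
        self.item_url: Optional[str] = None
        self._notify_cms = notify_cms  # hypothetical hook back to Cassandra's CMS

    def advance(self, item_url: Optional[str] = None) -> Stage:
        """Move to the next stage; going live requires the permanent URL."""
        if self.stage not in NEXT:
            raise ValueError(f"{self.cms_id} is already live")
        nxt = NEXT[self.stage]
        if nxt is Stage.LIVE:
            if not item_url:
                raise ValueError("going live requires a permanent URL")
            self.item_url = item_url
        self.stage = nxt
        if nxt is Stage.LIVE:
            # Only now does the CMS hear back, with its own identifier attached.
            self._notify_cms(self.cms_id, item_url)
        return nxt
```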
I repeat: This problem can be solved in several ways. Many people reading this will doubtless come up with better solutions than I could. If you do, no fair changing the design constraints, do you understand me? The design constraints are the whole point of this exercise. If your solution doesn’t work for Ulysses, it doesn’t work at all. (Seriously. The Inmates Are Running the Asylum. He means you, software developers.)
A few DSpace-specific notes. DSpace’s per-item licensing paradigm is all wrong, and I suspect EPrints shares the problem, because its root is the OAIS model, in which rights information must be tightly associated with the object to which the rights pertain. That’s great in the OAIS world, where happy little computers talk to other happy little computers, but it’s a disaster for any kind of ongoing interaction between actual people and the repository, as Ulysses’s paper-license insanity illustrates. (That, by the way, was drawn directly from a real-world situation. I refuse to point fingers; you’ll have to take my word for it.)
A Terms of Service agreement is a much more human-friendly solution. Dr. Troia can be told to go click through it, and after she does, neither she nor Ulysses has to be bothered with licensing, and the repository can be smart enough to put the correct rights information in each item. For third-party deposit, Ulysses might have to indicate which author is to be considered the licensing author, but honestly, the repository ought to be smart enough to check the author list against ToS signatories and let the deposit go through if even one author has signed.
(The repository must also allow for un-signing of the ToS; the obvious use-case is a faculty member leaving the institution. This cannot be allowed to affect previously-deposited items, but it should halt deposit of future items that depend on that faculty member’s consent.)
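Here is a minimal Python sketch of the any-author-has-signed check and of un-signing, assuming authors can be matched to signatories by a stable identifier (the genuinely hard part in real life, glossed over here). The class and function names, and the co-author `achilles`, are hypothetical. The licensing author is recorded on the item at deposit time, which is what keeps a later un-signing from reaching back to touch earlier items.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


class TosRegistry:
    """Who has currently agreed to the repository's Terms of Service."""

    def __init__(self) -> None:
        self._signed: Set[str] = set()

    def sign(self, person: str) -> None:
        self._signed.add(person)

    def unsign(self, person: str) -> None:
        # Affects only future deposits; existing items keep their recorded licensor.
        self._signed.discard(person)

    def licensing_author(self, authors: List[str]) -> Optional[str]:
        """Return the first listed author who has currently signed, or None."""
        for author in authors:
            if author in self._signed:
                return author
        return None


@dataclass(frozen=True)
class DepositedItem:
    title: str
    licensed_by: str  # fixed at deposit time


def deposit(registry: TosRegistry, title: str, authors: List[str]) -> DepositedItem:
    """Let the deposit through if even one author has signed the ToS."""
    who = registry.licensing_author(authors)
    if who is None:
        raise PermissionError(f"no author of {title!r} has signed the ToS")
    return DepositedItem(title, who)
```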
DSpace’s other major problem is its unwieldy workflow system, which doesn’t really cover the use case just described. DSpace doesn’t even kick an item into the workflow system until it’s been uploaded and all the metadata is complete, which is exactly bass-ackwards from what Cassandra and Ulysses need it to do. Ulysses needs a staging area where Cassandra’s CMS can dump stuff until he can get to it. DSpace doesn’t have that.
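A staging area doesn’t need to be elaborate. This Python sketch assumes the CMS drops each file into a directory next to a small JSON manifest carrying at least `cms_id` and `filename` (a convention I’m making up, as is the function name). The point is what it doesn’t require: anything with a file and a CMS identifier gets into the queue, complete metadata or not, and Ulysses fills in the rest when he gets to it.

```python
import json
from pathlib import Path
from typing import Dict, List, Set


def scan_staging(drop_dir: Path, seen: Set[str]) -> List[Dict]:
    """Return manifests that have arrived since the last scan.

    Nothing here demands complete metadata: a file plus the CMS's
    identifier is enough to get into Ulysses's queue.
    """
    new_items = []
    for manifest in sorted(drop_dir.glob("*.json")):
        if manifest.name in seen:
            continue  # already queued on an earlier scan
        data = json.loads(manifest.read_text())
        payload = drop_dir / data["filename"]
        if not payload.exists():
            continue  # manifest arrived before its file; try again next scan
        seen.add(manifest.name)
        # Flag missing citations for Ulysses rather than rejecting the item.
        data["needs_citation"] = not data.get("citation")
        new_items.append(data)
    return new_items
```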
There are other minor nits. The reject notice for an item that doesn’t pass a rights check needs to go to the corresponding author(s), not to Ulysses. DSpace needs an external-facing notification mechanism more sophisticated than email. Ulysses needs to be able to shove Basketology off on Menelaus, once everything is running right and Menelaus has learned how to check rights—Ulysses can’t possibly handle the entire campus doing as Cassandra has done, because there aren’t enough hours in the day. All of this is solvable, some of it trivially.
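Two of those nits are small enough to sketch. In this Python sketch, rights review is routed by department through a lookup table Ulysses controls (handing Basketology to Menelaus is a one-line change), and rejection notices go to the corresponding author(s) rather than to whoever did the review. The table, the `Author` record, and its `contact` field are hypothetical; `contact` is deliberately not assumed to be an email address, since email alone isn’t a good enough notification channel.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Author:
    name: str
    contact: str               # hypothetical: whatever notification channel is used
    corresponding: bool = False


# Hypothetical routing table: departments Ulysses has handed off.
REVIEWERS: Dict[str, str] = {"Basketology": "menelaus"}
DEFAULT_REVIEWER = "ulysses"


def reviewer_for(department: str) -> str:
    """Who does the rights check for this department's deposits."""
    return REVIEWERS.get(department, DEFAULT_REVIEWER)


def reject_recipients(authors: List[Author]) -> List[str]:
    """Rights-check rejections go to the corresponding author(s), not the reviewer."""
    contacts = [a.contact for a in authors if a.corresponding]
    if not contacts:
        raise ValueError("no corresponding author recorded; cannot route rejection")
    return contacts
```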
So. Let’s get to work? I hear there’s a repository-programming event coming up…