‘DSpace’ Archive

31 Octobris 2008

Miniature disasters and minor catastrophes

KT Tunstall’s wonderful song is playing on Pandora as I type this, and it’s just so fitting that I have to borrow its title for this post!

This is a tale of beating DSpace and OS X with many, many rocks until they sorta-kinda work. I present it here in hopes of sparing someone else considerable annoyance.

One of my best clients emailed me with a “please fix this link in my HTML item” request. Simple enough, right?

The HTML item in question has files nested three folders deep. That breaks DSpace’s regular exporter, which isn’t smart enough to create intermediate folders. Joy.

So I kicked that up to the dspace-tech list, and got a kind response from Larry Stone of MIT: “use the METS packager export instead.” I did, and lo! it worked.

So I twiddled the file needing twiddling, zipped up the whole, and tried to put it back. First the METS ingester barfed because I’d zipped the folder containing all the files, not the files themselves. Okay, durrr, I felt stupid and zipped the files properly.

Then the METS ingester barfed because unbeknownst to me, Mac OS X’s native zip utility adds OS X-specific junk into the zip file. Quite properly, the ingester said primly, “Your METS manifest doesn’t match your actual files. Go forth and fix it.” The solution to this little difficulty turned out to be YemuZip, which can emit a normal zip file.
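If you'd rather not depend on a particular GUI zipper, both zip mistakes above (wrapping everything in a folder, and letting OS X junk in) can be avoided with a short script. This is a minimal Python sketch, not anything DSpace ships. It assumes three things: the junk consists of __MACOSX folders, .DS_Store files, and "._"-prefixed AppleDouble files; the METS manifest sits at the top of the package folder; and zip_package is a made-up name.

```python
import os
import zipfile

# What counts as OS X junk here is an assumption; extend these sets as needed.
JUNK_DIRS = {"__MACOSX"}
JUNK_FILES = {".DS_Store"}


def is_mac_junk(relpath):
    """Return True for OS X metadata files that a METS manifest won't list."""
    parts = relpath.split("/")
    if any(part in JUNK_DIRS for part in parts):
        return True
    name = parts[-1]
    return name in JUNK_FILES or name.startswith("._")


def zip_package(package_dir, zip_path):
    """Zip the *contents* of package_dir (not the folder itself), skipping Mac junk.

    Returns the sorted list of archive member names that were written.
    """
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(package_dir):
            # Prune junk directories so os.walk never descends into them.
            dirs[:] = [d for d in dirs if d not in JUNK_DIRS]
            for name in files:
                full = os.path.join(root, name)
                # Member names are relative to package_dir, so the manifest lands
                # at the top of the archive rather than inside a wrapper folder.
                rel = os.path.relpath(full, package_dir).replace(os.sep, "/")
                if is_mac_junk(rel):
                    continue
                zf.write(full, rel)
                written.append(rel)
    return sorted(written)
```

Call it as zip_package("/path/to/exported-item", "/path/to/item.zip"). Write the zip somewhere outside the folder being zipped, so the archive doesn't try to include itself.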

Then the METS ingester barfed because the file I’d twiddled was a different size from what the METS file was claiming, logically enough. Helpfully, the ingester’s error message told me what size the file actually was, so I could pop into the METS file and fix the size in the several places it appears.

Then the METS ingester barfed because the checksums in the METS file didn’t match the checksum of the file I’d twiddled. There’s probably a quick and easy way to calculate a checksum from the command line, but CheckSumApp has a cute little GUI. Like the file size, the checksum appears several places in the METS file, so I made sure I got all of them.
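The size and checksum fixes can be scripted too. Here is a Python sketch that rests on several assumptions:

- The checksums are MD5, DSpace's usual checksum algorithm.
- The values to fix live in the SIZE and CHECKSUM attributes of mets:file elements.
- Each mets:file points at its bitstream through an xlink:href on mets:FLocat.

If your manifest repeats size or checksum anywhere else, check those spots by hand. The helper names are hypothetical. ElementTree re-serializes the whole file, so keep a backup of the original manifest.

```python
import hashlib
import os
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def size_and_md5(path):
    """Return (size in bytes, lowercase hex MD5) for the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def patch_mets(mets_path, package_dir, href):
    """Rewrite SIZE and CHECKSUM on every mets:file whose FLocat points at href.

    Returns the number of mets:file elements updated.
    """
    # Re-register the manifest's own prefixes so ET writes mets:, xlink:, etc.
    # back out instead of ns0:, ns1:, ... (default namespaces are left alone).
    for _event, (prefix, uri) in ET.iterparse(mets_path, events=("start-ns",)):
        if prefix:
            ET.register_namespace(prefix, uri)
    tree = ET.parse(mets_path)
    size, md5 = size_and_md5(os.path.join(package_dir, href))
    updated = 0
    for file_el in tree.iter("{%s}file" % METS_NS):
        for floc in file_el.findall("{%s}FLocat" % METS_NS):
            if floc.get("{%s}href" % XLINK_NS) == href:
                file_el.set("SIZE", str(size))
                file_el.set("CHECKSUM", md5)
                file_el.set("CHECKSUMTYPE", "MD5")
                updated += 1
                break
    tree.write(mets_path, xml_declaration=True, encoding="UTF-8")
    return updated
```

For the twiddled HTML file, that would be something like patch_mets("pkg/mets.xml", "pkg", "path/to/page.html"), using whatever relative path the FLocat actually records.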

Then the METS ingester actually worked. So now I have to go in and do database magic so that the item handle points to the new item, because the METS ingester doesn’t have a replace option the way the normal ingester does.

Anybody who thinks that a normal repository manager is going to go through all this to fix a link in an HTML file is as barking mad as I am. This is the ridiculousness that DSpace’s insistence on no-versioning, butterfly-pinned-to-wall “final archival” reduces me to. Yes, it’s funny—but it also cost me an entire hour to fix one link.

30 Octobris 2008

The dangers of reused code

So because the title of an item on display is shown at the top of item-display pages (imagine!) in my Manakin themes, I went and took the title out of the metadata listing to avoid redundancy and clutter.

One small problem with that. The same code gets called when the item is being edited or checked over, but in that case the page title is “Item submission” or something similarly inane, so the item title doesn’t appear anywhere on the page. Doubleplusungood.

My current fix is to put the title back in if the string “workflow” appears in the URL anywhere. That’s… kinda hacky, I admit, and I’m not entirely sure it handles all edge cases, but it’s at least not as broken as before.

27 Octobris 2008

Fugly Manakin hack: User-friendly file descriptions

DSpace’s JSPUI had one pleasantly usable feature: instead of displaying the MIME type of a file, it displayed a short admin-editable file-type description. “PDF” is a vast improvement over “application/pdf.” For one thing, it’s shorter.

Unfortunately, Manakin only knows from MIME types, since METS isn’t very friendly to niceties like user-friendly file descriptors. Fixing that was on my to-do list. I was told by the DSpace developers that the right and proper way to fix that was to insert PREMIS metadata into the METS. To do this, I would have to figure out PREMIS and then write an Aspect (I think) to twiddle the METS.

People, I am too damn lazy for that. So you get this fugly hack instead. I don’t feel too bad about it; storing descriptions in the database is kind of a fugly hack too.

First, figure out which MIME types your DSpace instance actually contains by running this query on your database:

select mimetype, short_description from bitstreamformatregistry order by mimetype;

(You will probably immediately notice a potential problem with this hack: text/plain has two values, depending on whether it’s a content or license bitstream. I think this is not actually a problem, because this hack should only get called for content bitstreams.)

Then create a template in your theme as below, making one <xsl:when> for each MIME type you want a user-friendly description for.

<xsl:template name="getFileTypeDesc">
    <xsl:param name="mimetype"/>
    <xsl:choose>
        <xsl:when test="$mimetype='application/pdf'">
            <xsl:text>PDF</xsl:text>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$mimetype"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

The <xsl:otherwise> returns the MIME type as a last-ditch descriptor.

Next, go to the code that builds each bitstream row; I’d show you mine, but I have hacked the living daylights out of it because I cannot stand unnecessary HTML tables. Look for the MIME-type code (hint: it looks like the contents of the <xsl:with-param> below) and replace it with the following:

<xsl:call-template name="getFileTypeDesc">
    <xsl:with-param name="mimetype">
        <xsl:value-of select="substring-before($file/@MIMETYPE,'/')"/>
        <xsl:text>/</xsl:text>
        <xsl:value-of select="substring-after($file/@MIMETYPE,'/')"/>
    </xsl:with-param>
</xsl:call-template>

Whenever you add a new content-type to the repository, you’ll have to add a new <xsl:when>—but realistically, how often does that happen? I’ve done it once a year, maybe? Less?
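If adding <xsl:when>s by hand gets tedious, you can generate the template from the rows the bitstreamformatregistry query returns. Here is a minimal Python sketch; build_filetype_template is a made-up name. When a MIME type repeats, it keeps the first description it sees, so drop the license row for text/plain before passing the rows in.

```python
from xml.sax.saxutils import escape, quoteattr


def build_filetype_template(rows):
    """Build the getFileTypeDesc XSLT template from (mimetype, description) pairs."""
    lines = [
        '<xsl:template name="getFileTypeDesc">',
        '    <xsl:param name="mimetype"/>',
        '    <xsl:choose>',
    ]
    seen = set()
    for mimetype, description in rows:
        if mimetype in seen:
            continue  # first description wins for a repeated MIME type
        seen.add(mimetype)
        test = "$mimetype='%s'" % mimetype
        lines += [
            '        <xsl:when test=%s>' % quoteattr(test),
            '            <xsl:text>%s</xsl:text>' % escape(description),
            '        </xsl:when>',
        ]
    # Same last-ditch fallback as the hand-written template: show the raw MIME type.
    lines += [
        '        <xsl:otherwise>',
        '            <xsl:value-of select="$mimetype"/>',
        '        </xsl:otherwise>',
        '    </xsl:choose>',
        '</xsl:template>',
    ]
    return "\n".join(lines)
```

Paste the output into your theme in place of the hand-maintained template, and rerun it whenever the format registry changes.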

It’s a fugly hack, but it works.

16 Octobris 2008

Yay usability!

So at work we’ve gone the DSpace-1.5-plus-Manakin route, rolling out two new themes along with the new release. If I’ve seemed crustier and more irritable than usual lately? Getting a release ready to go, and dealing with the inevitable post-release bug list, does that to me.

So, yes, I’ve been swearing at Manakin a lot lately. Not as much as I’ve been swearing at IE6, admittedly, but there have been some “wait, what? how could you do that? what were you thinking?” moments. (And just so the Manakin devs know: if y’all change the pattern of the browse URLs one more time, I’m agonna take out a contract on ya. I let it go into production with broken links!)

This morning, though, I had to add an eperson to a collection as submitter, collection admin, and review step performer. It took me far fewer clicks and scans to do that than in the past; the ability to search for an eperson instead of browsing an obtuse list is great! Thank you, thank you, thank you for cleaning up this little chore.

17 Septembris 2008

And it was good

Earlier this week, the godly sysadmin got the last of his major hacks into 1.5, and got our test installation up and running thereupon.

Yesterday I got down to brass tacks installing my themes, which promptly broke because the Manakin devs fixed their misspelling of “standardAttributes.” (I’m not pointing and laughing. Really I’m not. These things happen.) That was a simple enough fix, as were a couple of messages.xml fixes.

And today I hacked at the bad stuff. My scoped search box was amazingly, unbelievably broken, but I got it fixed after a lot of unnecessary metaprogramming and a similar amount of very necessary cussing. (If the fit comes upon you to program Javascript inside XSLT? For your own sanity, I urge you to resist it.)

The other thing that broke badly was my big logo hack. The problem was that Manakin doesn’t put METS metadata inside the DRI any more; it’s all called by reference. Since the logo URL lives in METS, I had to figure out how to make XSLT call the right METS file and return the URL from it. Once I had that sorted, the $context-path variable confused me rather, but Tim Donohue kindly got me straightened out and flying right, and so the logos are now fixed as well.

At this point, I have some minor XSLT and CSS tweaks to do before I’m willing to set 1.5 free, but I think I can tear through them in a day or two (although considering the number of meetings I’ve got for the next three workdays, it may take longer than that). If I get through those, I can start wading through the wishlist. Drop-dead rollout date is Open Access Day, and I’m fairly confident we’ll make that.

And it was a good day.

6 Maii 2008

Courtesy

A courteous interface is a marvelous thing. It gets out of the way. It intuits what you want, squeezing every tiny bit of information possible out of whatever tidbits you feed it. It doesn’t bother you with its nasty little internal troubles. It’s Jeeves, there with a pick-me-up when you’ve got a drink-fueled headache.

DSpace’s administrative and item-submission interfaces are more like the temporary Jeeves replacement Bertie got stuck with once, the guy who snarled all the time and snaffled socks. They are about as courteous as a New York cabdriver in heavy traffic. As a result, they waste incredible amounts of human time—my time, my sysadmin’s time, my submitters’ time, the time of dozens of admins just like me. I promised to talk about that, so I will.

For example. Just this morning I got an unhappy email from a submitter who didn’t have access to all the collections in a given community. The said collections are two or three levels deep because of intervening subcommunities—and while I’m talking about wasted time, I’ll spend a few words on wasted cognitive capacity, because I have yet to meet anyone for whom the DSpace distinction between communities and collections is intuitive or useful. My submitters expect to be able to submit items to communities. They do not understand why some items on the sitemap (which is how they think of the communities-and-collections page) are bold and others aren’t. I hate wasting time and effort explaining this stupid and essentially otiose distinction.

Right. Back to my submitter and her problem. I had to click open every single collection in order to click again to check its submitter list. For those collections she didn’t have submit access to, adding it was a four-click process and could have been more: click to open the eperson list, click to go to the last page, click to select her address (she’s late in the alphabet), click to update the submitter group. Wasted. Time.

And don’t get me started on DSpace’s repo-rat–hostile habit of building impenetrable names for otherwise-unnamed submitter groups. COLLECTION_27_SUBMIT. Yeah, that makes all kinds of sense in my little rat brain, how about yours? (If you’re wondering, the number is the collection’s database identifier, which is almost impossible to figure out from the DSpace UI. Real friendly, DSpace.) And these names proliferate like rats, because there’s no way to tell DSpace “use the people I just told you about, plzkthx” without going through the added hassle of creating and naming an actual group, and no way to tell DSpace “use the standard access rules for this community” or “use the access rules for this other collection.”

So then I needed to set up a new collection for her. Could DSpace pick up on the submitter-selection work I’d already wasted a bunch of time doing? Could it hell. I had to go through the same clickety-clickety process all over again. There’s no access templating in DSpace; every single collection in every single community is sui generis. Just imagine how much time I get to waste when someone leaves the university and someone else takes over their DSpace deposit duties! Woo-hoo! Because obviously I don’t have anything important to do with my time.

Which brings us to the DSpace deposit interface. To be clear, I’m working from 1.4.2 here, not 1.5—but let’s be clear about something else too, namely that 1.5 doesn’t fix all of these warts, though the Configurable Submission system is indeed a step forward. So let’s waste some time, everybody!

You start your submission from a collection page, or you start from My DSpace, in which case it asks you to pick a collection. What does it do with this collection information? It determines whether you have deposit access, duh, and if your friendly neighborhood repository-rat has spent time customizing a metadata form for that collection, it uses that form. (Does DSpace ask on collection creation which metadata forms to use? It does not. That’s configured via a file called input-forms.xml on the server. Mm-hm, that’s right, I have nothing better to do with my time than seek out and edit—twice, because I keep a version in source control—bitsy little XML files DSpace leaves all over creation.) Anything else? Like surveying existing items in that collection for commonalities in order to prepopulate metadata fields? Nah. Machine learning would save a human being’s time or something. Can’t have that.

Next you run into this screen, which I loathe with a white-hot loathing neutron stars might envy:

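[Screenshot: the first DSpace submission screen, with three checkboxes asking whether the item has more than one title, whether it has been published or publicly distributed before, and whether it consists of more than one file]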

The top question, which asks whether the item has more than one title, is just goofy. In my experience, that is true for less than one-tenth of one percent of submissions. The Québécois might have a use for that checkbox, but exactly how many DSpace installations does Québec have, and why wouldn’t a Québécois installation just put in dc.title.alternative by default? So why is every submitter to every DSpace installation forced to cope with that moronic checkbox for every single submission? Because DSpace doesn’t give a tinker’s damn about anybody’s time or cognitive load, that’s why. The default is correct, at least, but that’s decidedly small comfort.

(I suspect there’s a librarian at the bottom of this interface wart somewhere. What about MARC 246, someone must have screamed. Guess what? I don’t care about MARC 246. I care about efficient use of person-hours, which that checkbox unquestionably isn’t. I love my fellow librarians, except when I hate them. I hate them when they gleefully glomp every iota of patron time and effort they can get their little mitts on.)

The middle question is difficult to understand (for my submitters, anyway; more of them get it wrong than right), and DSpace doesn’t explain why you have to answer it. I get a lot of questions from submitters about putting in publication dates and citations, because my submitters don’t mentally connect those fields with that checkbox. But that’s what that checkbox does when checked: it adds fields to the next metadata screen for dc.date.issued, dc.publisher, and dc.identifier.citation. (How many repository-rats running DSpace just learned something? Don’t be embarrassed. It was months before I figured it out, too, and I had to go in and read code before I had it sussed.)

But it gets better (for “worse” values of “better”). Imagine Ulysses Acqua for a moment, trying to be nice to Dr. Troia and the little open-access basketology journal she wants to archive. He uses the input-forms.xml file to make a custom metadata form that puts basic citation information for the basketology journal in dc.identifier.citation so Dr. Troia doesn’t have to retype it every time. When Dr. Troia submits her first article, she doesn’t think to tick the middle checkbox, and DSpace doesn’t tick it for her. What happens?

SHE GETS AN ERROR MESSAGE. I kid you not. AN ERROR MESSAGE. It reads “You’ve indicated that your submission has not been published or publicly distributed before, but you’ve already entered an issue date, publisher and/or citation. If you proceed, this information will be removed, and DSpace will assign an issue date.”

I—I—I honestly have no words. Do I need them? Maybe I do. The Jeeves interface never, ever, EVER threatens to discard information Bertie has provided it. It’s hard enough to pry useful information out of Bertie as it is! And talk about your bizarrely opaque, unhelpful, and inappropriately finger-wagging error messages! (How does Dr. Troia fix the problem, if she wants to keep her citation information or date or whatever? The message doesn’t even say.) I am just agog that this grotesque interaction exists in a production software system.

(Yes, of course I’ve triggered it. How do you think I figured out it exists? I don’t go looking for smelly garbage like this, I assure you.)

But it gets even worse than that. Weird interactions between input-forms.xml and the deposit code can make checkboxes on this page disappear when they shouldn’t. I haven’t dug into how this happens—but it bit me hard, such that I had to be unhelpful and take dc.date.issued out of a thesis metadata form in input-forms.xml. Because hey, troubleshooting DSpace’s sclerotic deposit system is such a productive use of my time!

Returning to our initial screen once more: there is absolutely no need whatever to ask the submitter about multiple files. None. Simply assume that submissions may have more than one file! Asking submitters to think about it up-front instead of at upload is wasted time.

So there we have it. An entire wasted screen, multiplied by untold numbers of DSpace submissions. There’s plenty more in there, the licensing system not least; Jeeves interface, not so much.

EPrints, as a rule, is a much better gentleperson’s personal gentleperson than DSpace. EPrints, for example, asks for item type up front, and configures its deposit screens to match, without the intervention of either submitter or repository-rat. Who knows, this politeness may have something to do with developer attitude. The last time I waxed profane on matters repository-interface-ish, Les Carr was in my inbox less than a day later asking eagerly, “is this what you mean? would this solution I just came up with work for you?” Whereas DSpace gets on my case for being negative. I’m just sayin’ here.

No. No, I’m not just sayin’. It runs deeper than that. I’ve occasionally seen a few nods in the DSpace developer community toward EPrints interface accomplishments. Unfortunately, the feel of the discourse I’ve seen is “look at all the shiny AJAX! we want that!”

This is not about shiny AJAX, people. It’s not about shiny at all. This is about DSpace not wasting my time. There’s a ton of work DSpace could do with the aim of removing time-wasters before anyone writes a single line of Javascript or de-uglifies a single line of CSS. To do so, though, DSpace developers will have to learn to give a damn about my time and the amount of it DSpace has wasted and continues to waste. I see next to zero evidence of that learning taking place. (Tim gets it, which is why I say “next to zero” rather than just plain zero.)

Stop. Wasting. My. Time. That’s far and away the most important interface-development priority DSpace should adopt. For values of “me” that include “all repository-rats and willing depositors,” of course. DSpace’s interface needs to sit down at its mama’s knee and learn some courtesy.

13 Martii 2008

Search scope in consortial repositories

If you run a consortial repository, one of the things Manakin brings you is the possibility of separating each institution in your repository from the others visually, such that each institution practically seems to have its own site!

Manakin is actually pretty careful about the URL design of its scoped browsing. If you start browsing inside a particular community or collection, you’ll still see that community or collection’s design (as opposed to the default), because the URL hangs onto the handle, which is what the theme chooser uses to decide which theme to display. Very smart!

Scoped searching, however, is a different and rather nastier problem. Out of the box, Manakin’s search box is designed to allow the user to choose between two types of search: the entire repository, and the currently-browsed community or collection. This is a problem for consortial repositories that want their institution-level communities to seem wholly independent of each other. There shouldn’t be any all-of-DSpace search available from a community’s page. The “all of DSpace” scope should be replaced by an “all of the institution’s community” scope.

(I initially thought there shouldn’t be any broad-scope search at all. This was completely wrongheaded of me. If you’re in a departmental collection, you should be able to search the entire institution’s collections. I mention this so that you won’t make the same mistake.)

At present, I have solved about half this problem. The half I can’t solve is the search-results page, which uses the site-default theme no matter what I do, and cannot be made to respect the scoping established on the search page. I am annoyed by this, but I’m pretty sure that solving it is beyond my abilities. (What it would take, I suspect, is sticking information about the referring page and its theme into the DRI. Somebody want to write an Aspect to do that?)

However, I have solved the search-box scoping problem. It’s a start. Here’s how you can too.

First, you need to know when you’re on the main community page. For this, you need to record that page’s handle in the theme’s XSLT. This got slightly hairy for me because my test and production servers have different handle prefixes. If yours don’t, your solution is easier than mine. Anyway, here’s mine (and yes, I’m giving away the farm here a bit, revealing which community I’m doing this for, but I don’t see that that’s a problem):

<xsl:variable name="handle-prefix" select="substring-after(/dri:document/dri:meta/dri:repositoryMeta/dri:repository/@repositoryIdentifier, 'hdl:')"/>
<xsl:variable name="uwmad-handle">
    <xsl:choose>
        <xsl:when test="$handle-prefix='1960'">1960/10498</xsl:when>
        <xsl:when test="$handle-prefix='1793'">1793/8334</xsl:when>
    </xsl:choose>
</xsl:variable>

Now you need to mess with the radio buttons in the search form. On your community’s main page, you’ll replace them with a hidden input containing that community’s handle as scope. Everywhere else, you’ll sneakily change the “everything” scope to search just your community. Take a deep breath; there’s a lot of code here:

<xsl:choose>
    <!-- when we're on the UW-Madison home page, don't offer a choice of scope -DS -->
    <xsl:when
        test="/dri:document/dri:meta/dri:pageMeta/dri:metadata[@element='focus'][@qualifier='container']=concat('hdl:',$uwmad-handle)">
        <input id="ds-search-form-scope-container" name="scope" type="hidden">
            <xsl:attribute name="value">
                <xsl:value-of select="$uwmad-handle"/>
            </xsl:attribute>
        </input>
    </xsl:when>
    <xsl:otherwise>
        <label>
            <!-- edited so that a scope of "all" ONLY searches UW-Madison stuff -->
            <input id="ds-search-form-scope-all" type="radio" name="scope"
                checked="checked">
                <xsl:attribute name="value">
                    <xsl:value-of select="$uwmad-handle"/>
                </xsl:attribute>
            </input>
            <i18n:text>All of MINDS@UW-Madison</i18n:text>
        </label>
        <br/>
        <label>
            <input id="ds-search-form-scope-container" type="radio" name="scope">
                <xsl:attribute name="value">
                    <xsl:value-of
                        select="substring-after(/dri:document/dri:meta/dri:pageMeta/dri:metadata[@element='focus'][@qualifier='container'],':')"
                    />
                </xsl:attribute>
            </input>
            <xsl:choose>
                <xsl:when
                    test="/dri:document/dri:meta/dri:objectMeta/dri:object[@objectIdentifier=/dri:document/dri:meta/dri:pageMeta/dri:metadata[@element='focus'][@qualifier='container']]/mets:METS/mets:structMap[@TYPE='LOGICAL']/mets:div[@TYPE='DSpace Collection']"
                    >This Collection</xsl:when>
                <xsl:when
                    test="/dri:document/dri:meta/dri:objectMeta/dri:object[@objectIdentifier=/dri:document/dri:meta/dri:pageMeta/dri:metadata[@element='focus'][@qualifier='container']]/mets:METS/mets:structMap[@TYPE='LOGICAL']/mets:div[@TYPE='DSpace Community']"
                    >This Community</xsl:when>
            </xsl:choose>
        </label>
    </xsl:otherwise>
</xsl:choose>

As best I can tell, this works quietly and without fuss. Best kind of hack!

3 Martii 2008

Useful things

Michael W. Carroll’s whitepaper on the NIH policy is a useful document for all repository-rats, not just those who have NIH grantees to worry about. Lovely tidbits in there on various aspects of copyright vis-a-vis scholarly communication. Recommended.

DSpace finally, finally, finally has an up-to-date list of vendors. I can’t speak to how good any of them are (though I’d trust some of the named individuals implicitly based on what I’ve seen of them on the DSpace lists), but just having the list is a vast improvement over the previous situation. Good job, DSpace Foundation!

I’ve mentioned before the excellence of scholarly-publishing executive Mike Rossner, and he’s gone and done it again. Right or wrong, it takes a special sort of courage to break ranks and call out your own kind in an extremely fraught conflict. Rossner’s letter is useful as an anti-FUD device.

For those of us hoping for motion on an initiative similar to Harvard’s, this month’s SPARC Open Access Newsletter is a must-read. Amid the straightforward history and lucid analysis are tantalizing tidbits about how it was done. “Enlist Peter Suber” sounds like good strategy to this rat!

5 Februarii 2008

Argh, Manakin, don’t do that!

Manakin has one seriously broken behavior that I can’t figure out how to fix, but there’s a workaround that I recommend to everyone.

If you click a link to a page that requires you to be logged in, Manakin duly asks you to log in and then sends you on to that page. Fine. If, however, you click a login link from an open page, once you’re logged in, Manakin returns you to the root (main) page of your repository.

Let me count the ways in which this is broken:

  1. It doesn’t clearly inform you that you’ve logged in properly. When I first ran into this, I wondered if I had!
  2. If you came from a page with a different theme (visual design) from the main repository page, welcome to total confusion!
  3. It doesn’t take you to a page that does anything useful with your logged-in status. If you’re coming from an open page, you probably logged in to deposit an item or handle something in your workflow. To do this from the main page, you’re stuck clicking at least once, and probably twice!
  4. It’s not what the JSP UI does. The JSP UI sensibly sends you to your profile (“My DSpace”) page. Switching to Manakin and not fixing this behavior will confuse every single existing user of your repository. (How did TAMU not find this out on user testing? Didn’t they test? Or do they not have any existing JSP UI users?)

My recommendation is never to link to Manakin’s login page in a theme. Instead, link to Manakin’s profile page (“/profile” instead of “/login”). You can still label it “Log in” if you like. This way, Manakin will say “oops! can’t go to your profile page if you’re not logged in!” It will then log you in and send you to your profile page. Which is correct, not-confusing behavior.

Edited to add: To make this happen by default, go to Navigation.java in the EPerson aspect of Manakin. Find “/login” and change it to “/profile”. Rebuild and reinstall. You’re done.

30 Ianuarii 2008

Solving Cassandra’s problems

Now that I have personas, I can actually talk about repository design.

Well, sort of. I do want to talk out-of-band about a couple of the personas first. Les Carr pointed out to me in email that Ulysses is a bit of a luxury item; a lot of libraries combine him with Menelaus. This is absolutely true, and I was considering building a persona on a satellite campus to reflect that reality. However, the “maverick manager” model exists too (I’m one, and the model applies broadly to consortial repositories), and my suspicion is that the more complex solutions that work for the Ulysses/Menelaus division will also work for institutions where Ulysses and Menelaus are the same person. So for now I’m going to stick with what I’ve got, reserving the right to revisit that decision later.

Something else to notice about Ulysses is that he is also a stand-in for libraries that outsource the repository IT to vendors. They’ll face many of the same functionality and responsiveness problems that poor Ulysses does.

If you’ll recall, Cassandra Athens wants to use Achaea’s repository for two things: to export the problem of potentially-copyright-violating faculty postings to Ulysses, and to form automatically-generated CVs for faculty websites. Let’s talk about the former problem first.

Since Cassandra isn’t stupid, she has probably set up her CMS to have an upload area, disallowing any other way for faculty to put random files up on the Basketology website. (And Dr. Troia and others probably howled about it, but so it goes.) Faculty have two options: they can use Cassandra’s uploader, which Cassandra would vastly prefer they do, or they can put their files someplace else and link to them from the Basketology pages they control, which lands Cassandra right back in the quagmire she’s trying to escape from.

Some social engineering will be required here; Cassandra may well have to go to the chair of Basketology and make a stink about copyright liability. She’ll be much happier about doing that, though, and the chair much happier about dealing with the problem, if Cassandra has a workable alternative to propose.

The design goal, therefore, is to have Cassandra’s CMS talk to Ulysses’s repository such that faculty find it easier and safer to use Cassandra’s upload mechanism than to bypass it—without adding significantly to Cassandra’s ongoing workload. (Ulysses has time to throw at this; except for a few up-front development and testing cycles, Cassandra doesn’t.)

“SWORD!” you may be yelling at me at this point. No. SWORD won’t work, because SWORD more or less assumes you have a nice, tidy, well-described object to swap around. At most, Cassandra can squeeze a file and a (badly-formatted, unpredictable text) citation out of faculty; sometimes they won’t even bother pasting in the citation. Moreover, SWORD is a difficult target for Cassandra to program against, and her CMS doesn’t natively deal with it.

She needs to be able to tell her CMS to email or FTP the file, the CMS’s identifier for it, and what little information it has about it to some designated address or folder. In return, her CMS needs to receive the item’s handle (or other permanent identifying URL) along with the CMS’s identifier, so that she and/or faculty can easily link to the item in the repository.

This problem can be solved in several ways. Many people reading this will doubtless come up with better solutions than I could. If you do—no fair changing the design constraints, do you understand me? The design constraints are the whole point of this exercise. If your solution doesn’t work for Cassandra, it doesn’t work at all. (You think I’m draconian about this? Read Alan Cooper.)

Ulysses needs the repository to receive the CMS’s email or watch the FTP folder it has been told to watch, and to notify him that Cassandra’s CMS has fired a file at him. He then rights-checks the submission, applies metadata liberally, arranges for licensing, and (assuming that all checks out) sends the item live, whereupon the repository knows to notify Cassandra’s CMS. Ideally, the repository would even construct and shoot back a pretty, properly-formatted HTML citation! None of this can involve Ulysses interacting directly with the repository server, because Ulysses isn’t allowed to do that.
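To make the round trip concrete, here is one hypothetical shape for the FTP-folder variant, in Python. Everything in it is an assumption for illustration, not an existing DSpace or CMS feature. The CMS drops the file plus a small JSON sidecar carrying its own identifier, the file name, and whatever citation it got. The repository side queues complete drops in a staging area for Ulysses. Once he sends an item live, the repository hands back a reply pairing the CMS identifier with the item's handle URL.

```python
import json
import os
from dataclasses import dataclass, field


@dataclass
class StagedDeposit:
    cms_id: str
    path: str
    citation: str = ""  # may be empty: faculty don't always paste one in
    handle: str = ""    # filled in once Ulysses sends the item live


@dataclass
class StagingArea:
    """Hypothetical holding pen between the CMS's drop folder and Ulysses."""
    drop_dir: str
    queue: dict = field(default_factory=dict)

    def scan(self):
        """Queue every drop whose sidecar and file are both present; return new cms_ids."""
        new = []
        for name in sorted(os.listdir(self.drop_dir)):
            if not name.endswith(".json"):
                continue
            with open(os.path.join(self.drop_dir, name)) as fh:
                meta = json.load(fh)
            cms_id = meta["cms_id"]
            path = os.path.join(self.drop_dir, meta["filename"])
            if cms_id in self.queue or not os.path.exists(path):
                continue  # already queued, or the file itself isn't there yet
            self.queue[cms_id] = StagedDeposit(cms_id, path, meta.get("citation", ""))
            new.append(cms_id)
        return new

    def publish(self, cms_id, handle):
        """Record the handle and return the reply the CMS needs to link the item."""
        deposit = self.queue[cms_id]
        deposit.handle = handle
        return json.dumps({"cms_id": cms_id, "url": "http://hdl.handle.net/" + handle})
```

A real watcher would also need to confirm that an upload has finished before queuing it; this sketch only checks that the file exists. The point of the design is that nothing enters DSpace's workflow until Ulysses has looked at it, which is exactly the staging step the deposit pipeline lacks.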

I repeat: This problem can be solved in several ways. Many people reading this will doubtless come up with better solutions than I could. If you do—no fair changing the design constraints, do you understand me? The design constraints are the whole point of this exercise. If your solution doesn’t work for Ulysses, it doesn’t work at all. (Seriously. Read The Inmates Are Running the Asylum. Cooper means you, software developers.)

A few DSpace-specific notes. DSpace’s per-item licensing paradigm is all wrong, and I suspect it shares this problem with EPrints, because the root of the difficulty is the OAIS model, in which rights information must be tightly associated with the object to which the rights pertain. This is great for the OAIS model, in which happy little computers talk to other happy little computers, but it’s a disaster for any kind of ongoing interaction between actual people and the repository, as Ulysses’s paper-license insanity illustrates. (That, by the way, was drawn directly from a real-world situation. I refuse to point fingers; you’ll have to take my word for it.)

A Terms of Service agreement is a much more human-friendly solution; Dr. Troia can be told to go click through it, and after she does, neither she nor Ulysses has to be bothered with licensing, and the repository can be smart enough to put the correct rights information in each item. For third-party deposit, Ulysses might have to indicate which author is to be considered the licensing author—but honestly, the repository ought to be smart enough to check the author list against ToS signatories, and let the deposit go through if even one author has signed.

(The repository must also allow for un-signing of the ToS; the obvious use-case is a faculty member leaving the institution. This cannot be allowed to affect previously-deposited items, but it should halt deposit of future items that depend on that faculty member’s consent.)
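The consent rule in the last two paragraphs is small enough to sketch. This Python version is hypothetical; the ToSRecord class and may_deposit function are invented names. A deposit goes through if any listed author has an active signature on the deposit date. The function returns that author so the repository can record who counts as the licensing author. Because the check runs at deposit time, a later un-signing blocks only future deposits. Items already in the repository keep the consent recorded when they went in.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ToSRecord:
    signed: date
    unsigned: Optional[date] = None  # e.g. the faculty member left the institution

    def active_on(self, day):
        """True if the signature covers `day` (signed on or before, not yet withdrawn)."""
        return self.signed <= day and (self.unsigned is None or day < self.unsigned)


def may_deposit(authors, tos, day):
    """Return the first listed author with an active ToS signature on `day`, else None."""
    for author in authors:
        record = tos.get(author)
        if record is not None and record.active_on(day):
            return author
    return None
```

For example, a co-authored article can go through on one colleague's signature, even after the first-listed author has left and un-signed.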

DSpace’s other major problem is its unwieldy workflow system, which doesn’t really cover the use-case just expressed. DSpace doesn’t even kick an item into the workflow system until it’s been uploaded and all the metadata is complete, which is exactly bass-ackwards from what Cassandra and Ulysses need it to do. Ulysses needs a staging area where Cassandra’s CMS can dump stuff until he can get to it. DSpace doesn’t have that.

There are other minor nits. The reject notice for an item that doesn’t pass a rights check needs to go to the corresponding author(s), not to Ulysses. DSpace needs an external-facing notification mechanism more sophisticated than email. Ulysses needs to be able to shove Basketology off on Menelaus, once everything is running right and Menelaus has learned how to check rights—Ulysses can’t possibly handle the entire campus doing as Cassandra has done, because there aren’t enough hours in the day. All of this is solvable, some of it trivially.

So. Let’s get to work? I hear there’s a repository-programming event coming up…