Caveat Lector » DSpace

Dies Martis, 6 Maii 2008

Courtesy

A courteous interface is a marvelous thing. It gets out of the way. It intuits what you want, squeezing every tiny bit of information possible out of whatever tidbits you feed it. It doesn’t bother you with its nasty little internal troubles. It’s Jeeves, there with a pick-me-up when you’ve got a drink-fueled headache.

DSpace’s administrative and item-submission interfaces are more like the temporary Jeeves replacement Bertie got stuck with once, the guy who snarled all the time and snaffled socks. It is about as courteous as a New York cabdriver in heavy traffic. As a result, it wastes incredible amounts of human time—my time, my sysadmin’s time, my submitters’ time, the time of dozens of admins just like me. I promised to talk about that, so I will.

For example. Just this morning I got an unhappy email from a submitter who didn’t have access to all the collections in a given community. The said collections are two or three levels deep because of intervening subcommunities—and while I’m talking about wasted time, I’ll spend a few words on wasted cognitive capacity, because I have yet to meet anyone for whom the DSpace distinction between communities and collections is intuitive or useful. My submitters expect to be able to submit items to communities. They do not understand why some items on the sitemap (which is how they think of the communities-and-collections page) are bold and others aren’t. I hate wasting time and effort explaining this stupid and essentially otiose distinction.

Right. Back to my submitter and her problem. I had to click open every single collection in order to click again to check its submitter list. For those collections she didn’t have submit access to, adding it was a four-click process and could have been more: click to open the eperson list, click to go to the last page, click to select her address (she’s late in the alphabet), click to update the submitter group. Wasted. Time.

And don’t get me started on DSpace’s repo-rat–hostile habit of building impenetrable names for otherwise-unnamed submitter groups. COLLECTION_27_SUBMIT. Yeah, that makes all kinds of sense in my little rat brain, how about yours? (If you’re wondering, the number is the collection’s database identifier, which is almost impossible to figure out from the DSpace UI. Real friendly, DSpace.) And these names proliferate like rats, because there’s no way to tell DSpace “use the people I just told you about, plzkthx” without going through the added hassle of creating and naming an actual group, and no way to tell DSpace “use the standard access rules for this community” or “use the access rules for this other collection.”

So then I needed to set up a new collection for her. Could DSpace pick up on the submitter-selection work I’d already wasted a bunch of time doing? Could it hell. I had to go through the same clickety-clickety process all over again. There’s no access templating in DSpace; every single collection in every single community is sui generis. Just imagine how much time I get to waste when someone leaves the university and someone else takes over their DSpace deposit duties! Woo-hoo! Because obviously I don’t have anything important to do with my time.

Which brings us to the DSpace deposit interface. To be clear, I’m working from 1.4.2 here, not 1.5—but let’s be clear about something else too, namely that 1.5 doesn’t fix all of these warts, though the Configurable Submission system is indeed a step forward. So let’s waste some time, everybody!

You start your submission from a collection page, or you start from My DSpace, in which case it asks you to pick a collection. What does it do with this collection information? It determines whether you have deposit access, duh, and if your friendly neighborhood repository-rat has spent time customizing a metadata form for that collection, it uses that form. (Does DSpace ask on collection creation which metadata forms to use? It does not. That’s configured via a file called input-forms.xml on the server. Mm-hm, that’s right, I have nothing better to do with my time than seek out and edit—twice, because I keep a version in source control—bitsy little XML files DSpace leaves all over creation.) Anything else? Like surveying existing items in that collection for commonalities in order to prepopulate metadata fields? Nah. Machine learning would save a human being’s time or something. Can’t have that.

Next you run into this screen, which I loathe with a white-hot loathing neutron stars might envy:

[Screenshot: first DSpace submission screen]

The top question, whether the item has more than one title, is just goofy. In my experience, this is true for less than one-tenth of one percent of submissions. The Québécois might have a use for that checkbox, but how many DSpace installations does Québec have exactly, and why exactly wouldn’t a Québécois installation just put in dc.title.alternative by default? So why is every submitter into every DSpace installation forced to cope with that moronic checkbox for every single submission? Because DSpace doesn’t give a tinker’s damn about anybody’s time or cognitive load, that’s why. The default is correct, at least, but that’s decidedly small comfort.

(I suspect there’s a librarian at the bottom of this interface wart somewhere. What about MARC 246, someone must have screamed. Guess what? I don’t care about MARC 246. I care about efficient use of person-hours, which that checkbox unquestionably isn’t. I love my fellow librarians, except when I hate them. I hate them when they gleefully glomp every iota of patron time and effort they can get their little mitts on.)

The middle question is difficult to understand (for my submitters, anyway; more of them get it wrong than right), and DSpace doesn’t explain why you have to answer it. I get a lot of questions from submitters about putting in publication dates and citations, because my submitters don’t mentally connect those fields with that checkbox. But that’s what that checkbox does when checked: it adds fields to the next metadata screen for dc.date.issued, dc.publisher, and dc.identifier.citation. (How many repository-rats running DSpace just learned something? Don’t be embarrassed. It was months before I figured it out, too, and I had to go in and read code before I had it sussed.)

But it gets better (for “worse” values of “better”). Imagine Ulysses Acqua for a moment, trying to be nice to Dr. Troia and the little open-access basketology journal she wants to archive. He uses the input-forms.xml file to make a custom metadata form that puts basic citation information for the basketology journal in dc.identifier.citation so Dr. Troia doesn’t have to retype it every time. When Dr. Troia submits her first article, she doesn’t think to tick the middle checkbox, and DSpace doesn’t tick it for her. What happens?

SHE GETS AN ERROR MESSAGE. I kid you not. AN ERROR MESSAGE. It reads “You’ve indicated that your submission has not been published or publicly distributed before, but you’ve already entered an issue date, publisher and/or citation. If you proceed, this information will be removed, and DSpace will assign an issue date.”

I—I—I honestly have no words. Do I need them? Maybe I do. The Jeeves interface never, ever, EVER threatens to discard information Bertie has provided it. It’s hard enough to pry useful information out of Bertie as it is! And talk about your bizarrely opaque, unhelpful, and inappropriately finger-wagging error messages! (How does Dr. Troia fix the problem, if she wants to keep her citation information or date or whatever? The message doesn’t even say.) I am just agog that this grotesque interaction exists in a production software system.

(Yes, of course I’ve triggered it. How do you think I figured out it exists? I don’t go looking for smelly garbage like this, I assure you.)

But it even gets worse than that. Weird interactions between input-forms.xml and the deposit code can make checkboxes on this page disappear when they shouldn’t. I haven’t dug into how this happens—but it bit me hard, such that I had to be unhelpful and take a date.issued out of a thesis metadata form in input-forms.xml. Because hey, troubleshooting DSpace’s sclerotic deposit system is such a productive use of my time!

Returning to our initial screen once more: there is absolutely no need whatever to ask the submitter about multiple files. None. Simply assume that submissions may have more than one file! Asking submitters to think about it up-front instead of at upload is wasted time.

So there we have it. An entire wasted screen, multiplied by untold numbers of DSpace submissions. There’s plenty more in there, the licensing system not least; Jeeves interface, not so much.

EPrints, as a rule, is a much better gentleperson’s personal gentleperson than DSpace. EPrints, for example, asks for item type up front, and configures its deposit screens to match, without the intervention of either submitter or repository-rat. Who knows, this politeness may have something to do with developer attitude. The last time I waxed profane on matters repository-interface-ish, Les Carr was in my inbox less than a day later asking eagerly, “is this what you mean? would this solution I just came up with work for you?” Whereas DSpace gets on my case for being negative. I’m just sayin’ here.

No. No, I’m not just sayin’. It runs deeper than that. I’ve occasionally seen a few nods in the DSpace developer community toward EPrints interface accomplishments. Unfortunately, the feel of the discourse I’ve seen is “look at all the shiny AJAX! we want that!”

This is not about shiny AJAX, people. It’s not about shiny at all. This is about DSpace not wasting my time. There’s a ton of work DSpace could do with the aim of removing time-wasters before anyone writes a single line of Javascript or de-uglifies a single line of CSS. To do so, though, DSpace developers will have to learn to give a damn about my time and the amount of it DSpace has wasted and continues to waste. I see next to zero evidence of that learning taking place. (Tim gets it, which is why I say “next to zero” rather than just plain zero.)

Stop. Wasting. My. Time. That’s far and away the most important interface-development priority DSpace should adopt. For values of “me” that include “all repository-rats and willing depositors,” of course. DSpace’s interface needs to sit down at its mama’s knee and learn some courtesy.

Dies Jovis, 13 Martii 2008

Search scope in consortial repositories

If you run a consortial repository, one of the things Manakin brings you is the possibility of separating each institution in your repository from the others visually, such that each institution practically seems to have its own site!

Manakin is actually pretty careful about the URL design of its scoped browsing. If you start browsing inside a particular community or collection, you’ll still see that community or collection’s design (as opposed to the default), because the URL hangs onto the handle, which is what the theme chooser uses to decide which theme to display. Very smart!

Scoped searching, however, is a different and rather nastier problem. Out of the box, Manakin’s search box is designed to allow the user to choose between two types of search: the entire repository, and the currently-browsed community or collection. This is a problem for consortial repositories who want their institution-level communities to seem wholly independent of each other. There shouldn’t be any all-of-DSpace search available from a community’s page. The “all of DSpace” scope should be replaced by an “all of the institution’s community” scope.

(I initially thought there shouldn’t be any broad-scope search at all. This was completely wrongheaded of me. If you’re in a departmental collection, you should be able to search the entire institution’s collections. I mention this so that you won’t make the same mistake.)

At present, I have solved about half this problem. The half I can’t solve is the search-results page, which uses the site-default theme no matter what I do, and cannot be made to respect the scoping established on the search page. I am annoyed by this, but I’m pretty sure that solving it is beyond my abilities. (What it would take, I suspect, is sticking information about the referring page and its theme into the DRI. Somebody want to write an Aspect to do that?)

However, I have solved the search-box scoping problem. It’s a start. Here’s how you can too.

First, you need to know when you’re on the main community page. For this, you need to record that page’s handle in the theme’s XSLT. This got slightly hairy for me because my test and production servers have different handle prefixes. If yours don’t, your solution is easier than mine. Anyway, here’s mine (and yes, I’m giving away the farm here a bit, revealing which community I’m doing this for, but I don’t see that that’s a problem):

<xsl:variable name="handle-prefix" select="substring-after(/dri:document/dri:meta/dri:repositoryMeta/dri:repository/@repositoryIdentifier, 'hdl:')"/>
<xsl:variable name="uwmad-handle">
    <xsl:choose>
        <xsl:when test="$handle-prefix='1960'">1960/10498</xsl:when>
        <xsl:when test="$handle-prefix='1793'">1793/8334</xsl:when>
    </xsl:choose>
</xsl:variable>

Now you need to mess with the radio buttons in the search form. On your community’s main page, you’ll replace them with a hidden input containing that community’s handle as scope. Everywhere else, you’ll sneakily change the “everything” scope to search just your community. Take a deep breath — a lot of code here:

<xsl:choose>
    <!-- when we're on the UW-Madison home page, don't offer a choice of scope -DS -->
    <xsl:when
        test="/dri:document/dri:meta/dri:pageMeta/dri:metadata[@element='focus'][@qualifier='container']=concat('hdl:',$uwmad-handle)">
        <input id="ds-search-form-scope-container" name="scope" type="hidden">
            <xsl:attribute name="value">
                <xsl:value-of select="$uwmad-handle"/>
            </xsl:attribute>
        </input>
    </xsl:when>
    <xsl:otherwise>
        <label>
            <!-- edited so that a scope of "all" ONLY searches UW-Madison stuff -->
            <input id="ds-search-form-scope-all" type="radio" name="scope"
                checked="checked">
                <xsl:attribute name="value">
                    <xsl:value-of select="$uwmad-handle"/>
                </xsl:attribute>
            </input>
            <i18n:text>All of MINDS@UW-Madison</i18n:text>
        </label>
        <br/>
        <label>
            <input id="ds-search-form-scope-container" type="radio" name="scope">
                <xsl:attribute name="value">
                    <xsl:value-of
                        select="substring-after(/dri:document/dri:meta/dri:pageMeta/dri:metadata[@element='focus'][@qualifier='container'],':')"
                    />
                </xsl:attribute>
            </input>
            <xsl:choose>
                <xsl:when
                    test="/dri:document/dri:meta/dri:objectMeta/dri:object[@objectIdentifier=/dri:document/dri:meta/dri:pageMeta/dri:metadata[@element='focus'][@qualifier='container']]/mets:METS/mets:structMap[@TYPE='LOGICAL']/mets:div[@TYPE='DSpace Collection']"
                    >This Collection</xsl:when>
                <xsl:when
                    test="/dri:document/dri:meta/dri:objectMeta/dri:object[@objectIdentifier=/dri:document/dri:meta/dri:pageMeta/dri:metadata[@element='focus'][@qualifier='container']]/mets:METS/mets:structMap[@TYPE='LOGICAL']/mets:div[@TYPE='DSpace Community']"
                    >This Community</xsl:when>
            </xsl:choose>
        </label>
    </xsl:otherwise>
</xsl:choose>

As best I can tell, this works quietly and without fuss. Best kind of hack!

Dies Lunae, 3 Martii 2008

Useful things

Michael W. Carroll’s whitepaper on the NIH policy is a useful document for all repository-rats, not just those who have NIH grantees to worry about. Lovely tidbits in there on various aspects of copyright vis-a-vis scholarly communication. Recommended.

DSpace finally, finally, finally has an up-to-date list of vendors. I can’t speak to how good any of them are (though I’d trust some of the named individuals implicitly based on what I’ve seen of them on the DSpace lists), but just having the list is a vast improvement over the previous situation. Good job, DSpace Foundation!

I’ve mentioned before the excellence of scholarly-publishing executive Mike Rossner, and he’s gone and done it again. Right or wrong, it takes a special sort of courage to break ranks and call out your own kind in an extremely fraught conflict. Rossner’s letter is useful as an anti-FUD device.

For those of us hoping for motion on an initiative similar to Harvard’s, this month’s SPARC Open Access Newsletter is a must-read. Amid the straightforward history and lucid analysis are tantalizing tidbits about how it was done. “Enlist Peter Suber” sounds like good strategy to this rat!

Dies Martis, 5 Februarii 2008

Argh, Manakin, don’t do that!

Manakin has one seriously broken behavior that I can’t figure out how to fix, but there’s a workaround that I recommend to everyone.

If you try to click on a link to a page that needs you to be logged in, Manakin duly asks you to log in and then shoots you over to your page. Fine. If, however, you click a login link from an open page, once you’re logged in, Manakin returns you to the root (main) page of your repository.

Let me count the ways in which this is broken:

  1. It doesn’t clearly inform you that you’ve logged in properly. When I first ran into this, I wondered if I had!
  2. If you came from a page with a different theme (visual design) from the main repository page, welcome to total confusion!
  3. It doesn’t take you to a page that does anything useful with your logged-in status. If you’re coming from an open page, you probably logged in to deposit an item or handle something in your workflow. To do this from the main page, you’re stuck clicking at least once, and probably twice!
  4. It’s not what the JSP UI does. The JSP UI sensibly sends you to your profile (“My DSpace”) page. Switching to Manakin and not fixing this behavior will confuse every single existing user of your repository. (How did TAMU not find this out on user testing? Didn’t they test? Or do they not have any existing JSP UI users?)

My recommendation is never to link to Manakin’s login page in a theme. Instead, link to Manakin’s profile page (“/profile” instead of “/login”). You can still label it “Log in” if you like. This way, Manakin will say “oops! can’t go to your profile page if you’re not logged in!” It will then log you in and send you to your profile page. Which is correct, not-confusing behavior.
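In a theme, that might look something like the sketch below. Fair warning on the assumptions: the template name login-link is made up, and I am assuming the DRI pageMeta carries a contextPath metadata element holding the webapp's base path. Check your own DRI output before trusting it.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dri="http://di.tamu.edu/DRI/1.0/"
    exclude-result-prefixes="dri">

    <!-- Hypothetical named template: call it wherever the theme prints its login link -->
    <xsl:template name="login-link">
        <a>
            <xsl:attribute name="href">
                <!-- Assumption: pageMeta exposes the webapp base path as contextPath -->
                <xsl:value-of select="/dri:document/dri:meta/dri:pageMeta/dri:metadata[@element='contextPath'][not(@qualifier)]"/>
                <!-- Point at the profile page, not /login, so Manakin lands there after login -->
                <xsl:text>/profile</xsl:text>
            </xsl:attribute>
            <xsl:text>Log in</xsl:text>
        </a>
    </xsl:template>
</xsl:stylesheet>
```

Anywhere the theme would have printed a login link, use <xsl:call-template name="login-link"/> instead.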

Edited to add: To make this happen by default, go to Navigation.java in the EPerson aspect of Manakin. Find “/login” and change it to “/profile”. Install and rebuild. You’re done.

Dies Mercurii, 30 Ianuarii 2008

Solving Cassandra’s problems

Now that I have personas, I can actually talk about repository design.

Well, sort of. I do want to talk out-of-band about a couple of the personas first. Les Carr pointed out to me in email that Ulysses is a bit of a luxury item; a lot of libraries combine him with Menelaus. This is absolutely true, and I was considering building a persona on a satellite campus to reflect that reality. However, the “maverick manager” model exists too (I’m one, and the model applies broadly to consortial repositories), and my suspicion is that the more complex solutions that work for the Ulysses/Menelaus division will also work for institutions where Ulysses and Menelaus are the same person. So for now I’m going to stick with what I’ve got, reserving the right to revisit that decision later.

Something else to notice about Ulysses is that he is also a stand-in for libraries that outsource the repository IT to vendors. They’ll face many of the same functionality and responsiveness problems that poor Ulysses does.

If you’ll recall, Cassandra Athens wants to use Achaea’s repository for two things: to export the problem of potentially-copyright-violating faculty postings to Ulysses, and to form automatically-generated CVs for faculty websites. Let’s talk about the former problem first.

Since Cassandra isn’t stupid, she has probably set up her CMS to have an upload area, disallowing any other way for faculty to put random files up on the Basketology website. (And Dr. Troia and others probably howled about it, but so it goes.) Faculty have two options: they can use Cassandra’s uploader, which Cassandra would vastly prefer they do, or they can put their files someplace else and link to them from the Basketology pages they control, which lands Cassandra right back in the quagmire she’s trying to escape from.

Some social engineering will be required here; Cassandra may well have to go to the chair of Basketology and make a stink about copyright liability. She’ll be much happier about doing that, though, and the chair much happier about dealing with the problem, if Cassandra has a workable alternative to propose.

The design goal, therefore, is to have Cassandra’s CMS talk to Ulysses’s repository such that faculty find it easier and safer to use Cassandra’s upload mechanism than to bypass it—without adding significantly to Cassandra’s ongoing workload. (Ulysses has time to throw at this; except for a few up-front development and testing cycles, Cassandra doesn’t.)

“SWORD!” you may be yelling at me at this point. No. SWORD won’t work, because SWORD more or less assumes you have a nice tidy well-described object to swap around. At most, Cassandra can squeeze a file and a (badly-formatted unpredictable text) citation out of faculty; sometimes they won’t even bother pasting in the citation. Moreover, SWORD is a difficult target for Cassandra to program against, and her CMS doesn’t natively deal with it.

She needs to be able to tell her CMS to email or FTP the file, the CMS’s identifier for the file, and what little information it has about the file somewhere; her CMS needs to receive the item’s handle or other permanent identifying URL in return along with the CMS’s identifier, so that she and/or faculty can link to the item in the repository easily.

This problem can be solved in several ways. Many people reading this will doubtless come up with better solutions than I could. If you do—no fair changing the design constraints, do you understand me? The design constraints are the whole point of this exercise. If your solution doesn’t work for Cassandra, it doesn’t work at all. (You think I’m draconian about this? Read Alan Cooper.)

Ulysses needs the repository to receive the CMS’s email or watch the FTP folder it has been told to watch, and to notify him that Cassandra’s CMS has fired a file at him. He then rights-checks the submission, applies metadata liberally, arranges for licensing, and (assuming that all checks out) sends the item live, whereupon the repository knows to notify Cassandra’s CMS. Ideally, the repository would even construct and shoot back a pretty, properly-formatted HTML citation! None of this can involve Ulysses interacting directly with the repository server, because Ulysses isn’t allowed to do that.

I repeat: This problem can be solved in several ways. Many people reading this will doubtless come up with better solutions than I could. If you do—no fair changing the design constraints, do you understand me? The design constraints are the whole point of this exercise. If your solution doesn’t work for Ulysses, it doesn’t work at all. (Seriously. The Inmates are Running the Asylum. He means you, software developers.)

A few DSpace-specific notes. DSpace’s per-item licensing paradigm is all wrong, and I suspect it shares this problem with EPrints, because the root of the difficulty is the OAIS model, in which rights information must be tightly associated with the object to which the rights pertain. This is great for the OAIS model, in which happy little computers talk to other happy little computers, but it’s a disaster for any kind of ongoing interaction between actual people and the repository, as Ulysses’s paper-license insanity illustrates. (That, by the way, was drawn directly from a real-world situation. I refuse to point fingers; you’ll have to take my word for it.)

A Terms of Service agreement is a much human-friendlier solution; Dr. Troia can be told to go click through it, and after she does, neither she nor Ulysses has to be bothered with licensing, and the repository can be smart enough to put the correct rights information in each item. For third-party deposit, Ulysses might have to indicate which author is to be considered the licensing author—but honestly, the repository ought to be smart enough to check the author list against ToS signatories, and let the deposit go through if even one author has signed.

(The repository must also allow for un-signing of the ToS; the obvious use-case is a faculty member leaving the institution. This cannot be allowed to affect previously-deposited items, but it should halt deposit of future items that depend on that faculty member’s consent.)

DSpace’s other major problem is its unwieldy workflow system, which doesn’t really cover the use-case just expressed. DSpace doesn’t even kick an item into the workflow system until it’s been uploaded and all the metadata is complete, which is exactly bass-ackwards from what Cassandra and Ulysses need it to do. Ulysses needs a staging area where Cassandra’s CMS can dump stuff until he can get to it. DSpace doesn’t have that.

There are other minor nits. The reject notice for an item that doesn’t pass a rights check needs to go to the corresponding author(s), not to Ulysses. DSpace needs an external-facing notification mechanism more sophisticated than email. Ulysses needs to be able to shove Basketology off on Menelaus, once everything is running right and Menelaus has learned how to check rights—Ulysses can’t possibly handle the entire campus doing as Cassandra has done, because there aren’t enough hours in the day. All of this is solvable, some of it trivially.

So. Let’s get to work? I hear there’s a repository-programming event coming up…

Dies Jovis, 17 Ianuarii 2008

Theming different parts and pages in Manakin

I’m sure everyone else figured this out already and I’m the only one who didn’t, but just in case someone else is as slow on the uptake as I am…

You set which pages get which theme in Manakin via [dspace]/config/xmlui.xconf. Each theme gets a theme element with its name, the path to it, and… a selection regex! REGEX! Pattern-matching!

This means you can set up a theme just to hit certain pages or sections of the site, as long as they have a distinctive, non-handle-based URL. Want a theme just for the admin section? Easy-peasy. Do regex=".*/admin/.*". How cool is that?
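For the record, a sketch of what the relevant bit of xmlui.xconf might look like. The theme names and the AdminTheme/ path are invented for illustration, and I am assuming the stock theme lives at Reference/. As far as I can tell the first matching rule wins, which is why the admin rule goes above the catch-all.

```xml
<!-- [dspace]/config/xmlui.xconf, themes section only -->
<themes>
    <!-- Hypothetical admin-only theme: matches any URL containing /admin/ -->
    <theme name="Admin theme" regex=".*/admin/.*" path="AdminTheme/"/>
    <!-- Catch-all for everything else; assumed to be the stock theme -->
    <theme name="Default theme" regex=".*" path="Reference/"/>
</themes>
```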

Unfortunately, this coolness breaks down with regard to distinctive community and collection pages, because those have handles and so can’t be caught via regex, not to mention that Manakin is set up to cascade a theme down to item pages. This is irksome, because after all, community and collection pages are (after a fashion) home pages, and as such may well want to look or behave a bit differently from item or browse pages. To some extent, Manakin caters to this; the innermost content on a community/collection page is in its own template.

However, if you want to customize the header or the navbar or anything on a community or collection page, you’re sunk—except you’re not, because I figured this one out for you. At the top of your theme, add these variable definitions:

<xsl:variable name="is_comm" select="boolean(/dri:document/dri:body/dri:div[@n='community-home'])"/>
<xsl:variable name="is_coll" select="boolean(/dri:document/dri:body/dri:div[@n='collection-home'])"/>
<xsl:variable name="is_item" select="boolean(/dri:document/dri:body/dri:div[@n='item-view'])"/>

With these, you can do conditional logic anywhere in the stylesheet you need to. E.g. <xsl:if test="$is_comm">. It just works!
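Here is a slightly fuller sketch of how that might play out for a header. The template name page-banner, the div id and classes, and the banner text are all placeholders of mine, not anything Manakin ships.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dri="http://di.tamu.edu/DRI/1.0/"
    exclude-result-prefixes="dri">

    <!-- Page-type flags, defined once at the top of the theme -->
    <xsl:variable name="is_comm" select="boolean(/dri:document/dri:body/dri:div[@n='community-home'])"/>
    <xsl:variable name="is_coll" select="boolean(/dri:document/dri:body/dri:div[@n='collection-home'])"/>

    <!-- Hypothetical named template: picks a banner according to page type -->
    <xsl:template name="page-banner">
        <xsl:choose>
            <xsl:when test="$is_comm">
                <div id="banner" class="community-banner">Community home</div>
            </xsl:when>
            <xsl:when test="$is_coll">
                <div id="banner" class="collection-banner">Collection home</div>
            </xsl:when>
            <xsl:otherwise>
                <!-- Item, browse, and every other page get the plain banner -->
                <div id="banner">Repository</div>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

Because the variables are global, the same test works in the header, the navbar, or anywhere else in the stylesheet.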

Now if I only understood what themes.xmap does and whether I should actually care…

Dies Martis, 15 Ianuarii 2008

Redoing navigation in Manakin

One of the commoner tasks involved in redesigning DSpace is reorganization of or additions to the navigation bar. Manakin does not make this simple, but there are ways to do an end-run around it.

The essential problem is that the elements of the navigation bar are not set at the theme level in XSLT, but at the Aspect level, in Java. (DSpace has always suffered from the arrogant notion that it knows interaction design better than you do. Often it is wrong, but the bad interactions are hard-coded in so deep it’s next to impossible to jettison them.)

If you choose, you can go into aspects/ArtifactBrowser/src/org/dspace/app/xmlui/artifactbrowser/Navigation.java and mess around in some rather inscrutable code to make changes that affect the entire Manakin installation. I admit to having done this to get rid of DSpace’s completely pointless browse-by-date function. However, I do not recommend this if adding links is what you need to do, and I triply do not recommend it for theme-specific navigation links.

I have now tested my sitemap.xmap hack, and I am pleased to say that it works exactly as I expected it would. For the situation where you want the normal Manakin sidebar, but you also want a few theme-specific additions, it is a decent way to go. After I threw another temper tantrum on the dspace-tech list, we can eventually expect a better way to inject content into Manakin DRI files. Until then, though, hacking sitemap.xmap works.

If you want to rearrange content in the navigation bar, beyond simply changing wording or adding a few links to the end, you have some work ahead of you. This is because the content and order of the sidebar is not set on the theme level; it’s hardcoded into the Java Aspect gizmo. (Is this stupid? Yes, this is stupid. These kinds of interaction-design decisions do not belong in Java; they belong with the designers who are not supposed to be using Java. Eventually, however, I think it will be possible to move Manakin in a more productive direction.)

It is possible to work around this. The easy way to do it is to go into the dri:options template and rip out the <xsl:apply-templates> call, replacing it with hard-coded links. I think this is fully justifiable, though it’s rather annoying that (unless you set up theme inheritance somehow) you have to do it for every theme you write.
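A bare-bones sketch of the easy way, under some assumptions: the wrapper div id is a guess at what the stock template emits, and all three link targets are placeholders you would swap for your own.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dri="http://di.tamu.edu/DRI/1.0/"
    exclude-result-prefixes="dri">

    <!-- Overrides the dri:options template for this theme only.
         The div id and the link targets are placeholders, not Manakin defaults. -->
    <xsl:template match="dri:options">
        <div id="ds-options">
            <!-- No apply-templates call here, so none of the Aspect-generated links are rendered -->
            <ul>
                <li><a href="/">Repository home</a></li>
                <li><a href="/profile">Log in</a></li>
                <li><a href="http://www.example.edu/">Department website</a></li>
            </ul>
        </div>
    </xsl:template>
</xsl:stylesheet>
```

The obvious cost is that the logged-in and administrative link sets vanish along with everything else, so this only suits themes whose visitors never need them.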

(Note also that doing it this way makes possible a rather interesting trick: you could actually make a DSpace community or collection a seamless part of Somebody Else’s Website. Grab up their site design and navigation bar to theme the community/collection with, then add a link on both sites that goes directly to the community/collection, and there you are. Nice trick, isn’t it? I really want to try it.)

The hard way to work around Manakin’s hard-coded navigation is to replace the <xsl:apply-templates> call with markup that pulls the appropriate links out of the DRI. What’s really hard about this is that without the <xsl:apply-templates> call, you’ll have to go through and figure out the logged-in-user and administrative linksets as well. I haven’t been quite daring enough to do this yet, but somebody ought to.

Because navigation is too important a part of interaction design to be left to a bunch of developers, yeah? (Sorry. Been rereading Alan Cooper.)

Dies Martis, 8 Ianuarii 2008

Batch-replacing items in DSpace

Something I ought to have mentioned in yesterday’s post is the -t flag. This does a “test run” of your import, catching many (though not all) problems. (It will not notice if something is wrong inside your dublin_core.xml file. If you don’t have a dublin_core.xml file, it will notice.) I always run an import command with -t, then if it runs clean, arrow-up to recall the command, delete the -t, and off it goes.

If an import does happen to choke and die in the middle, don’t panic; running the command again with the -R (for “resume”) flag will pick up the import where it left off. Mind the capital letter: lowercase -r is the short form of --replace.

Right. Now, moving on to the situation where an individual item or every item in a collection is seriously messed up, and would be much faster to correct outside DSpace. This can be done! I have done it. But it’s annoyingly error-prone.

Step one is to export the item or collection. This works a lot like importing. As the DSpace administrator user, go to DSpace’s bin directory and run the following:

  • dsrun org.dspace.app.itemexport.ItemExport Command invocation.
  • --type=COLLECTION Or ITEM, depending on which you’re exporting.
  • --id=0123/4567 The item or collection’s handle.
  • --dest=/home/me/stuff The directory on the server where the exported items should end up. Make sure the DSpace administrator user can write to this directory!
  • --number=1 The exporter names the individual item directories with sequential numbers. Instead of peeking into the directory, finding the highest-numbered existing directory, and adding one (which would be the KIND way to handle this), DSpace insists that you give it a start number.

In toto: dsrun org.dspace.app.itemexport.ItemExport --type=COLLECTION --id=0123/4567 --dest=/home/me/stuff --number=1

Inside each exported item’s directory, you’ll see a “contents” file, a “dublin_core.xml” file, one or more license files, and the bitstreams, all of which should be fairly familiar territory. You will also see a file called “handle” which is a plain-text file containing (surprise!) the item’s handle. At this point you can download all the folders and fix whatever you need to.

To re-import the items without losing their handles or duplicating them, you need to create a mapfile. This is just a plain-text file, with a folder name and the corresponding item’s handle on each line, separated by a space:

1 0123/4567
2 0123/8901
3 0123/2345

The way I do this, since the exporter isn’t smart enough to create a mapfile on its own, is with a little Python hack that runs through a directory of items and associates each item’s “handle” file with its directory name. (I meant to upload my Python hackery yesterday, but either WordPress or Apache was and is being extremely annoying about letting a Python source file load, so hang on while I sort that out—and with any luck I won’t break my blog permalinks this time!)
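For illustration, here is a minimal sketch of what such a mapfile generator might look like. It is not the script mentioned above, just one plausible way to do the job. The function name build_mapfile and both of its parameters are made up for this example; the only things it takes from the exporter are the per-item folders and the “handle” file inside each one.

```python
import os

def build_mapfile(export_dir, mapfile_path):
    """Write one 'folder handle' line per exported item folder.

    export_dir and mapfile_path are hypothetical names for this sketch:
    the directory the exporter wrote into, and where the mapfile should go.
    """
    lines = []
    for name in sorted(os.listdir(export_dir)):
        handle_file = os.path.join(export_dir, name, "handle")
        if not os.path.isfile(handle_file):
            continue  # skip anything that is not an exported item folder
        with open(handle_file) as f:
            handle = f.read().strip()
        lines.append(f"{name} {handle}")
    with open(mapfile_path, "w") as out:
        out.write("\n".join(lines) + "\n")
    return lines
```

Calling build_mapfile("/home/me/stuff", "/home/me/mapfile.txt") would then produce a file in the folder-space-handle shape shown above.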

Now you need to keep DSpace from duplicating metadata on re-import. Yes, DSpace will do this if you let it. One way to deal with this is to run a script called ds-migrate from the bin folder on your items before you batch-import them.

I don’t like this solution, however, despite its being fast and easy. The script is intended for the not-uncommon situation where you mount a collection on a test server and then want to migrate it over to production, leaving no hint whatever that it was ever anywhere else. The script therefore wipes existing provenance and date information—which is bad for a collection you’re exporting from and re-importing to a production server. You’re losing important item history there!

So what I do—and you may well decide differently—is only kill the really troublesome extra metadata out of all the dublin_core.xml files: dc.format.extent, dc.format.mimetype, and dc.identifier.uri.

The first two are easy regular-expression replaces: <dcvalue element="format" qualifier="extent">[^<]+</dcvalue> (you can make the appropriate substitution for dc.format.mimetype without my help, I’m sure!). The last one is a tiny bit trickier, because you only want to get rid of identifier.uri when it’s the DSpace-assigned handle, not when someone has actually entered a different URI. Most people, then, will want this: <dcvalue element="identifier" qualifier="uri">http://hdl.handle.net/[^<]+</dcvalue> (If you run your own handle server instead of using CNRI’s, substitute its URL, of course.)
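If you would rather script the cleanup than do it by hand in a text editor, something along these lines should work. This is a rough sketch, not a tested tool: the function name strip_generated_metadata is invented here, and it assumes the exported dcvalue tags carry exactly the element and qualifier attributes, in that order, with double quotes. If your export writes extra attributes, the patterns will need loosening.

```python
import re

# Assumption: CNRI's handle proxy. Swap in your own handle server's URL if you run one.
HANDLE_PREFIX = "http://hdl.handle.net/"

# The leading \s* swallows the line break and indentation before each removed
# element, so no blank lines are left behind.
PATTERNS = [
    re.compile(r'\s*<dcvalue element="format" qualifier="extent">[^<]+</dcvalue>'),
    re.compile(r'\s*<dcvalue element="format" qualifier="mimetype">[^<]+</dcvalue>'),
    re.compile(r'\s*<dcvalue element="identifier" qualifier="uri">'
               + re.escape(HANDLE_PREFIX) + r'[^<]+</dcvalue>'),
]

def strip_generated_metadata(xml_text):
    """Remove the extent, mimetype, and handle-URI values from dublin_core.xml text."""
    for pattern in PATTERNS:
        xml_text = pattern.sub("", xml_text)
    return xml_text
```

An identifier.uri that points anywhere other than the handle server is left alone, which is the whole point of anchoring the third pattern on the handle prefix.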

The element dc.date.issued causes a slightly subtler problem, in that you may want to keep it if it was DSpace-assigned, but you want to get rid of it if it was user-assigned because it’ll be duplicated. I get rid of it, because DSpace-assigned issue dates are completely meaningless. Your call whether you do too.

I’m told that the event system going into 1.6 is already smart enough to check for duplicated metadata on import. This makes me very happy, because deleting duplicate URIs is a hassle. (Not that I—oh, never mind.)

At any rate, once you’ve taken care of all this, just import as normal, using the mapfile you created as the value for the -m flag and adding the --replace flag. Should work fine.

Dies Lunae, 7 Ianuarii 2008

The DSpace batch importer

A plea came in to the DSpace techlist for how to use the DSpace command-line batch importer. “RTFM!” was the immediate chorus.

Well, okay, it’s how I learned to use the batch importer, but that doesn’t mean everyone should have to learn that way. So forthwith, a nuts-and-bolts minimal-techspeke tutorial on getting stuff into DSpace through the back alley.

First, some vocabulary. A “bitstream” is what you and I, being normal folks, think of as a file. An “item” consists of one or more bitstreams, plus descriptive information (author, title, etc.) about those bitstreams, plus license information. A “bundle” is a DSpace-specific construct (you won’t even see it in the UI, really) that keeps license bitstreams separate from content bitstreams inside an item. An “eperson” is someone registered with the DSpace instance; s/he is usually referred to by his/her email address.

To import an item into DSpace, you need to give DSpace three things: the bitstreams, the item’s descriptive information, and (because DSpace is fairly brain-dead) a plain-text listing of the bitstreams. All these things need to be in a single folder. If you are importing more than one item at once, each item needs to be in its own folder. DSpace does not care how you name the folder or the bitstreams. It does care how you name the bitstream listing, the file containing descriptive information, and the license files if any, as I’ll explain in a moment.

License information is optional. If you do not provide it, DSpace simply doesn’t attach a license to the imported item. If you do provide a license for the item, it should be in the form of a plain-text file inside the item’s folder named “license.txt”. (I’m leaving Creative Commons licenses out of the picture for now; if you care, I have another post on the subject which you should read only after you read and understand this one.)

The plain-text listing of the bitstreams needs to be named “contents”. Each filename should be on its own line; order is irrelevant. If you are only importing content files (no license files), you’re done. If, however, you have license files, you need to tell DSpace to put them in a different bundle from the content files. Easier to demonstrate than explain:

contentfile1.txt    bundle:ORIGINAL
contentfile2.txt    bundle:ORIGINAL
license.txt    bundle:LICENSE

The whitespace between the filename and the bundle name must be a single tab character.
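Since a stray space where the tab should be is easy to miss by eye, it can be safer to generate the listing than to type it. A small sketch, assuming you already know which files are content and which are licenses; the function name contents_lines and its parameters are made up for this example.

```python
def contents_lines(content_files, license_files=()):
    """Return the lines of a 'contents' file.

    With no license files, each line is just a filename. Otherwise every
    line gets a single tab followed by its bundle name.
    """
    if not license_files:
        return list(content_files)
    lines = [f"{name}\tbundle:ORIGINAL" for name in content_files]
    lines += [f"{name}\tbundle:LICENSE" for name in license_files]
    return lines
```

Writing "\n".join(contents_lines(["contentfile1.txt", "contentfile2.txt"], ["license.txt"])) to a file named contents in the item folder gives the listing shown above, tabs included.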

The descriptive information lives in a little XML file whose name must be “dublin_core.xml”. To keep this post to a manageable length, I am not going to go heavily into detail about Dublin Core metadata; the easiest way to bootstrap yourself is to look at existing items in a repository in full-listing view. A bare-bones dublin_core.xml file looks something like this:

<dublin_core>
    <dcvalue element="contributor" qualifier="author">Public, John Q.</dcvalue>
    <dcvalue element="language" qualifier="iso">en</dcvalue>
    <dcvalue element="subject" qualifier="none">Technology</dcvalue>
    <dcvalue element="title" qualifier="none">Sample Dublin Core record</dcvalue>
    <dcvalue element="type" qualifier="none">Article</dcvalue>
</dublin_core>

The order in which you place individual Dublin Core elements generally does not matter, although you should put authors in the correct order (first author first, second author second, etc.) because DSpace does respect that order, and if you don’t angry faculty will come after you with long knives.
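If your descriptive information is coming out of a spreadsheet or database, you will probably want to generate these files rather than hand-type them. Here is a minimal sketch using Python’s standard xml.etree.ElementTree module. The function name dublin_core_xml and the tuple layout are my own invention for the example; the element and qualifier names are the ones from the sample record above.

```python
import xml.etree.ElementTree as ET

def dublin_core_xml(fields):
    """Build dublin_core.xml text from (element, qualifier, value) tuples.

    Tuples are written in the order given, so list authors in author order.
    """
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value  # ElementTree escapes &, < and > when serializing
    return ET.tostring(root, encoding="unicode")

record = dublin_core_xml([
    ("contributor", "author", "Public, John Q."),
    ("title", "none", "Sample Dublin Core record"),
])
```

The output is all on one line, which is ugly to read but is still well-formed XML. The main thing the library buys you is that ampersands and angle brackets in titles get escaped without your having to think about it.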

If you have all this together, you are now ready to use the batch importer. Put the item folder on the DSpace server somewhere that the DSpace administrator user has read and write privileges. As the DSpace administrator user, cd over to the bin folder inside the running DSpace instance (note: not the source-code folder that you run ant from when you recompile DSpace). I’m going to run through the command one bit at a time, and then put it all together at the end.

  • dsrun org.dspace.app.itemimport.ItemImport Command invocation.
  • -a Tells DSpace that you’re adding new items.
  • -e me@myu.edu Eperson who should be held responsible for the submitted items. This need not necessarily be you! It does need to be someone the system knows about, so if you’re depositing on behalf of someone who’s never used the system, you need to use the DSpace administrative interface to add them as an eperson.
  • -c 0123/4567 Which collection the items should go into. Go to the collection’s home page and grab up its handle. (Note that the batch importer is deadly stupid about this; there is no way to do a single batch import of items that belong to different collections. Also, you can’t map items into additional collections via the batch importer.)
  • -s /home/me/stuff The directory on the server where the item folders are. DSpace will error out if the admin user does not have read access to this directory!
  • -m /home/me/stuff/mapfiles/mapfile.txt Where to put the dumb little “map file” that DSpace generates, telling you which item got assigned which handle. This is basically a throwaway (it’s easy to regenerate if you ever actually need it), but if you don’t let DSpace generate its dumb little map file, DSpace sulks and won’t import your items.

So, the full command looks something like this:

dsrun org.dspace.app.itemimport.ItemImport -a -e me@myu.edu -c 0123/4567 -s /home/me/stuff -m /home/me/stuff/mapfiles/mapfile.txt
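If you run imports often, it may be worth assembling the command in a script so the flags never get mistyped. The following is only a sketch: import_command is a made-up helper, it merely builds the argument list rather than running anything, and it assumes you will execute the result from DSpace’s bin directory as the DSpace administrator user. The optional -t is the importer’s test-run flag.

```python
def import_command(eperson, collection, source_dir, mapfile, test=False):
    """Assemble the batch-import command as a list of arguments."""
    cmd = [
        "dsrun", "org.dspace.app.itemimport.ItemImport",
        "-a",              # add new items
        "-e", eperson,     # eperson responsible for the items
        "-c", collection,  # handle of the target collection
        "-s", source_dir,  # directory holding the item folders
        "-m", mapfile,     # where the importer writes its map file
    ]
    if test:
        cmd.append("-t")   # test run only
    return cmd

print(" ".join(import_command(
    "me@myu.edu", "0123/4567", "/home/me/stuff",
    "/home/me/stuff/mapfiles/mapfile.txt")))
```

Keeping the arguments as a list rather than one long string means the same list can be handed to subprocess.run later without worrying about shell quoting.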

For most items, you are now done. If your item was a website, you have one more step: setting the “primary bitstream” to the website’s home or entry page. Anyone with edit rights on the item can do this from the item’s edit page; there’s a column of radio buttons labeled “Primary bitstream?” beside the bitstream listings near the bottom. Alternately, you can employ some SQL-fu in your database (instructions are for Postgres, not Oracle).

The batch importer can also replace items, so if you’ve completely hosed a collection in some reasonably fixable fashion, you can export it, fix it, and re-import it. Danger Will Robinson! There are several gotchas in this process. (Not that I know this by experience or anything—okay, I’m not fooling anyone here. I’ve run into all of them.) For this, you will need a mapfile, and you need to add --replace to the command line. I’ll reserve the other gotchas for a separate post, noting only that I have it on good authority that several of them will be going away in version 1.5 or 1.6.

And there you are. I hope.

Dies Veneris, 28 Decembri 2007

Kludging Manakin: IncludePageMeta

So I’m about to start wrangling my new DSpace/Manakin theme into shape in Internet Explorer, as you might have gathered from yesterday’s howl of anguish, and it occurred to me to wonder how to alter the stylesheet setup in Manakin to take note of more versions of IE. (Out of the box, it understands “IE” and “IE6.” I am wondering about IE5. Anything previous to that can jump off a cliff and die horribly.)

What I found was actually a limited but relatively simple way to add static information into a Manakin theme without mucking around in Java and whatnot. Note well, it’s a big fat ugly nasty kludge—but it’ll work, and future versions are highly unlikely to break it.

Open up a sitemap.xmap file. Look for the “Step 2” comment, which introduces some map:transforms based on browser type. That’s your loophole, right there; the information there is going straight into the DRI, in /document/meta/pageMeta/metadata elements. (I’ve left out the namespaces, but you can’t in your XSLT. All the above are in the DRI namespace.)

So let’s take a closer look. The metadata elements in the DRI are structured more or less like good old familiar Dublin Core. There’s an element attribute and a qualifier attribute, and the value is there as the element’s content, so:

<metadata element="stylesheet" qualifier="screen">style.css</metadata>

How does the sitemap.xmap file make that happen in the DRI? Thusly:

<map:parameter name="stylesheet.screen" value="style.css"/>

See? Simple. But as I said, limited—this isn’t where you’re going to be able to put your entire static Help pages or FAQ. But you could, for example, introduce a new navigation bar or the like without having to hack the living hell out of Navigation.java in the ArtifactBrowser Aspect (which is frankly what I did, though it’s a lousy idea because that Aspect governs the entire system, not just one theme, so I’ll probably be replacing that hack with something based on this).

Now, if you’ve been paying close attention to that sitemap.xmap file, you’ll have noticed that all the code I’ve been referring to is inside some conditional stuff (map:select and map:when). I must say I haven’t tried this yet, but I think the way to just plain old add some stuff is to go outside the map:select element altogether and do something like this:

<map:transform type="IncludePageMeta">
  <map:parameter name="newElement.newQualifier" value="newValue"/>
</map:transform>

It should Just Work, showing up in your DRI where you can grab it via XSLT for whatever nefarious purpose you have in mind.
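One way to check whether the value actually landed is to save a page’s DRI to a file and look for it. In a theme you would do the grabbing in XSLT; the Python below is only a stand-in for poking at the output, sketched under assumptions. The function name page_meta is made up, and the namespace URI is what I believe DRI uses, so verify it against the root element of your own DRI before trusting this.

```python
import xml.etree.ElementTree as ET

# Assumed DRI namespace URI; confirm it against your own DRI document.
DRI_NS = {"dri": "http://di.tamu.edu/DRI/1.0/"}

def page_meta(dri_xml, element, qualifier):
    """Return the values of matching /document/meta/pageMeta/metadata elements."""
    root = ET.fromstring(dri_xml)  # the root is the <document> element
    found = root.findall("dri:meta/dri:pageMeta/dri:metadata", DRI_NS)
    return [m.text for m in found
            if m.get("element") == element and m.get("qualifier") == qualifier]
```

If page_meta(dri_text, "newElement", "newQualifier") comes back with ["newValue"], the kludge worked; an empty list means the transform did not fire or the namespace guess is off.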

Kludge at your own risk, as always… but as kludges go, I think this one’s fairly safe and harmless.
