‘DSpace’ Archive

30 Ianuarii 2008

Solving Cassandra’s problems

Now that I have personas, I can actually talk about repository design.

Well, sort of. I do want to talk out-of-band about a couple of the personas first. Les Carr pointed out to me in email that Ulysses is a bit of a luxury item; a lot of libraries combine him with Menelaus. This is absolutely true, and I was considering building a persona on a satellite campus to reflect that reality. However, the “maverick manager” model exists too (I’m one, and the model applies broadly to consortial repositories), and my suspicion is that the more complex solutions that work for the Ulysses/Menelaus division will also work for institutions where Ulysses and Menelaus are the same person. So for now I’m going to stick with what I’ve got, reserving the right to revisit that decision later.

Something else to notice about Ulysses is that he is also a stand-in for libraries that outsource the repository IT to vendors. They’ll face many of the same functionality and responsiveness problems that poor Ulysses does.

If you’ll recall, Cassandra Athens wants to use Achaea’s repository for two things: to export the problem of potentially-copyright-violating faculty postings to Ulysses, and to form automatically-generated CVs for faculty websites. Let’s talk about the former problem first.

Since Cassandra isn’t stupid, she has probably set up her CMS to have an upload area, disallowing any other way for faculty to put random files up on the Basketology website. (And Dr. Troia and others probably howled about it, but so it goes.) Faculty have two options: they can use Cassandra’s uploader, which Cassandra would vastly prefer they do, or they can put their files someplace else and link to them from the Basketology pages they control, which lands Cassandra right back in the quagmire she’s trying to escape from.

Some social engineering will be required here; Cassandra may well have to go to the chair of Basketology and make a stink about copyright liability. She’ll be much happier about doing that, though, and the chair much happier about dealing with the problem, if Cassandra has a workable alternative to propose.

The design goal, therefore, is to have Cassandra’s CMS talk to Ulysses’s repository such that faculty find it easier and safer to use Cassandra’s upload mechanism than to bypass it—without adding significantly to Cassandra’s ongoing workload. (Ulysses has time to throw at this; except for a few up-front development and testing cycles, Cassandra doesn’t.)

“SWORD!” you may be yelling at me at this point. No. SWORD won’t work, because SWORD more or less assumes you have a nice tidy well-described object to swap around. At most, Cassandra can squeeze a file and a (badly-formatted unpredictable text) citation out of faculty; sometimes they won’t even bother pasting in the citation. Moreover, SWORD is a difficult target for Cassandra to program against, and her CMS doesn’t natively deal with it.

She needs to be able to tell her CMS to email or FTP the file, the CMS’s identifier for the file, and what little information it has about the file somewhere; her CMS needs to receive the item’s handle or other permanent identifying URL in return along with the CMS’s identifier, so that she and/or faculty can link to the item in the repository easily.
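To make that exchange concrete, here is a minimal sketch of the two messages involved. The class and field names are my own invention for illustration; nothing like this exists in any CMS or in DSpace, and the transport (email, FTP, whatever) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DepositRequest:
    """What the CMS sends: the file, its own ID for it, and whatever else it has."""
    cms_id: str                      # the CMS's identifier for the file
    filename: str                    # name of the uploaded file
    citation: Optional[str] = None   # free-text citation; faculty may skip it


@dataclass
class DepositReceipt:
    """What the repository sends back once the item has gone live."""
    cms_id: str          # echoed back so the CMS can match receipt to upload
    permanent_url: str   # the handle or other permanent identifying URL


def link_for(receipts, cms_id):
    """Return the permanent URL for a CMS file, or None if it is not live yet."""
    for receipt in receipts:
        if receipt.cms_id == cms_id:
            return receipt.permanent_url
    return None
```

The point of echoing the CMS’s identifier is that the CMS never has to understand anything about the repository; it only has to look up its own ID.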

This problem can be solved in several ways. Many people reading this will doubtless come up with better solutions than I could. If you do—no fair changing the design constraints, do you understand me? The design constraints are the whole point of this exercise. If your solution doesn’t work for Cassandra, it doesn’t work at all. (You think I’m draconian about this? Read Alan Cooper.)

Ulysses needs the repository to receive the CMS’s email or watch the FTP folder it has been told to watch, and to notify him that Cassandra’s CMS has fired a file at him. He then rights-checks the submission, applies metadata liberally, arranges for licensing, and (assuming that all checks out) sends the item live, whereupon the repository knows to notify Cassandra’s CMS. Ideally, the repository would even construct and shoot back a pretty, properly-formatted HTML citation! None of this can involve Ulysses interacting directly with the repository server, because Ulysses isn’t allowed to do that.

I repeat: This problem can be solved in several ways. Many people reading this will doubtless come up with better solutions than I could. If you do—no fair changing the design constraints, do you understand me? The design constraints are the whole point of this exercise. If your solution doesn’t work for Ulysses, it doesn’t work at all. (Seriously. The Inmates are Running the Asylum. He means you, software developers.)

A few DSpace-specific notes. DSpace’s per-item licensing paradigm is all wrong, and I suspect it shares this problem with EPrints, because the root of the difficulty is the OAIS model, in which rights information must be tightly associated with the object to which the rights pertain. This is great for the OAIS model, in which happy little computers talk to other happy little computers, but it’s a disaster for any kind of ongoing interaction between actual people and the repository, as Ulysses’s paper-license insanity illustrates. (That, by the way, was drawn directly from a real-world situation. I refuse to point fingers; you’ll have to take my word for it.)

A Terms of Service agreement is a much human-friendlier solution; Dr. Troia can be told to go click through it, and after she does, neither she nor Ulysses has to be bothered with licensing, and the repository can be smart enough to put the correct rights information in each item. For third-party deposit, Ulysses might have to indicate which author is to be considered the licensing author—but honestly, the repository ought to be smart enough to check the author list against ToS signatories, and let the deposit go through if even one author has signed.

(The repository must also allow for un-signing of the ToS; the obvious use-case is a faculty member leaving the institution. This cannot be allowed to affect previously-deposited items, but it should halt deposit of future items that depend on that faculty member’s consent.)
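For the programmers in the audience, the rule I have in mind can be sketched in a few lines. This is a toy model under my own assumptions (the class, its method names, and the choice of the first signed author as licensing author are all hypothetical), not anything DSpace does today.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Toy model of ToS-based licensing; not DSpace's actual data model."""
    signatories: set = field(default_factory=set)   # people who have signed the ToS
    items: list = field(default_factory=list)       # (title, licensing author) pairs

    def sign(self, person):
        self.signatories.add(person)

    def unsign(self, person):
        # Withdrawing consent only affects future deposits;
        # items already in self.items are left untouched.
        self.signatories.discard(person)

    def deposit(self, title, authors):
        # One signed author is enough; take the first one in author order.
        licensor = next((a for a in authors if a in self.signatories), None)
        if licensor is None:
            raise PermissionError("no author of this item has signed the ToS")
        self.items.append((title, licensor))
        return licensor
```

Recording the licensing author on each item at deposit time is what lets un-signing stay harmless for everything already in the repository.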

DSpace’s other major problem is its unwieldy workflow system, which doesn’t really cover the use-case just expressed. DSpace doesn’t even kick an item into the workflow system until it’s been uploaded and all the metadata is complete, which is exactly bass-ackwards from what Cassandra and Ulysses need it to do. Ulysses needs a staging area where Cassandra’s CMS can dump stuff until he can get to it. DSpace doesn’t have that.

There are other minor nits. The reject notice for an item that doesn’t pass a rights check needs to go to the corresponding author(s), not to Ulysses. DSpace needs an external-facing notification mechanism more sophisticated than email. Ulysses needs to be able to shove Basketology off on Menelaus, once everything is running right and Menelaus has learned how to check rights—Ulysses can’t possibly handle the entire campus doing as Cassandra has done, because there aren’t enough hours in the day. All of this is solvable, some of it trivially.

So. Let’s get to work? I hear there’s a repository-programming event coming up…

17 Ianuarii 2008

Theming different parts and pages in Manakin

I’m sure everyone else figured this out already and I’m the only one who didn’t, but just in case someone else is as slow on the uptake as I am…

You set which pages get which theme in Manakin via [dspace]/config/xmlui.xconf. Each theme gets a theme element with its name, the path to it, and… a selection regex! REGEX! Pattern-matching!

This means you can set up a theme just to hit certain pages or sections of the site, as long as they have a distinctive, non-handle-based URL. Want a theme just for the admin section? Easy-peasy. Do regex=".*/admin/.*". How cool is that?
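If you want to check a pattern before restarting anything, a quick sketch like the following does the job. It uses Python’s regex engine rather than the Java one Manakin runs on, which makes no difference for a pattern this simple. The theme names and URL paths are made up, and I am assuming the first matching rule wins and that the pattern must match the whole path.

```python
import re

# Hypothetical theme rules, in the order they would appear in xmlui.xconf.
THEME_RULES = [
    ("admin-theme", r".*/admin/.*"),
    ("default-theme", r".*"),
]


def theme_for(url_path):
    """Return the name of the first theme whose regex matches the whole path."""
    for name, pattern in THEME_RULES:
        if re.fullmatch(pattern, url_path):
            return name
    return None
```

Calling theme_for("/xmlui/admin/epeople") picks the admin theme, while a handle-based path such as "/xmlui/handle/0123/4567" falls through to the catch-all.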

Unfortunately, this coolness breaks down with regard to distinctive community and collection pages, because those have handles and so can’t be caught via regex, not to mention that Manakin is set up to cascade a theme down to item pages. This is irksome, because after all, community and collection pages are (after a fashion) home pages, and as such may well want to look or behave a bit differently from item or browse pages. To some extent, Manakin caters to this; the innermost content on a community/collection page is in its own template.

However, if you want to customize the header or the navbar or anything on a community or collection page, you’re sunk—except you’re not, because I figured this one out for you. At the top of your theme, add these variable definitions:

<xsl:variable name="is_comm" select="boolean(/dri:document/dri:body/dri:div[@n='community-home'])"/>
<xsl:variable name="is_coll" select="boolean(/dri:document/dri:body/dri:div[@n='collection-home'])"/>
<xsl:variable name="is_item" select="boolean(/dri:document/dri:body/dri:div[@n='item-view'])"/>

With these, you can do conditional logic anywhere in the stylesheet you need to. E.g. <xsl:if test="$is_comm">. It just works!

Now if I only understood what themes.xmap does and whether I should actually care…

15 Ianuarii 2008

Redoing navigation in Manakin

One of the commoner tasks involved in redesigning DSpace is reorganization of or additions to the navigation bar. Manakin does not make this simple, but there are ways to do an end-run around it.

The essential problem is that the elements of the navigation bar are not set at the theme level in XSLT, but at the Aspect level, in Java. (DSpace has always suffered from the arrogant notion that it knows interaction design better than you do. Often it is wrong, but the bad interactions are hard-coded in so deep it’s next to impossible to jettison them.)

If you choose, you can go into aspects/ArtifactBrowser/src/org/dspace/app/xmlui/artifactbrowser/Navigation.java and mess around in some rather inscrutable code to make changes that affect the entire Manakin installation. I admit to having done this to get rid of DSpace’s completely pointless browse-by-date function. However, I do not recommend this if adding links is what you need to do, and I triply do not recommend it for theme-specific navigation links.

I have now tested my sitemap.xmap hack, and I am pleased to say that it works exactly as I expected it would. For the situation where you want the normal Manakin sidebar, but you also want a few theme-specific additions, it is a decent way to go. After I threw another temper tantrum on the dspace-tech list, we can eventually expect a better way to inject content into Manakin DRI files. Until then, though, hacking sitemap.xmap works.

If you want to rearrange content in the navigation bar, beyond simply changing wording or adding a few links to the end, you have some work ahead of you. This is because the content and order of the sidebar is not set on the theme level; it’s hardcoded into the Java Aspect gizmo. (Is this stupid? Yes, this is stupid. These kinds of interaction-design decisions do not belong in Java; they belong with the designers who are not supposed to be using Java. Eventually, however, I think it will be possible to move Manakin in a more productive direction.)

It is possible to work around this. The easy way to do it is to go into the dri:options template and rip out the <xsl:apply-templates> call, replacing it with hard-coded links. I think this is fully justifiable, though it’s rather annoying that (unless you set up theme inheritance somehow) you have to do it for every theme you write.

(Note also that doing it this way makes possible a rather interesting trick: you could actually make a DSpace community or collection a seamless part of Somebody Else’s Website. Grab up their site design and navigation bar to theme the community/collection with, then add a link on both sites that goes directly to the community/collection, and there you are. Nice trick, isn’t it? I really want to try it.)

The hard way to work around Manakin’s hard-coded navigation is to replace the <xsl:apply-templates> call with markup that pulls the appropriate links out of the DRI. What’s really hard about this is that without the <xsl:apply-templates> call, you’ll have to go through and figure out the logged-in-user and administrative linksets as well. I haven’t been quite daring enough to do this yet, but somebody ought to.

Because navigation is too important a part of interaction design to be left to a bunch of developers, yeah? (Sorry. Been rereading Alan Cooper.)

8 Ianuarii 2008

Batch-replacing items in DSpace

Something I ought to have mentioned in yesterday’s post is the -t flag. This does a “test run” of your import, catching many (though not all) problems. (It will not notice if something is wrong inside your dublin_core.xml file. If you don’t have a dublin_core.xml file, it will notice.) I always run an import command with -t, then if it runs clean, arrow-up to recall the command, delete the -t, and off it goes.

If an import does happen to choke and die in the middle, don’t panic; running the command again with the -R (for “resume”) flag will pick up the import where it left off.

Right. Now, moving on to the situation where an individual item or every item in a collection is seriously messed up, and would be much faster to correct outside DSpace. This can be done! I have done it. But it’s annoyingly error-prone.

Step one is to export the item or collection. This works a lot like importing. As the DSpace administrator user, go to DSpace’s bin directory and run the following:

  • dsrun org.dspace.app.itemexport.ItemExport: Command invocation.
  • --type=COLLECTION: Or ITEM, depending on which you’re exporting.
  • --id=0123/4567: The item or collection’s handle.
  • --dest=/home/me/stuff: The directory on the server where the exported items should end up. Make sure the DSpace administrator user can write to this directory!
  • --number=1: The exporter names the individual item directories with sequential numbers. Instead of peeking into the directory, finding the highest-numbered existing directory, and adding one (which would be the KIND way to handle this), DSpace insists that you give it a start number.

In toto: dsrun org.dspace.app.itemexport.ItemExport --type=COLLECTION --id=0123/4567 --dest=/home/me/stuff --number=1
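Since DSpace won’t work out the start number for you, a few lines of Python will. This is only a sketch of the “peek into the directory, find the highest number, add one” approach; the function name is mine, and it assumes the destination holds nothing but numbered item folders and the odd stray file.

```python
from pathlib import Path


def next_start_number(dest):
    """Return one more than the highest all-digit directory name inside dest."""
    dest = Path(dest)
    if not dest.is_dir():
        return 1  # nothing exported here yet
    numbers = [
        int(entry.name)
        for entry in dest.iterdir()
        if entry.is_dir() and entry.name.isdigit()
    ]
    return max(numbers, default=0) + 1
```

Feed the result to --number and the exporter will carry on numbering after whatever is already in the directory.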

Inside each exported item’s directory, you’ll see a “contents” file, a “dublin_core.xml” file, one or more license files, and the bitstreams, all of which should be fairly familiar territory. You will also see a file called “handle” which is a plain-text file containing (surprise!) the item’s handle. At this point you can download all the folders and fix whatever you need to.

To re-import the items without losing their handles or duplicating them, you need to create a mapfile. This is just a plain-text file, with a folder name and the corresponding item’s handle on each line, separated by a space:

1 0123/4567
2 0123/8901
3 0123/2345

The way I do this, since the exporter isn’t smart enough to create a mapfile on its own, is with a little Python hack that runs through a directory of items and associates each item’s “handle” file with its directory name. (I meant to upload my Python hackery yesterday, but either WordPress or Apache was and is being extremely annoying about letting a Python source file load, so hang on while I sort that out—and with any luck I won’t break my blog permalinks this time!)
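In the meantime, here is a minimal sketch of the idea. It is a reconstruction from the description above, not the script I actually use, and the function names are illustrative. It assumes every exported item folder contains the plain-text “handle” file the exporter writes.

```python
from pathlib import Path


def build_mapfile_lines(export_dir):
    """Pair each item folder's name with the handle stored in its 'handle' file."""
    lines = []
    for item_dir in sorted(Path(export_dir).iterdir(), key=lambda p: p.name):
        handle_file = item_dir / "handle"
        # Skip stray files and any folder the exporter did not write a handle for.
        if item_dir.is_dir() and handle_file.is_file():
            handle = handle_file.read_text().strip()
            lines.append(f"{item_dir.name} {handle}")
    return lines


def write_mapfile(export_dir, mapfile_path):
    """Write the mapfile and return how many items it lists."""
    lines = build_mapfile_lines(export_dir)
    Path(mapfile_path).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

Folders are sorted by name only so that the output is reproducible; as far as I know the importer does not care about line order.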

Now you need to keep DSpace from duplicating metadata on re-import. Yes, DSpace will do this if you let it. One way to deal with this is to run a script called ds-migrate from the bin folder on your items before you batch-import them.

I don’t like this solution, however, despite its being fast and easy. The script is intended for the not-uncommon situation where you mount a collection on a test server and then want to migrate it over to production, leaving no hint whatever that it was ever anywhere else. The script therefore wipes existing provenance and date information—which is bad for a collection you’re exporting from and re-importing to a production server. You’re losing important item history there!

So what I do—and you may well decide differently—is only kill the really troublesome extra metadata out of all the dublin_core.xml files: dc.format.extent, dc.format.mimetype, and dc.identifier.uri.

The first two are easy regular-expression replaces: match <dcvalue element="format" qualifier="extent">[^<]+</dcvalue> and replace it with nothing (you can make the appropriate substitution for dc.format.mimetype without my help, I’m sure!). The last one is a tiny bit trickier, because you only want to get rid of identifier.uri when it’s the DSpace-assigned handle, not when someone has actually entered a different URI. Most people, then, will want this: <dcvalue element="identifier" qualifier="uri">http://hdl.handle.net/[^<]+</dcvalue> (If you run your own handle server instead of using CNRI’s, substitute its URL, of course.)
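If you would rather script the cleanup than do it by hand in an editor, something along these lines should work. It is a sketch that assumes the exporter writes attributes in exactly the order and quoting shown above and that you use CNRI’s handle server; the function name is mine.

```python
import re

# Each pattern also swallows the whitespace before the element,
# so removing it does not leave a blank line behind.
GENERATED_METADATA = [
    re.compile(r'\s*<dcvalue element="format" qualifier="extent">[^<]+</dcvalue>'),
    re.compile(r'\s*<dcvalue element="format" qualifier="mimetype">[^<]+</dcvalue>'),
    # Only handle URIs go; URIs that somebody typed in by hand are kept.
    re.compile(
        r'\s*<dcvalue element="identifier" qualifier="uri">'
        r'http://hdl\.handle\.net/[^<]+</dcvalue>'
    ),
]


def strip_generated_metadata(xml_text):
    """Remove the metadata that DSpace would otherwise duplicate on re-import."""
    for pattern in GENERATED_METADATA:
        xml_text = pattern.sub("", xml_text)
    return xml_text
```

Run it over the text of each dublin_core.xml file and write the result back; keep a copy of the export until you have checked the re-imported items.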

The element dc.date.issued causes a slightly subtler problem, in that you may want to keep it if it was DSpace-assigned, but you want to get rid of it if it was user-assigned because it’ll be duplicated. I get rid of it, because DSpace-assigned issue dates are completely meaningless. Your call whether you do too.

I’m told that the event system going into 1.6 is already smart enough to check for duplicated metadata on import. This makes me very happy, because deleting duplicate URIs is a hassle. (Not that I—oh, never mind.)

At any rate, once you’ve taken care of all this, just import as normal, using the mapfile you created as the value for the -m flag and adding the --replace flag. Should work fine.

7 Ianuarii 2008

The DSpace batch importer

A plea came in to the DSpace techlist for how to use the DSpace command-line batch importer. “RTFM!” was the immediate chorus.

Well, okay, it’s how I learned to use the batch importer, but that doesn’t mean everyone should have to learn that way. So forthwith, a nuts-and-bolts minimal-techspeke tutorial on getting stuff into DSpace through the back alley.

First, some vocabulary. A “bitstream” is what you and I, being normal folks, think of as a file. An “item” consists of one or more bitstreams, plus descriptive information (author, title, etc.) about those bitstreams, plus license information. A “bundle” is a DSpace-specific construct (you won’t even see it in the UI, really) that keeps license bitstreams separate from content bitstreams inside an item. An “eperson” is someone registered with the DSpace instance; s/he is usually referred to by his/her email address.

To import an item into DSpace, you need to give DSpace three things: the bitstreams, the item’s descriptive information, and (because DSpace is fairly brain-dead) a plain-text listing of the bitstreams. All these things need to be in a single folder. If you are importing more than one item at once, each item needs to be in its own folder. DSpace does not care how you name the folder or the bitstreams. It does care how you name the bitstream listing, the file containing descriptive information, and the license files if any, as I’ll explain in a moment.

License information is optional. If you do not provide it, DSpace simply doesn’t attach a license to the imported item. If you do provide a license for the item, it should be in the form of a plain-text file inside the item’s folder named “license.txt”. (I’m leaving Creative Commons licenses out of the picture for now; if you care, I have another post on the subject which you should read only after you read and understand this one.)

The plain-text listing of the bitstreams needs to be named “contents”. Each filename should be on its own line; order is irrelevant. If you are only importing content files (no license files), you’re done. If, however, you have license files, you need to tell DSpace to put them in a different bundle from the content files. Easier to demonstrate than explain:

contentfile1.txt	bundle:ORIGINAL
contentfile2.txt	bundle:ORIGINAL
license.txt	bundle:LICENSE

The whitespace between the filename and the bundle name must be a single tab character.
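Because a tab is easy to get wrong by hand (plenty of editors quietly turn it into spaces), you may prefer to generate the listing. A minimal sketch, with a function name of my own choosing:

```python
def contents_lines(content_files, license_files=()):
    """Build the lines of a 'contents' file for one item folder."""
    if not license_files:
        # Content files only: bare filenames, one per line, are enough.
        return list(content_files)
    # With license files present, every line names its bundle after a single tab.
    lines = [f"{name}\tbundle:ORIGINAL" for name in content_files]
    lines += [f"{name}\tbundle:LICENSE" for name in license_files]
    return lines
```

Join the result with newlines and save it as “contents” inside the item’s folder.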

The descriptive information lives in a little XML file whose name must be “dublin_core.xml”. To keep this post to a manageable length, I am not going to go heavily into detail about Dublin Core metadata; the easiest way to bootstrap yourself is to look at existing items in a repository in full-listing view. A bare-bones dublin_core.xml file looks something like this:

<dublin_core>
    <dcvalue element="contributor" qualifier="author">Public, John Q.</dcvalue>
    <dcvalue element="language" qualifier="iso">en</dcvalue>
    <dcvalue element="subject" qualifier="none">Technology</dcvalue>
    <dcvalue element="title" qualifier="none">Sample Dublin Core record</dcvalue>
    <dcvalue element="type" qualifier="none">Article</dcvalue>
</dublin_core>

The order in which you place individual Dublin Core elements generally does not matter, although you should put authors in the correct order (first author first, second author second, etc.) because DSpace does respect that order, and if you don’t angry faculty will come after you with long knives.
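If your metadata is coming out of a spreadsheet or a database, generating the file is safer than typing it, since the XML library takes care of escaping ampersands and angle brackets. A sketch using Python’s standard library; the function name and the tuple layout are my own conventions.

```python
import xml.etree.ElementTree as ET


def build_dublin_core(fields):
    """Turn (element, qualifier, value) tuples into dublin_core.xml text.

    Tuples are written in the order given, so list authors in byline order.
    """
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        dcvalue = ET.SubElement(
            root, "dcvalue", {"element": element, "qualifier": qualifier}
        )
        dcvalue.text = value
    return ET.tostring(root, encoding="unicode")
```

The output comes out on a single line, which is ugly but perfectly valid XML.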

If you have all this together, you are now ready to use the batch importer. Put the item folder on the DSpace server somewhere that the DSpace administrator user has read and write privileges. As the DSpace administrator user, cd over to the bin folder inside the running DSpace instance (note: not the source-code folder that you run ant from when you recompile DSpace). I’m going to run through the command one bit at a time, and then put it all together at the end.

  • dsrun org.dspace.app.itemimport.ItemImport: Command invocation.
  • -a: Tells DSpace that you’re adding new items.
  • -e me@myu.edu: Eperson who should be held responsible for the submitted items. This need not necessarily be you! It does need to be someone the system knows about, so if you’re depositing on behalf of someone who’s never used the system, you need to use the DSpace administrative interface to add them as an eperson.
  • -c 0123/4567: Which collection the items should go into. Go to the collection’s home page and grab up its handle. (Note that the batch importer is deadly stupid about this; there is no way to do a single batch import of items that belong to different collections. Also, you can’t map items into additional collections via the batch importer.)
  • -s /home/me/stuff: The directory on the server where the item folders are. DSpace will error out if the admin user does not have read access to this directory!
  • -m /home/me/stuff/mapfiles/mapfile.txt: Where to put the dumb little “map file” that DSpace generates, telling you which item got assigned which handle. This is basically a throwaway (it’s easy to regenerate if you ever actually need it), but if you don’t let DSpace generate its dumb little map file, DSpace sulks and won’t import your items.

So, the full command looks something like this:

dsrun org.dspace.app.itemimport.ItemImport -a -e me@myu.edu -c 0123/4567 -s /home/me/stuff -m /home/me/stuff/mapfiles/mapfile.txt
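If you run imports often, it can be worth assembling the command in a script so nothing gets mistyped. The sketch below only builds the argument list out of the flags described above, plus the -t test-run flag from the follow-up post; it does not run anything, and the function name is mine.

```python
import shlex


def import_command(eperson, collection, source, mapfile, test_run=False):
    """Build the argument list for a batch import that adds new items."""
    args = [
        "dsrun", "org.dspace.app.itemimport.ItemImport",
        "-a",               # add new items
        "-e", eperson,      # eperson responsible for the submission
        "-c", collection,   # handle of the target collection
        "-s", source,       # directory holding the item folders
        "-m", mapfile,      # where DSpace writes its map file
    ]
    if test_run:
        args.append("-t")   # dry run; catches many, though not all, problems
    return args


print(shlex.join(import_command(
    "me@myu.edu", "0123/4567", "/home/me/stuff",
    "/home/me/stuff/mapfiles/mapfile.txt",
)))
```

shlex.join needs Python 3.8 or later; it quotes any argument containing spaces, so the printed line can be pasted straight into a shell in DSpace’s bin directory.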

For most items, you are now done. If your item was a website, you have one more step: setting the “primary bitstream” to the website’s home or entry page. Anyone with edit rights on the item can do this from the item’s edit page; there’s a column of radio buttons labeled “Primary bitstream?” beside the bitstream listings near the bottom. Alternately, you can employ some SQL-fu in your database (instructions are for Postgres, not Oracle).

The batch importer can also replace items, so if you’ve completely hosed a collection in some reasonably fixable fashion, you can export it, fix it, and re-import it. Danger Will Robinson! There are several gotchas in this process. (Not that I know this by experience or anything—okay, I’m not fooling anyone here. I’ve run into all of them.) For this, you will need a mapfile, and you need to add --replace to the command line. I’ll reserve the other gotchas for a separate post, noting only that I have it on good authority that several of them will be going away in version 1.5 or 1.6.

And there you are. I hope.

28 Decembris 2007

Kludging Manakin: IncludePageMeta

So I’m about to start wrangling my new DSpace/Manakin theme into shape in Internet Explorer, as you might have gathered from yesterday’s howl of anguish, and it occurred to me to wonder how to alter the stylesheet setup in Manakin to take note of more versions of IE. (Out of the box, it understands “IE” and “IE6.” I am wondering about IE5. Anything previous to that can jump off a cliff and die horribly.)

What I found was actually a limited but relatively simple way to add static information into a Manakin theme without mucking around in Java and whatnot. Note well, it’s a big fat ugly nasty kludge—but it’ll work, and future versions are highly unlikely to break it.

Open up a sitemap.xmap file. Look for the “Step 2” comment, which introduces some map:transforms based on browser type. That’s your loophole, right there; the information there is going straight into the DRI, in /document/meta/pageMeta/metadata elements. (I’ve left out the namespaces, but you can’t in your XSLT. All the above are in the DRI namespace.)

So let’s take a closer look. The metadata elements in the DRI are structured more or less like good old familiar Dublin Core. There’s an element attribute and a qualifier attribute, and the value is there as the element’s content, so:

<metadata element="stylesheet" qualifier="screen">style.css</metadata>

How does the sitemap.xmap file make that happen in the DRI? Thusly:

<map:parameter name="stylesheet.screen" value="style.css"/>

See? Simple. But as I said, limited—this isn’t where you’re going to be able to put your entire static Help pages or FAQ. But you could, for example, introduce a new navigation bar or the like without having to hack the living hell out of Navigation.java in the ArtifactBrowser Aspect (which is frankly what I did, though it’s a lousy idea because that Aspect governs the entire system, not just one theme, so I’ll probably be replacing that hack with something based on this).

Now, if you’ve been paying close attention to that sitemap.xmap file, you’ll have noticed that all the code I’ve been referring to is inside some conditional stuff (map:select and map:when). I must say I haven’t tried this yet, but I think the way to just plain old add some stuff is to go outside the map:select element altogether and do something like this:

<map:transform type="IncludePageMeta">
  <map:parameter name="newElement.newQualifier" value="newValue"/>
</map:transform>

It should Just Work, showing up in your DRI where you can grab it via XSLT for whatever nefarious purpose you have in mind.

Kludge at your own risk, as always… but as kludges go, I think this one’s fairly safe and harmless.

20 Decembris 2007

Moving community and collection logos

I don’t want to admit how long it took me to get this right. Weeks. A small glitch elsewhere in the stylesheet that would occasionally throw a wobbly and occasionally not did not help matters. (Have I mentioned that debugging XSLT server-side is an inordinate pain in the posterior? I haven’t? Well, it is.)

Still. If you hate as much as I do that community and collection logos live inside the community/collection box and not in the h1 of the page the way $DEITY intended, read on.

I actually had to comment out the ds-logo-wrapper bit in DS-METS-1.0-DIM.xsl, because nothing I did in my theme seemed to override it. The other sneaky way to dispense with it is to set it to display:none in your CSS.

For our first trick, we will fossick around in the METS and the DRI to determine whether there’s a logo at all, and whether we’re actually on a community or collection page. Add the following variables to the top level of your theme’s stylesheet (look for the context-path variable and put them beside those):

<!-- Whether the current page has a logo associated with it. -->
    <xsl:variable name="has_logo" select="boolean(/dri:document/dri:meta/dri:objectMeta/dri:object/mets:METS/mets:fileSec/mets:fileGrp[@USE='LOGO'])"/>

<!-- Whether the current page is a community or collection home page. -->
    <xsl:variable name="is_comm" select="boolean(/dri:document/dri:body/dri:div[@n='community-home'])"/>
    <xsl:variable name="is_coll" select="boolean(/dri:document/dri:body/dri:div[@n='collection-home'])"/>

Next, we will go into the template where we want the logo to live and add this to it:

<xsl:if test="$is_coll or $is_comm">
  <img>
    <xsl:attribute name="src">
      <xsl:value-of select="/dri:document/dri:meta/dri:objectMeta/dri:object/mets:METS/mets:fileSec/mets:fileGrp[@USE='LOGO']/mets:file/mets:FLocat[@LOCTYPE='URL']/@xlink:href"/>
    </xsl:attribute>
    <xsl:attribute name="class">logo</xsl:attribute>
    <xsl:attribute name="id">commcollogo</xsl:attribute>
    <xsl:choose>
      <xsl:when test="$is_comm">
        <xsl:attribute name="alt">xmlui.dri2xhtml.METS-1.0.community-logo-alt</xsl:attribute>
        <xsl:attribute name="attr" namespace="http://apache.org/cocoon/i18n/2.1">alt</xsl:attribute>
      </xsl:when>
      <xsl:when test="$is_coll">
        <xsl:attribute name="alt">xmlui.dri2xhtml.METS-1.0.collection-logo-alt</xsl:attribute>
        <xsl:attribute name="attr" namespace="http://apache.org/cocoon/i18n/2.1">alt</xsl:attribute>
      </xsl:when>
    </xsl:choose>
  </img>
</xsl:if>

There is an additional wrinkle if (as I did) you want the logo to live in the ds-div-head with the community’s name in it. For this, you need to test whether a given head on the page is an h1, or you’ll get the damn logo on every head on the page. (Yep. I made that mistake.)

The fix is relatively easy; just change the xsl:if line above to <xsl:if test="$head_count=1 and ($is_coll or $is_comm)"> and you’re golden.

I hope somebody else uses this. Figuring it out drove me crazy for weeks.

14 Decembris 2007

I been dissed!

So the new DSpace Foundation wants people with plain-vanilla DSpace installations to chime in and say how long it took ’em to get set up.

Plain-vanilla? I said. Who the heck stops at plain vanilla? Why is that even a useful benchmark?

Naughty Dorothea, no biscuit. (Note to SourceForge: friendly URLs for mailing-list threads, plzkthx.) “Given your experience, passion and know how it would be great if you could work with us in a positive fashion,” quoth the Foundation’s head honcho.

I don’t even know what to say about this, it’s so goofy. Except to note that I do believe there’s a bit of a gender issue lurking, as I’ve never seen a male participant called out for anything by anyone except when I’ve done it myself—good old unladylike me.

I’m willing to stand on my record vis-a-vis DSpace: even if we leave all my snark and criticism aside as wholly unproductive (which I dearly hope is arguable!), I’ve couple-three patches, two extremely successful customization workshops (and wikifying the handouts to boot), a number of how-to blog posts, and some mailing-list answers to my credit. That’s all I need to say, really.

30 Novembris 2007

Distinctive “current” navigation links

One of the (sadly few) nice things that DSpace’s JSP interface did was call out the link for the page you happened to be on in the navigation sidebar. The magic was a class attribute on the current page’s link, plus a bit of CSS.

Manakin doesn’t do that out of the box. But it can, and I just spent entirely too much time making it do so. Y’all get the benefit of my cussing streak.

The first trick is to figure out just what the address of the current page is. Go up to the top level of your theme’s XSLT stylesheet and add this:

<xsl:variable name="currentpage">
  <xsl:value-of select="$context-path"/>
  <xsl:text>/</xsl:text>
  <xsl:value-of select="/dri:document/dri:meta/dri:pageMeta/dri:metadata[@element='request' and @qualifier='URI']"/>
</xsl:variable>

(Incidentally, could somebody with more XSLT-fu than I have kindly explain what the difference is between dri:metadata[@element='request' and @qualifier='URI'] and dri:metadata[@element='request'][@qualifier='URI']? I know there is one, because it keeps tripping me up.)

Now you need your <dri:xref> transformation to take notice. Here’s how it works:

<xsl:template match="dri:xref">
  <a>
    <xsl:attribute name="href"><xsl:value-of select="@target"/></xsl:attribute>
    <xsl:if test="($currentpage)=(@target)">
      <xsl:attribute name="class">
        <xsl:text>current</xsl:text>
      </xsl:attribute>
    </xsl:if>
    <xsl:apply-templates />
  </a>
</xsl:template>

And then you may style at will.

This only works for side navbar links. It doesn’t currently work for the alphabet links at the top of browse-by pages, because they’ve got parameters attached. (If I recall correctly, there may be some URL-space rearrangements in current Manakin versions that might fix this.) Happy designing!

Clickable authors and subjects in Manakin

The default Manakin install, just so you know, doesn’t put subject terms on the short item-display page. It’s not hard to add them back, and I recommend it; you’ll see what I did with them in a moment.

Making authors and subjects clickable is a bit trickier. Up-front warning: what I’m about to show you is apparently not in line with the latest version of Manakin, but if you get the idea, making the necessary fixes won’t be hard.

The first problem is that the names need to be URL-encoded or browsers will break amusingly. This leads to the second problem, which is that XSLT 1.0 doesn’t have a built-in URL encoder. Fortunately, Cocoon does, and you can enable it for your Manakin themes. In your main sitemap.xmap file, add the following just below the root <map:sitemap> element:

<map:components>
  <map:transformers>
    <map:transformer name="encodeURL"
                     src="org.apache.cocoon.transformation.EncodeURLTransformer"/>
  </map:transformers>
</map:components>

Then, between Steps 4 and 5 of the <map:pipeline>, add:

<map:transform type="encodeURL"/>

URL encoding problem solved. (Note: if you mouse over links with this working, they don’t look encoded—that’s okay, everything still works.)

Now you need to go into your theme’s XSLT stylesheet and look for the <xsl:template> with the name “itemSummaryView_DS-METS-1.0-DIM”. If it’s not there, go into DS-METS-1.0-QDC.xsl, find it there, and copy it into your theme’s stylesheet.

After breaking things amusingly several times, I found out what works. Note carefully that I am not using Manakin-default table markup for metadata, because I despise table markup. I’m using definition lists instead, and I’ve made them look like tables with CSS. (Hell, my metadata display is prettier than WorldCat’s. Right-justify your labels, people! It helps the eye.)

<xsl:if test="$data/dim:field[@element='subject']">
  <dt><xsl:text>Subject(s):</xsl:text></dt>
  <dd>
    <xsl:for-each select="$data/dim:field[@element='subject']">
      <a>
        <xsl:attribute name="href">
          <xsl:value-of select="concat($context-path,'/browse-subjects?subject=')"/>
          <xsl:copy-of select="text()"/>
        </xsl:attribute>
        <xsl:copy-of select="text()"/>
      </a>
      <xsl:if test="count(following-sibling::dim:field[@element='subject']) != 0">
        <xsl:text>; </xsl:text>
      </xsl:if>
    </xsl:for-each>
  </dd>
</xsl:if>

Taking that a bit at a time… frankly, you should wrap all your metadata declarations in <xsl:if> statements as I just did, because otherwise they will show up in Manakin whether they actually have values or not! This is just silly.

I put the bare text “Subject(s)” in the code instead of doing something in messages.xml for it. This is bad, it will be fixed, and you should not do it. Use messages.xml instead.
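For the curious, the messages.xml route might look roughly like the sketch below. The message key is made up for illustration, and the sketch assumes the i18n prefix is bound to Cocoon’s i18n namespace (http://apache.org/cocoon/i18n/2.1) on your theme’s <xsl:stylesheet> element, and that the i18n transformer runs after the theme in the pipeline. Check both against your own install before trusting it.

```xml
<!-- In the theme stylesheet, replacing the hard-coded label.
     The key "xmlui.mytheme.item-subject" is hypothetical; pick your own. -->
<dt><i18n:text>xmlui.mytheme.item-subject</i18n:text></dt>

<!-- In messages.xml, inside the existing <catalogue> element.
     The key must match the one used in the stylesheet exactly. -->
<message key="xmlui.mytheme.item-subject">Subject(s):</message>
```

The stylesheet emits only the key; the label text is looked up in the catalogue, so a translated messages file can supply a different label without touching the theme.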

The rest works out to “for each subject, put a link to the corresponding browse-by-subject page, and add a semicolon and space if it’s not the last subject in the list.” It doesn’t take a whole lot of XSLT-fu to see how it works.

This works just as nicely for authors, and I’ve got that enabled too. (I’ve also split out real authors from advisors, translators, editors, etc. in the code. This took a little doing, and may be worth a separate post.) You can do it too—have fun!
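As a starting point, an author version might look something like the sketch below. It is not the code described above (there is no advisor/translator/editor split), and it rests on assumptions: that authors live in fields with element "contributor" and qualifier "author", and that the browse page sits at browse-authors?author= by analogy with the subject URL. Both may differ in your version of Manakin. It also assumes the same $data and $context-path variables as the subject block.

```xml
<!-- Sketch only, modeled on the subject block.
     ASSUMPTIONS: the contributor/author field names and the
     browse-authors?author= path are guesses; verify them locally. -->
<xsl:if test="$data/dim:field[@element='contributor'][@qualifier='author']">
  <!-- Hard-coded label kept for brevity; messages.xml is the better home. -->
  <dt><xsl:text>Author(s):</xsl:text></dt>
  <dd>
    <xsl:for-each select="$data/dim:field[@element='contributor'][@qualifier='author']">
      <a>
        <xsl:attribute name="href">
          <xsl:value-of select="concat($context-path,'/browse-authors?author=')"/>
          <xsl:value-of select="."/>
        </xsl:attribute>
        <xsl:value-of select="."/>
      </a>
      <!-- Separator after every author except the last one selected. -->
      <xsl:if test="position() != last()">
        <xsl:text>; </xsl:text>
      </xsl:if>
    </xsl:for-each>
  </dd>
</xsl:if>
```

The position() != last() test does the same job as the following-sibling count in the subject block, but it counts only the nodes the for-each actually selected, which matters once you start filtering by qualifier.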