12 September 2006

DSpace hack: better browsing from items

Research hath shown that most people who land at an item in a DSpace repository have not followed the flower-strewn garden path to it; they’ve gotten there via a search engine. If the item is to their liking, they are quite likely to ask “What related items does this repository contain?”

An out-of-the-box DSpace item page offers one answer to that question: a link to the collections that contain the item. That’s not a bad answer, not at all; it’s good to offer some context. But it’s not the only possible answer, either. What about more items by this item’s author? What about more items with the same subject keyword?

(Yes, yes, I know all about how horrible repository keywording is. I’m still wading through the subject browse in the repository I run, cleaning things up. I’ve murdered 300-odd subject keywords entirely since I started, many of them by exerting a bit of authority control. I have some choice screenshots of pre-cleanup subject-browse pages for the edification of future generations, or something. Even so, “more items by keyword” is still an appropriate browse choice, especially since not all repositories are slapdash about keywording.)

This can be improved. It’s not the world’s prettiest hack, and it’s tucked away in a bizarre corner of the code, so bear with me here. As always, you hack DSpace at your own risk; I can’t promise that you won’t break it by following my instructions.

You’re looking for the file /src/org/dspace/app/webui/jsptag/ItemTag.java. In that, you want the render() method, which starts with the line private void render() throws IOException. It should be kicking around line 282 somewhere, but don’t quote me on that, as I’ve hacked this file a lot and I don’t know where anything started out any more.

What you’re going to do in broad terms is check each chunk of metadata to see if it’s an author or subject, and if it is, you’ll turn it into a link to the appropriate browse-by page. Fortunately, existing code does similar checks for dates and linkitude, so it’s not too hard to suss out how to do this.

Next, look for the lines that initialize “is this a…” variables:

            boolean isDate = false;
            boolean isLink = false;

Add a couple.

            boolean isAuthor = false;
            boolean isSubject = false;

Now skip down a bit, after the code that switches isLink to true if “link” figures in the name of the metadata field. Add similar code to check for authorness or subjectness:

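// indexOf returns 0 when the field name starts with the keyword,
// so test for >= 0 rather than > 0 to catch that case too.
if (field.indexOf("contributor") >= 0 || field.indexOf("creator") >= 0)
{
    isAuthor = true;
}

if (field.indexOf("subject") >= 0)
{
    isSubject = true;
}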
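One detail in these checks is worth seeing in isolation: indexOf returns 0 when the keyword sits at the very start of the field name, so a "> 0" test and a ">= 0" test disagree for such names. Here is a minimal standalone sketch comparing the two. The class and method names are hypothetical and not part of DSpace, and the sample field names are only assumptions about what your display configuration might contain.

```java
class FieldKind {
    // "> 0" test: true only if the keyword appears somewhere after position 0.
    static boolean matchesAfterStart(String field, String keyword) {
        return field.indexOf(keyword) > 0;
    }

    // ">= 0" test: true if the keyword appears anywhere, including position 0.
    static boolean matchesAnywhere(String field, String keyword) {
        return field.indexOf(keyword) >= 0;
    }

    public static void main(String[] args) {
        // Hypothetical field names, with and without a schema prefix.
        String[] samples = { "dc.contributor.author", "contributor.author", "dc.title" };
        for (String field : samples) {
            System.out.println(field
                    + " afterStart=" + matchesAfterStart(field, "contributor")
                    + " anywhere=" + matchesAnywhere(field, "contributor"));
        }
    }
}
```

If your field names always carry a prefix such as "dc.", both tests behave the same; the difference only shows up for names that begin with the keyword.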

Skip a bit further down to the for loop that goes through each bit of metadata in the list. It starts with the line for (int j = 0; j < values.length; j++). There’s a set of if/else if/else statements starting with if (isLink). We’re going to add some else ifs in the middle of that:

// URLEncoder here is java.net.URLEncoder; add the import at the top
// of ItemTag.java if it is not already there.
else if (isAuthor)
{
    out.print("<a href=\"" + request.getContextPath()
    + "/items-by-author?author="
    + URLEncoder.encode(values[j].value, "UTF-8")
    + "\">" + values[j].value
    + "</a>");
}

else if (isSubject)
{
    out.print("<a href=\"" + request.getContextPath()
    + "/items-by-subject?subject="
    + URLEncoder.encode(values[j].value, "UTF-8")
    + "\">" + values[j].value
    + "</a>");
}
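If you want to check what those links will look like before rebuilding DSpace, the string-building can be tried on its own. This is a rough sketch, not the ItemTag code itself: the class BrowseLink, its methods, the "/dspace" context path, and the sample author are all made up for illustration. It also HTML-escapes the visible text, which the snippet above does not do; treat that as an optional extra and not part of the hack.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class BrowseLink {
    // Minimal HTML escaping for text shown inside the link (an addition, not in the hack).
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Builds <a href="{contextPath}/{page}?{param}={url-encoded value}">{value}</a>.
    static String link(String contextPath, String page, String param, String value)
            throws UnsupportedEncodingException {
        return "<a href=\"" + contextPath + "/" + page + "?" + param + "="
                + URLEncoder.encode(value, "UTF-8")
                + "\">" + escapeHtml(value) + "</a>";
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "/dspace" stands in for whatever request.getContextPath() returns on your server.
        System.out.println(link("/dspace", "items-by-author", "author", "Smith, John & Co."));
        // prints <a href="/dspace/items-by-author?author=Smith%2C+John+%26+Co.">Smith, John &amp; Co.</a>
    }
}
```

The URL encoding is what keeps commas, spaces, and ampersands in author names from breaking the query string.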

And that should do it. Authors and subjects for an item now link to their appropriate browse-by pages.

I believe, but am not sure, that this increases the density of web-spider crawls of your repository. If this is a problem for you (and it has been for us lately), use robots.txt well or don’t do this hack at all.