To give myself a break from computer hassles, I spent the day getting ready for a meeting tomorrow, writing up talking points for the liaison librarians and looking for MPOW-produced journals. (There’s one very prominent one, a weighty law-review thing, and several smaller ones, one of which is reported to be considering the repository already.) I don’t know that I need to spend a great deal of time sniffing out potential projects; they seem to be coming to me.
And therein lies a problem: namely, DSpace isn’t as good as these projects will want it to be. It does what it does (ingestion, tracking, and bitstream preservation) quite well. It doesn’t do metadata well. It doesn’t do collection-specific user interfaces at all. Its search is so-so. And while it’s now possible to link directly to a bitstream (that’s a “file” for the rest of us) instead of eternally being routed through the bloody metadata, the mechanism is awkward and (in my judgment) brittle, likely to change and thereby break existing uses of it.
(I know, I don’t sound like a librarian when I say things like “bloody metadata.” I’m thinking in terms of user expectation here. If you see a list of article titles from your average database search-results page, and you click on one, what do you expect to see? Hint: NOT THE METADATA. Guess what DSpace shows you, even if you just came in from OAIster or something else that just showed you the very same metadata? Right. Total UI disaster. We librarians, sometimes we love our metadata just a teensy bit too much.)
The real elephant in the closet, though, is what they’re calling “complex objects.” See, DSpace has been implemented to think that each “item” is representable as a single self-contained bitstream, meaning one file that makes sense all on its own. (Jargon, bah. Speaking of which, the first thing that’s leaving MPOW’s DSpace implementation when I redesign it is this “community” business. We don’t have communities at MPOW, nor I daresay at most other universities. We have departments and research programs, how about you? “Community” is fine as an internal catchall placeholder for DSpace developers, but I doubt any actual DSpace implementation should be using the word.)
As I was saying… DSpace copes nicely with the idea that the same essential information can be represented in more than one file format. Got PDF and HTML of the same article text? Not a problem. Same item, two associated files, no big deal. To get all FRBRish on you, DSpace handles differing manifestations of the same work or expression very neatly.
What DSpace chokes on is composite information objects. Take this weblog page as a cheap and easy example. It consists of an HTML file, a CSS file, and a JPEG file (the left-hand sidebar background). There is presently no way to tell DSpace that these are all parts of the same information object and need to be served up together.
I didn’t realize this until my lunch chat with a faculty member last week, when he asked me about archiving whole websites. Hadn’t even thought about it. Thought quickly—and fortunately, I seem to have good instincts, because I said “I’m not sure DSpace can do that terribly well” and I was completely correct.
Okay, so there are workarounds. Sometimes. External CSS can be pulled back into an HTML file. Background JPEGs can be considered “pretty noise” and discarded. When I got back from lunch, though, I was immediately confronted with an HTML journal article containing an information-rich JPEG graph. No discarding that, not no way, not nohow. If you’re not me and you don’t abominate the format, you just print to PDF. Me, I’m considering wrapping it all in a gzipped tar archive (gzip by itself only squashes one file at a time) and calling that a bitstream. Ugly, but more or less functional, and better for content remixing and reuse.
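For the curious, here is roughly what that wrap-it-all-up workaround might look like. This is a minimal sketch in Python using only the standard library, not anything DSpace itself provides. The function names and the file names (`article.html`, `graph.jpg`) are made up for illustration.

```python
import tarfile
from pathlib import Path


def bundle_as_bitstream(paths, archive_path):
    """Pack several related files into one gzipped tar archive.

    The result is a single file, so it can be deposited as one bitstream.
    """
    archive_path = Path(archive_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in map(Path, paths):
            # Store each file under its bare name so relative links
            # (e.g. <img src="graph.jpg">) still resolve after unpacking.
            tar.add(p, arcname=p.name)
    return archive_path


def list_bundle(archive_path):
    """Return the sorted member names of a bundle, for a quick sanity check."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


# Hypothetical usage, assuming both files sit in the current directory:
# bundle_as_bitstream(["article.html", "graph.jpg"], "article.tar.gz")
```

The catch is the one already mentioned: whoever downloads the thing gets an archive to unpack, not an article to read. The pieces do stay separate and reusable, though, which print-to-PDF can't claim.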
This problem is actively being worked on. The top contenders for complex-object metadata appear (in my somewhat cursory survey) to be METS and a piece of MPEG-21 called DIDL. (Don’t ask me to spell out the acronyms. Just don’t. Yes, I can, but no, I’m not going to.) MPEG-21 DIDL is apparently cooler and prettier, but it’s also got intellectual-property issues all over the place. If the DSpacers are sensible (and I think they are), they’ll stick with METS.
In the meantime, I’ve got some expectations management to do. And more computer hassles starting tomorrow, when I try to turn my staging server into a clone of the actual live install. It should be as simple as replacing a single folder full of JSP templates and rebuilding DSpace, but heck, I thought getting rid of the port number in the URL would be simple, so what do I know?