Archive for January, 2007

31 January 2007

Slugging it out

So DSpace is moving from CVS to Subversion. And I still haven’t rehacked and installed 1.4.1 yet. Sounds like a golden opportunity to get MPOW running on Subversion—call it a parting gift. I was never all that fond of CVS anyway.

I spent much too much time getting Subversion set up and talking to Eclipse today. I believe I now have it together—three branches, one for trunk, one for vendor (so new DSpace releases can get merged in), one for tagged releases. I successfully checked out the trunk in Eclipse at last (I’m honestly too embarrassed to explain the Problem Between Chair And Keyboard that cost me several hours’ frustration; let’s just say “cl00less SVN n00b” and leave it at that), and I’m about ready to start in with the re-hacking, but… ugh.

Ugh, I say. Why does source control have to be so damnably obtuse?

But I did it. I think. Until I find out I got something completely and utterly stupidly wrong. Which will probably happen early tomorrow.

A second look at the Requiem

Believe it or not, I’ve been puzzling over Duruflé’s Requiem since last spring’s epic post about it. I wasn’t happy with my own sense of the piece, was dead sure I was missing something.

I’m starting to get it, I think. It’s taken me a while, and a lot of listening to our concert CD, but I’m starting to get somewhere. It doesn’t hurt that I’ve had to confront death by proxy a time or two since then (though I am happy and relieved to report that my friend’s cancer was caught early, she came through surgery with flying colors, and her prognosis is excellent).

What I was missing, I think, was the stance that the composer has the singers taking toward death. Most Requiems treat singers alternately as Greek chorus and evangelist preacher. If you aren’t scared of the final judgment going home from Verdi, there is something wrong with you; Mozart, too, hammers home the enormity of death and the desperation of the last days. Even when Mozart and Verdi admit grief into the picture, it’s a curiously detached sort of grief, a grief too large to be personal, a Greek-chorus grief. Mozart’s Lacrimosa is beautiful, but it laments everyone’s death, the fact of death, not the mundane reality of individual sorrow.

In Duruflé’s Requiem, the singers represent the mourner. Not the funeral officiant, not the stern warning from heaven, not the community, not humanity—the individual mourner learning to cope with loss. That is what I was missing, and it makes everything make sense.

Mourning is messy. There’s more to it than sorrow: shock, anger, helplessness, relief sometimes, restless confrontation with (or even defiance of) personal mortality, fragile moments of pleasant remembrance. Eventually, with luck, mourners accept the death and remember how to live their own lives, but the road to acceptance (or at least resignation) is full of potholes and U-turns and switchbacks.

I won’t go all the way through the movements again, not least because I suspect different conductors have different ideas of the exact emotional content of each one. I do believe this Requiem is about the journey through loss, however, and I’ll remember that should I ever have the chance to sing it again.

Reading habits

Eye-opening… I’ve been fielding emailed congratulations on my new job as best I can amidst taxes (ugh) and move-planning and whatnot. (I still owe a lot of email. Will try to get through it all tonight.)

Messages trickled joyously in—until Meredith posted her kind congratulations, at which point I got a veritable flood.

This, of course, suggests rather strongly that quite a few folks who know me read Meredith and not me! Which is a perfectly logical and reasonable decision by the reading public, mind you. I’m just amused at what email inbox patterns suddenly revealed.

29 January 2007

So, the real scoop

I wasn’t intending to leave Mason so soon. I really wasn’t. I love my job here.

But then I saw the job announcement from UW and froze in amazement. My job. That was my job, the job I essentially already do, the job I am committed to, just back in my beloved Frozen North. Argh! Why couldn’t it have opened when I was looking the first time?!

So I waffled and I wavered and I whimpered and I talked to my husband and I waffled some more… and I finally concluded that I’d hate myself if I didn’t at least try for it. And if I didn’t land it, no harm done; I’d just return quietly to the job I have and love and not worry about it.

Going to the interview felt like returning home. I can’t explain it any better than that. Much has changed in Madison just in the months since I left, but I can’t deny the “this is where I belong” relief as I strolled down State Street toward dinner at Wasabi, or hiked up the front stairs at Memorial in my interview garb.

Said garb was red, of course. Little-known academic-library interview tip: try to wear school colors, if you can do so without looking horrible or being too blatant about it. Gentlemen, this is what ties were invented for. Ladies, don’t go crazy, especially if the school colors include bright orange, but really, it can’t hurt. I wore green to Mason and red to UW. Clearly I got something right!

It’s a lateral move, career-wise-speaking. Sure, the new job has a bigger potential constituency and a much larger digital-library infrastructure around it, but in essentials, I’m still running a repository, with the same challenges and rewards as any other one. Even after cost-of-living differential, I’m taking a small but manageable pay cut (I did try to negotiate, but nothing doing), but given the quality-of-life differential, it’s worth it.

(DC area? Very not for me. That’s not its fault necessarily, but it’s so nonetheless. I also think that getting David back within walking distance of his dissertation advisor is a Very Good Thing. I’ve basically given up trying to goose him into finishing that damn thing; somebody else gets to goose him henceforth.)

I have a new snailmail address in Madison (email me if you want it for some reason), but no other contact information yet. My regular outside-of-work email address is not changing, so no worries there, and as always I hang around IM entirely too much at home. And, um, if any Madison-based readers might be able to pick up one librarian, one linguist, and a couple of extremely unhappy Goth-kitties at the airport sometime in mid-March, you’ll let me know, won’t you?

This does indeed mean that Mason will shortly post an opening for a repository manager. I heartily encourage any librarian interested in this area to apply, and will be happy to talk to potential applicants about Mason or Mason Libraries or MARS out-of-band. I’m plugging away at work on a get-up-to-speed manual for the new person, so asking me good questions will make that a more helpful resource.

Understand me well: this is a move-toward, not a move-away-from. Mason has treated me very well indeed, and I feel more than a little morose about leaving them so soon. I recommend the Libraries as an employer without hesitation, and I earnestly hope that they end up with a better repository manager than I’ve been.

Moving

Everyone who needs to know first now knows, so I can officially announce:

I will be joining the University of Wisconsin System as Digital Repository Librarian effective March 19, 2007. My duties will revolve around Minds@UW, the state university system’s institutional repository.

Although Minds@UW and I serve the entire system, I will be based in Madison at Memorial Library. I look forward to (re)joining the good people at Wisconsin!

28 January 2007

AAP/PSP Response, Translated

Peter Suber posts the AAP/PSP’s response to the Nature exposé. As a free service, I’ll happily translate it for you:

Not-for-profit and commercial publishers, as a group, have a responsibility to make the case on important issues regarding science and research.

Every one of our members just better fall in line to defend us over this, you hear? If not, we’re one hundred percent screwed.

It’s unfortunate that reporters picked up on some early proposals that were not adopted and, regrettably, the Nature article has misrepresented what’s really going on.

We really, really regret having been caught red-handed. We can’t possibly explain, because there isn’t actually an honest explanation that doesn’t make us look worse, and we can’t be dishonest because we haven’t yet found the leak we’ve obviously got, and so we’ll just get caught again.

We and many others have legitimate concerns that government mandated open access could have unintended consequences for the scientific community – and anyone who relies on sound science.

We’re legitimately concerned that our profit margins will evaporate.

Scientists rely on a publishing system that delivers quality, technology, global dissemination and preservation of the record of science. We believe that government mandated open access could put essential aspects of the system at risk and could undermine the quality, sustainability or independence of science.

Remember those profit margins? They’re toast. Toast, I tell you!

And hey, we may be stumblebum slimepuppies, but we’ve been stumblebum slimepuppies for decades! Nay, centuries! You can’t want to get rid of us now! After all, nobody protected the pennyfarthing bicycle, and just look what happened! You don’t want that on your conscience, do you?

That’s why the AAP/PSP thinks it’s important that all sides of the debate are heard.

That’s why we’re spending enough money on Capitol Hill lobbyists and sleazy PR flacks to pay for the launch of several hundred open-access journals. That’s why we’ll tell every single whopper about open access we can think of, and pay buckus maximus to the biggest scuzzball we can find to come up with more whoppers for us when we run out.

Gotta protect those profit margins. At all cost.

No need to thank me. No, really. All just part of the service here at CavLec.

Place in the world

One thing that a focused conference like Open Repositories presents to the halfway-savvy people-watcher is a sense of social hierarchy (or network, if you’re not a hierarchical thinker about matters social) in the field.

I know roughly where I am in DSpace-space. I am assuredly not inner-circle, nor am I one of those social nexuses that pull everybody together from all over. If DSpace-space were Orwell’s Party, I’d be in the Outer Party, a minor functionary just dangerous enough for the Inner Party to be watching. Plenty of people know my name, as it happens; they just don’t consider it an important name.

This is good. This is a position I can live with.

The Manakin developers were the unquestioned, lionized heroes of the DSpace user-group meeting. They deserved it. I am very much looking forward to getting my claws into Manakin. That’s not all, though. One of the frustrations of attending repo-rat meetings is seeing all sorts of people writing all sorts of lovely code around DSpace, code that I daren’t use because it mucks up the upgrade path.

I expect Manakin to change that. For the first time, I’ll be able to hand a DSpace design to someone else in one nice, neat package. (Can’t do that with DSpace’s JSPs, because some bits live in servlet code, and other bits live deep in the tag definitions, and just trust me, it gets ugly.) For the first time, a useful metadata-munge can be pulled out of one context and plopped into another without horsing around in core code. I fully anticipate that Manakin will mark a great flowering of shared code around DSpace. Exciting!

Repo-rats are an intensely pragmatic people. (Yes, me too. Strip away my Quixotesque idealist’s lance, and I’m a right peasant.) We love Tim Donohue because he solves our rubber-meets-road problems. Tim won the poster-session contest with a MediaFilter gizmo that automagically turns Microsoft Office docs (which are the bane of any repo-rat’s existence; we have lots of them, but we hate having them because they’re bad for preservation) into their corresponding Open Document formats. Tim was mobbed the entire poster reception, poor soul, and he arrived the next morning hoarse as a crow. Fortunately, he’d already given his Configurable Submission System talk—which just goes to show, doesn’t it? Practical, Tim is. We repo-rats like that.

For similar reasons, Eric Larson’s BibApp suite absolutely 0wnz0red the third day of the conference. I’m talking pwned, people, PWNED. It was a beautiful thing. BibApp is a little like DSpace Researcher Pages on steroids. Given a bunch of citations for a faculty author, BibApp can check them against SHERPA/RoMEO to see which are available for immediate repository ingest, at which point it happily packages those up for DSpace’s batch ingester. It can check indexes for a list of keywords indicating the author’s research interests. It can list the author’s favorite coauthors and publication venues. It’s gorgeous. Everybody wants it.

I went to library school with Eric, as it happens, and while I love his gizmo absolutely to pieces, what impressed me most is that Eric is a damn good speaker. I had no idea, more fool I. Pwning an entire day of OR ’07 is likely to lead to more opportunities for him, both for speaking and for code, and that is all to the good.

It just goes to show, sometimes the right work at the right time can put you on a moon-rocket. Fewer people knew Eric than knew me, before this conference. After it, I shall toddle on in my beloved semi-obscurity, nodding sagely as Eric gets used to well-deserved rock-stardom.

SIMILE

(Richard Rodgers)

What it’s about
- Problem is “heterogeneous metadata.” We’re stuck with DC, MODS, MARC, etc. etc. SIMILE recognizes that any single metadata scheme selected won’t be selected by the whole world, and won’t be what future collections use anyway. On the one hand, you can lose semantics; on the other, you can be too unique to create interoperable query and discovery systems (hello, MARC).
- RDF and other data technologies, aimed at semantic data interoperability. Can it solve the heterogeneous metadata problem?
- Data as graph, not table or tree. (RDF, not RDBMS or XML.)
- “appease the demo gods” — don’t bother, they always frown
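
To make the "graph, not table or tree" point concrete, here is a minimal sketch in plain Python, with no RDF library involved. The record contents, the identifier `item:1`, and the helper names are all made up for illustration; the only idea borrowed from the talk is that metadata becomes subject/predicate/object statements, so records in different schemes can be merged without forcing one scheme onto the other.

```python
# Two records about the same item, described in different schemes.
# Identifiers and field values are illustrative only.
dc_record = {"id": "item:1", "dc:title": "Geologic Atlas", "dc:creator": "USGS"}
mods_record = {"id": "item:1", "mods:genre": "atlas"}


def to_triples(record):
    """Flatten a flat metadata record into (subject, predicate, object) triples."""
    subject = record["id"]
    return {(subject, key, value) for key, value in record.items() if key != "id"}


# Merging two graphs is just set union; neither scheme absorbs the other.
graph = to_triples(dc_record) | to_triples(mods_record)


def describe(graph, subject):
    """Return every (predicate, object) pair known about one subject, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)
```

Calling `describe(graph, "item:1")` returns statements from both schemes side by side, which is the interoperability the talk was after, at least in toy form.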

Tools
- Tempting to just put a triple-store in DSpace; SIMILE project predicated on this not necessarily happening, built tools to cope with lots of problems.
- “RDFizer” tools convert metadata into RDF (MARC, EXIF, email headers, etc., new ones coming constantly)
- visualization tools
- “Gadget” (XML viewer based on XPath, sounds like Panorama Pro back in the day), “Welkin” (similar, for RDF)
- Browse tools (e.g. “Piggy Bank,” scrapes and RDFizes websites)
- Data comes in many forms; the idea is to create tools that “just work” with all of it.

Longwell
- faceted browser, webapp, N3 store underneath

metasearch tool demo
- interface very cluttered to my mind; salient information does not jump out
- interesting as a metasearch tool, but needs serious usability attention
- promising, but impractical as it stands; I would never give this to an undergraduate, and I’d be hesitant even to show it to faculty. It’s just not clear enough what you’re looking at.

“Timeline” tool demo
- events plotted by timeline
- embedded descriptions and linkouts
- pretty nice!

DWell
- facet-limiting for DSpace searches
- UI too cluttered, again, but I like the occurrence tallies
- good for smoking out metadata errors
- combo box for choosing columns MUCH too full
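
The occurrence tallies are simple to picture. Below is a rough sketch of facet counting and facet limiting in Python; it is not DWell's code, and the sample records (including the deliberate typo) are invented. It shows why tallies help smoke out metadata errors: a value that turns up only once is worth a second look.

```python
from collections import Counter

# Invented search results; the misspelled "Geolgy" is deliberate.
results = [
    {"title": "Folio 1", "subject": "Geology"},
    {"title": "Folio 2", "subject": "Geology"},
    {"title": "Folio 3", "subject": "Geolgy"},
    {"title": "Folio 4", "subject": "Geography"},
]


def facet_counts(items, field):
    """Tally how often each value of one field occurs in the result set."""
    return Counter(item[field] for item in items if field in item)


def limit(items, field, value):
    """Keep only the results whose field matches the chosen facet value."""
    return [item for item in items if item.get(field) == value]


counts = facet_counts(results, "subject")
# Values seen only once are candidates for a typo check, not proof of one.
singletons = [value for value, n in counts.items() if n == 1]
```

Here `singletons` contains both the typo and a legitimate rare value, so a human still has to make the call.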

Data flow
- can RDFize content, or have DSpace expose RDF via OAI
- can dump the RDF in any old way from any old platform

Q: what about data provenance? does the tool show it?
A: you can make it do that.

(I was left rather cold by this presentation, and I hope I can constructively express why. Fantastic intelligence has gone into this work, no question, but nothing useful in adminspace or userspace has come out, is the problem. Arguably, the SIMILE project’s Piggy Bank was Zotero before Zotero was Zotero, but the SIMILE project couldn’t or didn’t make the additional leap to “here’s a real problem that real people have that this work can solve.” And the remaining SIMILE projects follow the same pattern. Neat ideas, but horrendous UI when they’ve any UI at all, and no thought for how they fit into the world. Research programming par excellence.)

Manakin and geospatial metadata

(Adam Mikeal)

Two problems tackled: visualizing geospatial metadata, handling complex items

Background
- Repos have complex items (e.g. book scans)
- Metadata needs to move beyond Dublin Core
- Traditional approaches fall short

Collection
- Geologic Atlas of the US
- 227 folios, maps, text, photos
- 1894-1945, from the USGS
- economic geology and geography

Folios
- 10-40 pages per folio
- scanned at 300 dpi
- roughly 100MB per scan
- a half-terabyte in total
- only complete scan of this series in existence
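
The figures above roughly hang together. A quick sanity check, taking the per-scan size as a flat 100 MB and "half a terabyte" as 500,000 MB (both are simplifying assumptions, not numbers from the talk):

```python
FOLIOS = 227
MB_PER_SCAN = 100      # assumed flat size per page scan
TOTAL_MB = 500_000     # "half a terabyte", in decimal units (assumption)

# Bounds if every folio had the minimum or maximum page count.
low_mb = FOLIOS * 10 * MB_PER_SCAN
high_mb = FOLIOS * 40 * MB_PER_SCAN

# Average pages per folio implied by the stated total.
implied_pages_per_folio = TOTAL_MB / (FOLIOS * MB_PER_SCAN)
```

The bounds come to 227,000 MB and 908,000 MB, and the stated total implies about 22 pages per folio, comfortably inside the 10-40 range.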

DSpace organization
- 1 collection, 227 items (one per folio)
- each item had multiple bitstreams (one per page)
- Extra bitstream, stitched-together, reduced-res PDF for screen viewing
- (this sounds familiar — it’s what I do with book scans!)

Geospatial metadata
- DCMI Dublin Core recommendation for longitude and latitude
- coverage.point, coverage.box
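
For reference, the DCMI Box and Point encoding schemes write coordinates as `key=value` pairs separated by semicolons (`northlimit`, `southlimit`, `westlimit`, `eastlimit` for a box; `north` and `east` for a point). A minimal parsing sketch follows. Whether this collection stores its values in exactly that form is an assumption, and the coordinates below are invented.

```python
def parse_dcsv(value):
    """Parse a 'key=value; key=value' string into a dict of strings."""
    pairs = (part.split("=", 1) for part in value.split(";") if "=" in part)
    return {key.strip(): val.strip() for key, val in pairs}


def parse_box(value):
    """Read the four limits of a DCMI Box as floats."""
    fields = parse_dcsv(value)
    limits = ("northlimit", "southlimit", "westlimit", "eastlimit")
    return {name: float(fields[name]) for name in limits}


def parse_point(value):
    """Read the north/east components of a DCMI Point as floats."""
    fields = parse_dcsv(value)
    return {"north": float(fields["north"]), "east": float(fields["east"])}


# Illustrative values only; not taken from the actual collection.
box = parse_box("northlimit=37.0; southlimit=36.5; westlimit=-106.0; eastlimit=-105.5")
point = parse_point("east=-105.75; north=36.75")
```

The sketch ignores the optional components (name, units, projection, elevation) and does no validation.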

Problems with default DSpace
- interface optimized for items with only a few bitstreams (yes yes yes!)
- extremely cumbersome, long list, very large files, no info = user frustration
- can’t do anything with coverage metadata
- search results have zero context, no way to search across an area
- great collection, horrible UI that doesn’t leverage unique properties of the collection

Solution: Manakin

Item view
- root problem: complex item!
- exploring pages loses context (no sense of next/previous page, what folio it’s from, which item it’s from)
- gallery-style thumbnails for browsing (woo-hoo!) allow seeing the folio at a glance
- lightbox-style previews for detail view of pages; retain context
- customized metadata to show lat/long
- two download options: low-res JPEG or big TIFF

Manakin’s role
- override item-view template with XSL
- easily extends to complex item structures
- built-in METS allows images to be associated, correlated with each other

Collection view
- wanted a map-based interface
- chose Yahoo Maps for prettiness
- geographic coordinates available for all items
- put a clickable map in the collection view itself!
- user can determine coverage area
- visual explanation of collection coverage
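
One way a map selection could be turned into a list of matching folios is a plain bounding-box overlap test. This is a guess at the general technique, not a description of what the presenters built; the folio names and coordinates are invented, and the test ignores boxes that cross the antimeridian.

```python
def boxes_overlap(a, b):
    """True if two lat/long boxes overlap or touch (antimeridian ignored)."""
    return (
        a["westlimit"] <= b["eastlimit"]
        and b["westlimit"] <= a["eastlimit"]
        and a["southlimit"] <= b["northlimit"]
        and b["southlimit"] <= a["northlimit"]
    )


def items_in_area(items, area):
    """Names of all items whose coverage box intersects the selected area."""
    return sorted(name for name, box in items.items() if boxes_overlap(box, area))


# Invented coverage boxes for two hypothetical folios.
folio_boxes = {
    "Folio A": {"northlimit": 37.0, "southlimit": 36.5, "westlimit": -106.0, "eastlimit": -105.5},
    "Folio B": {"northlimit": 45.0, "southlimit": 44.5, "westlimit": -90.0, "eastlimit": -89.5},
}
selected = {"northlimit": 38.0, "southlimit": 36.0, "westlimit": -107.0, "eastlimit": -105.0}
matches = items_in_area(folio_boxes, selected)
```

With only 227 items, a linear scan like this is plenty; a bigger collection would want a spatial index.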

Manakin
- override one template!
- XSL writes JavaScript to generate map (reads coordinates, etc. — isn’t that slow?)
- customized collection within existing repository
- search results also plotted on map via XSL/JavaScript
- made a nice search-box interface

Quick to implement!

Results: better user experience, improved access, used unique features of the collection

Q: any Lucene customization necessary for custom search?
A: no

Q: releasing any of the code?
A: yes, it’s been talked about; needs to be genericized first

Themes in Manakin

(Alexey Maslov)

Aspects (content generation) talked about yesterday. Today is themes.

Aspects deliver a DRI document to the theme. Theme converts DRI to a more usable format (viz. HTML), and styles the result.
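
The shape of that hand-off can be sketched in a few lines. What follows is a toy stand-in written in Python, not Manakin's XSL: the input is a heavily simplified, un-namespaced imitation of a DRI document, and the mapping (each `div` becomes an HTML `div`, `head` becomes `h1`) is invented for illustration.

```python
import xml.etree.ElementTree as ET

# A toy stand-in for a DRI document; real DRI is richer and namespaced.
dri = ET.fromstring(
    "<document><body><div n='news'>"
    "<head>News</head><p>Welcome.</p>"
    "</div></body></document>"
)


def to_html(document):
    """Map each toy DRI div to an HTML div holding an h1 and paragraphs."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for div in document.iter("div"):
        out = ET.SubElement(body, "div", {"class": div.get("n", "")})
        for child in div:
            tag = "h1" if child.tag == "head" else "p"
            ET.SubElement(out, tag).text = child.text
    return ET.tostring(html, encoding="unicode")


page = to_html(dri)
```

The point is the division of labor: the content side only says "here is a section with a heading," and the theme side decides what that looks like.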

Theme components
- Sitemap (config file, Cocoon XML file, handles i18n, references other components)
- XSL (converts DRI to HTML)
- CSS (styles, of course)

Sitemap file on screen (I can’t see it; type too small. Looks pretty simple, though.)

Theme creation
- Create a new directory and sitemap
- Configure the sitemap and give it all necessary components
- Install the theme and apply it to a set of DSpace pages

Create a template
- Find the Themes directory
- Create new directory and name it after new theme
- Copy template contents into new directory
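
Those three steps amount to a directory copy. A small sketch, run against a throwaway temp directory rather than a real DSpace install; the directory names `template` and `mytheme` and the placeholder `sitemap.xmap` file are assumptions for the demo, not paths confirmed in the talk.

```python
import shutil
import tempfile
from pathlib import Path


def create_theme(themes_dir, template_name, new_name):
    """Copy the template theme into a new directory named after the theme."""
    source = Path(themes_dir) / template_name
    target = Path(themes_dir) / new_name
    shutil.copytree(source, target)  # raises FileExistsError if target exists
    return target


# Demonstrate in a temporary directory standing in for the themes directory.
with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp) / "template"
    template.mkdir()
    (template / "sitemap.xmap").write_text("<!-- placeholder -->")
    new_theme = create_theme(tmp, "template", "mytheme")
    copied = sorted(p.name for p in new_theme.iterdir())
```

Because `copytree` refuses to overwrite an existing directory, the sketch cannot clobber a theme that is already there.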

Configure
- edit the <theme-path>
- default sitemap uses existing XSL and CSS

Install
- work with xmlui.xconf file in dspace config directory
- add an entry for your theme; set theme path and URL matching rules
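
The URL matching rules can be pictured as an ordered list of patterns. The sketch below is a Python illustration of that idea only: the rule format, the theme names, the handle, and the first-match-wins behavior are all assumptions, not the actual xmlui.xconf syntax or semantics.

```python
import re

# Hypothetical rules: (theme name, URL regex, theme path), checked in order.
THEME_RULES = [
    ("Atlas", r"^/handle/1234/5678", "atlas/"),
    ("Default", r".*", "default/"),
]


def theme_for(url, rules=THEME_RULES):
    """Return (name, path) of the first rule whose regex matches the URL."""
    for name, pattern, path in rules:
        if re.search(pattern, url):
            return name, path
    return None


atlas_theme = theme_for("/handle/1234/5678/browse")
fallback_theme = theme_for("/community-list")
```

Putting the catch-all pattern last means one special collection gets its own look while everything else keeps the default.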

(This looks dead easy so far. Color me impressed.)

This gets you as far as an XHTML page.

Basic Theme Development
- work with XHTML, CSS, XSL
- CSS works like standard web development project; build stylesheet and reference it
- make sure sitemap references CSS correctly

Complex Theme Development
- XSL to get from DRI to XHTML
- easiest to override existing templates where necessary
- can use XSL imports as needed; acts like the “local” folder in JSPs
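
The override idea has a close analogy in plain Python: look a template up in the theme's own overrides first, and fall back to the stock set otherwise. This is an analogy for XSL import precedence, not Manakin code, and the template names and markup are invented.

```python
from collections import ChainMap

# Stock templates, standing in for the XSL a theme imports.
base_templates = {
    "item-view": lambda item: f"<p>{item['title']}</p>",
    "collection-view": lambda coll: f"<ul><li>{coll['name']}</li></ul>",
}

# The theme overrides only what it needs to change.
theme_overrides = {
    "item-view": lambda item: f"<h2>{item['title']}</h2>",
}

# Lookups hit the overrides first, then fall through to the base set.
templates = ChainMap(theme_overrides, base_templates)

item_html = templates["item-view"]({"title": "Folio 1"})
collection_html = templates["collection-view"]({"name": "Geologic Atlas"})
```

Only the item view changes; the collection view still comes from the stock templates, which is what keeps the upgrade path clean.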

(XSLT intro, nothing new there, but he’s explaining it well)

- can build custom metadata handlers as well as messing around with page structure (neat!)

More advanced topics
- Metadata handlers
- Non-HTML output formats (hey, this is a neat idea…)
- i18n (works much like JSPs currently do)
- Static page insertion (wow, you can do this?)
- Non-XSL transformations (SAX-based, like STX)