DSpace for managing digitized collections
(Lisa Spiro)
They felt like freaks at first, because DSpace was supposedly designed for preprints etc. How many using DSpace for digitized collections? (Maybe 1/4 of room.) And for born-digital? (Most of room.)
Why DSpace?
- active OSS community
- support for digital preservation
- OAI
- single platform for all digital assets, both born-digital and digitized
- scalability
- leverage efforts to integrate Sakai (teaching assets) with DSpace; also Connexions (course materials)
Said that they had trouble with a commercial solution — couldn’t customize it to their needs.
Runthrough of various Rice projects, including special collections, digitized TDs, and a digitized early print journal. Formats: PDFs, TEI, audio, images.
Obstacles to using DSpace for digitization projects
- DSpace designed for born-digital, simple items (like PDFs)
- No support for hierarchical collections
- Only supports as-is content presentation (can’t transform TEI or ETD to HTML for presentation; can’t handle features of JPEG 2000 or streaming media)
- Needed customizable UI (like Manakin)
- Hard to integrate with other repositories and services (as part of a larger collection, or for mashups)
Getting digitized content into DSpace
- Create digital objects
- Create metadata (Excel spreadsheet exported from legacy CMS, created by vendor)
- Convert metadata and file structure into DSpace batch format (convert dates, rework metadata, match metadata to objects, separate out thumbnails, generate reports)
- Set up root collection
- Use batch importer
- Enhancements to import command: file descriptions, primary bitstreams, creation of communities and collections, testing for validity of metadata fields, (future) specify access permissions
- Useful to have web-services ingestion in future
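The conversion step above (spreadsheet row to DSpace batch format) might look roughly like the sketch below. This is not Rice's actual script. It assumes the spreadsheet has `title` and `date` columns with MM/DD/YYYY dates, and that thumbnails are named `*_thumb.jpg`; both are invented for illustration. The output follows the batch importer's per-item layout: a `dublin_core.xml` file plus a `contents` file listing bitstreams.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def to_iso_date(us_date: str) -> str:
    """Convert an MM/DD/YYYY date (assumed spreadsheet format) to YYYY-MM-DD."""
    return datetime.strptime(us_date, "%m/%d/%Y").date().isoformat()


def build_dublin_core(row: dict) -> str:
    """Build the dublin_core.xml text for one item from one spreadsheet row."""
    root = ET.Element("dublin_core")
    fields = [
        ("title", "none", row["title"]),
        ("date", "issued", to_iso_date(row["date"])),
    ]
    for element, qualifier, value in fields:
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    return ET.tostring(root, encoding="unicode")


def build_contents(files: list) -> str:
    """List bitstreams one per line, routing thumbnails to the THUMBNAIL bundle."""
    lines = []
    for name in files:
        if name.endswith("_thumb.jpg"):  # hypothetical naming convention
            lines.append(f"{name}\tbundle:THUMBNAIL")
        else:
            lines.append(name)
    return "\n".join(lines) + "\n"
```

Each item would get its own directory holding these two files alongside the bitstreams themselves, ready for the batch importer.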
Hierarchical content in DSpace
- Can represent it in communities/collections or in metadata
- DSpace’s hierarchies based on organizational units, not content units!
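One way to picture the "hierarchy in metadata" option: each item records its parent's identifier in a metadata field, and the content tree is rebuilt from those values. The sketch below is purely illustrative. The identifiers and the idea of reading the parent from a relation-style field are assumptions, not something the talk specified.

```python
from collections import defaultdict


def build_tree(items: dict) -> dict:
    """Map each parent identifier to the sorted list of its children.

    `items` maps an item identifier to its parent's identifier
    (None for a top-level item), e.g. as read from a relation field.
    """
    children = defaultdict(list)
    for item_id, parent_id in items.items():
        children[parent_id].append(item_id)
    return {parent: sorted(kids) for parent, kids in children.items()}


def outline(tree: dict, parent=None, depth: int = 0) -> list:
    """Return indented lines for the hierarchy below `parent`."""
    lines = []
    for child in tree.get(parent, []):
        lines.append("  " * depth + child)
        lines.extend(outline(tree, child, depth + 1))
    return lines
```

The point of doing it this way is that the hierarchy follows the content (album, page, photo) rather than the organizational units that communities and collections were built for.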
Linking to related materials
- need to link to related materials, such as books, related teaching modules, GIS maps
- now: DC relation field
- prettier in Manakin!
Future: METS viewer?
- can already store METS data in DSpace
- DSpace doesn’t *do* anything with METS
- METS allows more nimble presentation of complex digital objects (e.g. photo albums, musical performances with multiple movements, books linked to the images within them; cf. RLG METS viewer and “page-turner” application)
- Need easier way to create METS data (inter-item refs are harder, because you don’t have the handle yet! chicken-and-egg problem)
- Work underway (e.g. China Digital Museum project)
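For a sense of what "creating METS data" involves, here is a minimal sketch that builds a METS document for a photo album: one `file` entry per page image and a `structMap` giving the page order. The `TYPE` values and file names are made up for illustration. Note that it points at local file names rather than handles, which sidesteps (but does not solve) the chicken-and-egg problem for inter-item references.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(label: str, pages: list) -> str:
    """Return a minimal METS document: one file per page, ordered in a structMap."""
    mets = ET.Element(f"{{{METS_NS}}}mets")
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap", TYPE="physical")
    album = ET.SubElement(struct_map, f"{{{METS_NS}}}div", TYPE="album", LABEL=label)
    for order, href in enumerate(pages, start=1):
        file_id = f"FILE{order}"
        # Register the page image in the file section.
        file_el = ET.SubElement(file_grp, f"{{{METS_NS}}}file", ID=file_id)
        ET.SubElement(file_el, f"{{{METS_NS}}}FLocat",
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        # Place the page in reading order and point it at its file.
        page = ET.SubElement(album, f"{{{METS_NS}}}div", TYPE="page", ORDER=str(order))
        ET.SubElement(page, f"{{{METS_NS}}}fptr", FILEID=file_id)
    return ET.tostring(mets, encoding="unicode")
```

A viewer or page-turner would walk the `structMap` to present pages in order, which is exactly the part DSpace does not do on its own.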
XML support
- XML is a “supported” format, but you can’t just display raw XML to users
- Need to transform TEI Lite to XHTML
- could store pre-gen XHTML, but it’s really just for presentation, not archival
- their solution: item contains XML and images, each item specifies an XSL stylesheet to transform to HTML
- DSpace stores transformed HTML in a bundle
- can change a stylesheet in the config to change the presentation
- problem: text + images = sloooooooooooow
- solution: make links use DSpace “retrieve” URLs (so you don’t have to go through the whole Java monolith), or change links to serve thumbnail copies directly from Apache (OK since it’s just for presentation)
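The link-rewriting fix could be sketched as a post-processing pass over the transformed HTML. This is a guess at the general shape, not their implementation. The `/retrieve/<bitstream id>/<file name>` URL pattern and the `bitstream_ids` lookup table are assumptions for illustration.

```python
import re


def rewrite_image_links(html: str, bitstream_ids: dict, base: str = "/retrieve") -> str:
    """Point <img src="..."> at retrieve-style URLs for known bitstreams.

    `bitstream_ids` maps a file name to its bitstream id (hypothetical lookup).
    Images without a known id are left untouched.
    """
    def replace(match):
        name = match.group(2)
        if name not in bitstream_ids:
            return match.group(0)
        return f"{match.group(1)}{base}/{bitstream_ids[name]}/{name}{match.group(3)}"

    return re.sub(r'(<img\b[^>]*\bsrc=")([^"]+)(")', replace, html)
```

Passing a different `base` (say, a static path served by Apache) would cover the second option, serving thumbnail copies outside DSpace entirely.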
Future:
- streaming server (for content for which they only have streaming rights)
- zooming via JPEG 2000 (keep TIFF for archival)
Q: Will the new DSpace data model be helpful?
A: YES YES YES! Resolves lots of stupidity, such as 800 bitstreams showing up for the same item with no navigation between them. (Heck yes. I have this problem too.)