Today I uploaded a fairly large website (book-length set of translations from the Italian) to DSpace. It did not (she said sheepishly) exactly go smoothly, partly because I was too dense to realize that the restrictions on website import explained in the DSpace system docs boil down in practice to “flatten out the entire website hierarchy before you import.” So now you know what I didn’t.
DSpace handles a multi-file website by asking the system $DEITY to set the site’s entrance-point, the so-called “primary bitstream.” This can be done through the edit-item UI, but if you have forty-gajillion files, that page is a heifer to load (especially on Firefox for OS X, and can anyone explain why that is?). If it’s too big (again, on Firefox for OS X), changes you make to it don’t take for some reason.
So skip it. Make the fix in the database instead. Here’s how, starting with an item handle of (for illustrative purposes) 0000/72:
- Figure out what number DSpace assigned to the file you want to be the primary bitstream. There is no easy way to do this; there are only more or less annoying ways. One way is to query the database for the filename of that bitstream:
  select * from bitstream where name = 'myfile.html';
  This won’t work if you have a bunch of files by that same name, obviously. Let’s say you found that it was 987.
- Figure out what the database-internal ID for the item is:
  select * from handle where handle = '0000/72';
  Note the resource_id in the resulting line; we’ll pretend that it was 123.
- Find the item’s bundles:
  select * from item2bundle where item_id = 123;
  You should get at least two lines back; note their bundle_ids. One of these is the license bundle. If you uploaded any pictures, the thumbnails get a bundle. And one of them is the bundle you want, the one that holds your HTML and other files.
- Run each bundle ID through the query
  select * from bundle where bundle_id = [bundle_id];
  (If your bundle IDs are in a nice sequence, which they generally are, you can just do select * from bundle where bundle_id between [first_bundle_id] and [last_bundle_id];.) You’re looking for the one whose name is “ORIGINAL.” We’ll say it’s 222.
- Set the primary bitstream:
  update bundle set primary_bitstream_id = 987 where bundle_id = 222;
And now surf back to your item’s page; all should be well. This blogpost brought to you by the Department of I Don’t Want To Forget This Next Time. And yes, there are probably ways to combine all this into one gonzo SQL statement, but my SQL-fu is not that strong, and it’s easier to explain if I string it all out anyhow.
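For the curious, here is one rough sketch of how the lookups above might collapse into a single statement. It is not tested against a real DSpace database: it runs against an in-memory SQLite stand-in that mimics only the tables and columns named in this post. The bitstream_id column name, the LICENSE and THUMBNAIL bundle names, and the set_primary_bitstream helper are assumptions made for illustration. The same-filename caveat still applies, and a stricter database may simply refuse the statement if the filename subquery matches more than one row.

```python
import sqlite3

# Toy stand-in for the tables named in the post. Only the columns the post
# mentions are modeled; bitstream_id is an assumed column name.
SCHEMA = """
create table handle (handle text, resource_id integer);
create table item2bundle (item_id integer, bundle_id integer);
create table bundle (bundle_id integer primary key, name text,
                     primary_bitstream_id integer);
create table bitstream (bitstream_id integer primary key, name text);
"""

# The step-by-step lookups folded into one update via nested subqueries:
# handle -> item ID -> that item's bundles -> the one named ORIGINAL.
COMBINED_UPDATE = """
update bundle
   set primary_bitstream_id =
       (select bitstream_id from bitstream where name = :filename)
 where name = 'ORIGINAL'
   and bundle_id in (
         select bundle_id from item2bundle
          where item_id =
                (select resource_id from handle where handle = :handle)
       );
"""


def set_primary_bitstream(conn, handle, filename):
    """Hypothetical helper: run the combined update, return rows changed."""
    cur = conn.execute(COMBINED_UPDATE,
                       {"handle": handle, "filename": filename})
    conn.commit()
    return cur.rowcount


def demo():
    """Build the toy database with the post's pretend IDs and run the update."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("insert into handle values ('0000/72', 123)")
    conn.executemany("insert into item2bundle values (?, ?)",
                     [(123, 221), (123, 222), (123, 223)])
    # Bundle names other than ORIGINAL are placeholders for this sketch.
    conn.executemany("insert into bundle values (?, ?, null)",
                     [(221, "LICENSE"), (222, "ORIGINAL"), (223, "THUMBNAIL")])
    conn.execute("insert into bitstream values (987, 'myfile.html')")
    changed = set_primary_bitstream(conn, "0000/72", "myfile.html")
    rows = conn.execute(
        "select bundle_id, primary_bitstream_id from bundle order by bundle_id"
    ).fetchall()
    conn.close()
    return changed, rows


if __name__ == "__main__":
    print(demo())
```

With the pretend IDs used above, only bundle 222 ends up with a primary bitstream. Stringing the steps out by hand is still the safer route the first time through, since you get to eyeball each intermediate result before changing anything.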



