4 August 2005

Fixing DSpace’s grotty markup

This is by way of being notes to self, but I thought it might serve other purposes as well: helping other people trying to do the same thing, and tempting the DSpace developers into fixing some of this themselves in the next release!

(Yeah, because I go to all the trouble of fixing the markup in our 1.2 staging-server install, and they go and release 1.3 today. I’m shrugging as philosophically as I can and writing it up to “plan to throw one away…”)

John Cowan’s at Extreme Markup 2005 along with all the other cool people, but if he were standing nearby, I’d beg him to make a JSP plugin for TagSoup, such that I could just run this garbage through it. I’m pretty sure one can’t just Tidy or TagSoup a JSP file, more’s the pity.

I will do them the courtesy of remarking that at least they never use font tags, and they’re very consistent about alt attributes. This is definitely good.

Mark me well, however: I don’t give the ghost of a hoot about Netscape 4. If it breaks, it breaks. Welcome to 2005. If you care about Netscape 4, adopt my recommendations at your own risk, because I know nothing of Netscape 4 hackery.

Things to fix if you want minimally acceptable HTML 4 markup out of DSpace (yes, this does indeed mean I do not consider out-of-the-box DSpace markup minimally acceptable, and if that makes me an insufferable markup snob, so be it):

  • Quote attribute values. I fix this with regular expressions, searching on (for example) width=([^ >"]+) and replacing with width="\1". You will need to do this for width, height, align (will also catch valign), cellpadding, cellspacing, colspan, type, method, name, size, border and probably some others I’ll find and add to this post later.
  • Lowercase element and attribute names. Fix PRE, P, HTML (don’t bother searching; there’s only one, in layout/header-default.jsp), the various heading levels, A HREF, SCRIPT LANGUAGE. The regex is something like <(/?)P, replace with <\1p, to catch end-tags in the same search. (Do PRE before P, or the P search will turn <PRE into <pRE.)
  • Kill nowrap attributes. With extreme prejudice. Likewise non-breaking spaces, and paragraphs containing nothing but non-breaking spaces.
  • Add type="text/css" to the link declaration for the stylesheet in header-default.jsp.
  • Hunt for <a name>. Replace with id attributes, as the Good Markup $DEITY intended. Regex search for ><a name="([^"]+)"></a> and replace with id="\1"> with a space before it (do NOT forget the space!). There will be a few left over, so kill them by hand. (There’s also something in edit-metadata.jsp that inserts them, but I’m not ambitious enough to patch that.)
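The editor searches in this list can also be scripted. Below is a minimal Python sketch, assuming you feed it one JSP file’s contents at a time as a string; the function names are made up for illustration, and the tag and attribute lists are just the ones named above. It departs from the editor regexes in one place: it adds \b to the tag search so that P cannot match PRE. Like those regexes, it knows nothing about JSP, so an attribute whose unquoted value is a <%= %> expression will come out mangled and has to be fixed by hand.

```python
import re

# Attributes whose unquoted values need quoting (from the list above).
# No word boundary before the name, so "align" also catches "valign".
QUOTE_ATTRS = ["width", "height", "align", "cellpadding", "cellspacing",
               "colspan", "type", "method", "name", "size", "border"]

# Upper-case names to lower-case (from the list above).
UPPER_TAGS = ["HTML", "PRE", "P", "H1", "H2", "H3", "H4", "H5", "H6",
              "A", "SCRIPT"]
UPPER_ATTRS = ["HREF", "LANGUAGE"]


def lowercase_names(markup: str) -> str:
    """Lower-case the listed element names (start- and end-tags) and attributes."""
    for tag in UPPER_TAGS:
        # (/?) keeps an end-tag's slash; \b stops "P" from matching "<PRE".
        markup = re.sub(r'<(/?)' + tag + r'\b', r'<\1' + tag.lower(), markup)
    for attr in UPPER_ATTRS:
        markup = re.sub(r'\b' + attr + '=', attr.lower() + '=', markup)
    return markup


def quote_attributes(markup: str) -> str:
    """Wrap unquoted values of QUOTE_ATTRS in double quotes."""
    for attr in QUOTE_ATTRS:
        # [^ >"]+ is a run with no space, '>' or '"': an unquoted value.
        # A value that is already quoted starts with '"' and is left alone.
        markup = re.sub(attr + r'=([^ >"]+)', attr + r'="\1"', markup)
    return markup


def anchors_to_ids(markup: str) -> str:
    """Turn '<h2><a name="x"></a>' into '<h2 id="x">'."""
    # The leading '>' closes the previous start-tag; the replacement starts
    # with a space so the new id is separated from what precedes it.
    return re.sub(r'><a name="([^"]+)"></a>', r' id="\1">', markup)


def drop_nowrap(markup: str) -> str:
    """Remove nowrap (bare or nowrap="nowrap") from inside tags.

    Crude: it would also eat the word "nowrap" inside an attribute
    value when a space follows it.
    """
    return re.sub(r'(<[^>]*?)\s+nowrap(?:="?nowrap"?)?(?=[\s/>])', r'\1',
                  markup, flags=re.IGNORECASE)


def drop_nbsp_paragraphs(markup: str) -> str:
    """Delete <p> elements holding nothing but whitespace and &nbsp;."""
    return re.sub(r'<p>(?:\s|&nbsp;)*</p>', '', markup)


def clean(markup: str) -> str:
    """Run every step; lower-case first so the lower-case patterns match."""
    for step in (lowercase_names, quote_attributes, anchors_to_ids,
                 drop_nowrap, drop_nbsp_paragraphs):
        markup = step(markup)
    return markup
```

For example, clean('<H2><a name=top></a>Results</H2><P>&nbsp;</P>') returns '<h2 id="top">Results</h2>'. Diff the output against the original before committing it; regexes on markup always find some case you didn’t think of.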

Things to fix if you want minimally acceptable CSS out of DSpace:

  • Font sizes for the web should never, ever, ever be set in points. I recommend changing points to pixels and working from there; it’s what I’m going to do to start with. (I expect to end up at my usual set-body-to-100% and set-regular-text-to-.8em place, because it tends to work, but we’ll see how it goes.)
  • This is a debatable point, but I got rid of the hacky JSP browser-sniffing and renamed the result to .css instead of .css.jsp. If I’m going to do browser hacks, I want them limited to the CSS itself as much as possible. Plus, this makes testing easier; I don’t have to rebuild DSpace just for a single CSS tweak. (Yes, I know about file overwriting on rebuild, thanks.)
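Changing points to pixels is plain arithmetic: current CSS defines 1in as both 96px and 72pt, so 1pt is 4/3px (9pt becomes 12px, 12pt becomes 16px). A minimal Python sketch that rewrites every pt length in a stylesheet string to the nearest whole pixel; the function name is hypothetical, and this is only the first step before any move to percentages and ems.

```python
import re

# Current CSS: 1in = 96px = 72pt, so one point is 4/3 of a pixel.
PX_PER_PT = 96 / 72


def points_to_pixels(css: str) -> str:
    """Rewrite every '<number>pt' length in css as whole pixels."""
    def convert(match: re.Match) -> str:
        # round() picks the nearest integer (exact halves go to even).
        return f"{round(float(match.group(1)) * PX_PER_PT)}px"

    # The lookbehind keeps the number from starting mid-word (e.g. ".h2pt")
    # or mid-number; \d*\.?\d+ accepts "10", "7.5" and ".5".
    return re.sub(r'(?<![\w.])(\d*\.?\d+)pt\b', convert, css)
```

Because the stylesheet is now a plain .css file, this can run directly on it without a DSpace rebuild in between.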

Things to fix if you want decent markup out of DSpace:

  • Dump tables right and left. They’re bloody everywhere, and many, many of them are thoroughly unnecessary. Replace with structural div tags as needed.
  • Rip out presentational attributes such as bgcolor, border, and suchlike. The color ones are especially egregious, because they hamper your ability to customize your install’s look. (MPOW doesn’t like colored boxes even a little bit, for example, preferring the clean white look. Hard to do with bogocolored tables all over the place.) The more tables you get rid of, the less of this you have to do; most of this particular grot is confined to table markup.
  • Fix navbar-default.jsp and navbar-admin.jsp so that the lists are actually lists instead of (wait for it) another table. Get rid of the arrows, too; that’s a dead visual giveaway for an out-of-the-box install. What I’m doing is using the same snippet of Java code they’re using to set a class name for CSS adornment instead of setting the image.
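Ripping out presentational attributes is another job a script can take over. A minimal Python sketch, assuming the markup is in a string; the attribute list is an assumption (bgcolor and border from above, plus a few common table attributes), so trim or extend it. Unlike a naive pattern, it steps over whole attributes, so the word "border" inside an alt or title value is left alone. It does not understand JSP either: an unquoted <%= %> value will be mangled, though a quoted one is fine.

```python
import re

# Attributes treated as presentational here (an assumption; adjust).
PRESENTATIONAL = ["bgcolor", "background", "border", "cellpadding",
                  "cellspacing", "nowrap"]

# An attribute value: double-quoted, single-quoted, or unquoted.
_VALUE = r'''(?:"[^"]*"|'[^']*'|[^\s>"']+)'''
# One whole attribute: name, optionally followed by = and a value.
_ATTR = r'\s+[\w:-]+(?:\s*=\s*' + _VALUE + r')?'


def strip_presentational(markup: str, names=PRESENTATIONAL) -> str:
    """Remove the named attributes from every start-tag in markup."""
    pattern = re.compile(
        r'(<[\w:]+(?:' + _ATTR + r')*?)'                 # tag + attributes kept
        r'\s+(?:' + "|".join(map(re.escape, names)) + r')'  # attribute to drop
        r'(?:\s*=\s*' + _VALUE + r')?'                   # ...and its value
        r'(?=[\s/>])',                                   # whole name only
        re.IGNORECASE)
    # re.sub resumes after each match, so a tag with several unwanted
    # attributes needs more than one pass; repeat until nothing changes.
    previous = None
    while markup != previous:
        previous = markup
        markup = pattern.sub(r'\1', markup)
    return markup
```

For example, strip_presentational('<table border="0" bgcolor="#ffffff" class="list">') returns '<table class="list">'. The lookahead means bordercolor survives unless you add it to the list.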

Things to fix for usability:

  • I’m sure most people fix this immediately, but in case you haven’t: The logo link in the upper left-hand corner of the header goes to the home URL of your DSpace install. It shouldn’t, if you’ve put your institution’s logo in there (as most people will); that logo should always go to the institution’s home page. Make the name of your install into a link to the install’s home page, instead. You’ll want to slap an id on there, so that you can tell the CSS not to do weird things when link-hovering.
  • A lot of places tell you to “contact the admin” or “contact us” and then spit out contact info. This is silly. There’s a perfectly good feedback page in DSpace, so swipe the code from footer-default.jsp: <a href="<%= request.getContextPath() %>/feedback?fromPage=<%= fromPage %>">Contact the DSpace administrator.</a> and run with it.

Things to fix for accessibility:

  • Move the navigation bar out of header-default.jsp and into footer-default.jsp (after you close off the main page content). Position with CSS. Danger Will Robinson: this only works if you dump DSpace’s main table layout altogether. Which you should anyway, but some folks won’t want to.
  • Mark up form labels with label elements. Make sure that the form element being labeled has an id attribute, and set the for attribute on its label to the same value. (Did I lose you? Sorry, my fault. If you have <input type="text" id="myinput"> (simplified, of course), you want its label to be tagged <label for="myinput">. Side benefit is that clicking on the label sets the focus to the input in most browsers.)
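To find controls that still lack a label, one option is to save a rendered page (the JSP source itself won’t parse as HTML) and run it through Python’s standard html.parser. The sketch below, with hypothetical class and function names, lists every input, select, or textarea whose id no label points at via for, skipping input types that don’t take a text label (hidden, submit, reset, button, image). It checks only the for/id pairing, not whether the label text makes sense.

```python
from html.parser import HTMLParser

# Input types that don't take a visible text label.
UNLABELED_TYPES = {"hidden", "submit", "reset", "button", "image"}


class LabelChecker(HTMLParser):
    """Record form-control ids and the ids that <label for="..."> targets."""

    def __init__(self):
        super().__init__()
        self.control_ids = []      # id of each labelable control, or None
        self.label_targets = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)        # html.parser lower-cases tag/attr names
        if tag == "label" and attrs.get("for"):
            self.label_targets.add(attrs["for"])
        elif tag in ("select", "textarea"):
            self.control_ids.append(attrs.get("id"))
        elif tag == "input":
            if (attrs.get("type") or "text").lower() not in UNLABELED_TYPES:
                self.control_ids.append(attrs.get("id"))


def unlabeled_controls(page: str) -> list:
    """Ids of controls no label points at; '(no id)' for controls lacking one."""
    checker = LabelChecker()
    checker.feed(page)
    checker.close()
    return [cid or "(no id)" for cid in checker.control_ids
            if not cid or cid not in checker.label_targets]
```

Run against the example above, '<label for="myinput">Name</label><input type="text" id="myinput">' comes back clean (an empty list).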

This is nowhere near enough, especially on the usability front, but it’s a start.