28 September 2005

Comments: NARA-RLG repository audit checklist

I finally got around to writing up and shipping off my comments on NARA and RLG’s audit checklist for trusted digital repositories. I see no particular reason not to share them more widely; I hope other commenters do the same, because I expect to learn from their comments.

I have elided references to MPOW (my place of work); otherwise, the comments appear verbatim.

I am impressed with the thought and consideration that produced the Audit Checklist for the Certification of Trusted Digital Repositories, as well as the clarity with which it is written. It is already informing my policy- and procedure-development work, and I am extremely grateful to have it.

My comments on the August 2005 draft follow. I am open to further discussion, and would welcome further opportunities to influence the document’s development.

General comments:

  • Some listed technical requirements, particularly in sections B and D, seem to pertain more to repository-software platforms than to individual instantiations of those platforms. Would it be acceptable for an individual repository to point to its platform software’s documentation as evidence of adherence to these guidelines during an audit? If not, audits will entail significant duplication of documentation effort by individual repositories running identical software.

    On a similar theme, might NARA-RLG consider a requirements checklist aimed specifically at repository-software developers? And what remediation will be required of individual repositories running out-of-the-box or experimental software? Is filing a feature request sufficient, or must the repository hire developers to fix deficiencies?

  • I would like to see this document address the usability and accessibility of the digital-repository user interface. In my experience, accessibility for (e.g.) the print-disabled receives attention only when documents such as the Audit Checklist require it. Moreover, the interface that end-users of content see is only one of the relevant interfaces. When content creators or owners submit to a digital repository themselves through a software interface (as they often do), that interface absolutely should not bar creators and owners with disabilities from submitting content.

Section B2.7:

  • I would like to see the checklist enumerate or discuss less straightforward cases in which the integrity of a collection could be called into question. What exactly is at issue, besides discrepancies between a collection description and the actual collection? What internal or external documentation adequately supports the claims made in a collection description? And what language should collection descriptions use or avoid to prevent such discrepancies?

Section C (and Appendix 1):

  • I am deeply concerned about this entire section. Any repository that serves multiple constituencies will find satisfying this section’s requirements an administrative nightmare with dubious (if any) return in improved service or preservation potential.

    What repositories serve multiple constituencies? Mine, certainly. The [repository I manage] is intended to serve the entire university: all of its departments, all of its research units, all of its professional and teaching faculty. Nor is [the repository] unusual in that respect. I might also point out the Washington Research Library Consortium, whose soon-to-launch repository will serve half a dozen entire institutions. Entire states (entire countries!) are starting repositories to serve wide-ranging constituencies. One-constituency repositories are few and far between, and given the current environment (e.g., demonstrated publisher hostility to subject-specific repositories) are likely to remain so.

    I find myself in a cleft stick. If I write a Designated Community statement that covers every possible community that a university department or research unit might target, the statement’s obligatory vagueness and breadth will destroy any conceivable use of the statement as a diagnostic tool. If I myself try to write separate Designated Community statements for every community that uses [the repository], I will undoubtedly write some of them poorly or incorrectly.

    Moreover, I will have shouldered a major administrative burden, considering that Designated Communities will certainly change over time, especially as the Audit Checklist currently defines them. For example, hardware and software requirements for content access form part of Designated Community statements. Must I revisit statements on every release of relevant software?

    Finally, if I ask potential communities to write and maintain a Designated Community statement as a condition of submitting work to [the repository], I erect a barrier to attracting submitters. Current library literature says loudly and clearly that attracting submitters and content is the most difficult aspect of running a successful repository. Barriers are bad, and I find this one especially troublesome because submitters are unlikely to acknowledge or even understand the necessity.

    And I have not even touched on the administrative burden imposed by C4.1, in which I am supposed to test every single agreed-upon Designated Community for apparently unlimited aspects of content understandability! If none of my other comments receive attention, I must ask that this section be rewritten to enumerate the aspects of “understandability” that must be tested. I see metadata and hardware/software requirements mentioned in this section; if that is all, the section should say as much. If the repository’s user-interface usability is also implicated, this section should make that clear. The fuzzy statements about understandability in Appendix 1 simply do not suffice.

  • Reading through the intended uses of the Designated Community statement, I find that it serves as a yardstick for metadata quantity and quality (C2.1), content-delivery constraints (C3.1), hardware/software requirements for content consumption (C1.3, C3.1, C4), and access restrictions (C3.3).

    Of this list, only access restriction seems easily and productively definable in terms of designated communities. I fully expect that my constituencies will consult me if they need to restrict access, and I fully expect to document their requirements, comply with them, and be able to demonstrate that I have done so.

    I would prefer that metadata quantity and quality be defined in terms of adherence to published metadata standards and best practices. I understand perfectly why I should have to make clear that [the repository] uses Dublin Core metadata; published standards such as OAI-PMH mandate it, since an OAI-PMH repository must be able to expose its records as unqualified Dublin Core. What I do not understand is what [the repository]’s use of Dublin Core has to do with any or all university-specific constituencies or user communities.

    I would prefer that content-delivery constraints be uncoupled from designated communities entirely. When such constraints exist, they should be properly documented; that should suffice.

    I would prefer that hardware and software requirements for content consumption be defined primarily in terms of content format, not content audience. (The sole exception I can imagine would be blanket protection for classes of users such as the print-disabled. Even then, such protection cannot be absolute, or repositories would not be able to accept scanned images of written or typeset pages.) Documenting the software and hardware requirements for accessing content in an uncommon format is eminently reasonable, as is documenting migration or transformation plans for uncommon or proprietary formats.

    Defining this necessary documentation in terms of user communities senselessly complicates otherwise straightforward issues. If 90% of my constituencies submit only PDF and HTML preprints, whose preservation and access issues are well understood, should I not focus my documentation efforts on the remaining 10% and their unique needs, rather than impose a bureaucratic burden that 90% of my constituencies will find pointless?

I salute RLG and NARA for producing the Audit Checklist, and I hope these comments on the August 2005 draft prove helpful.