‘Data curation’ Archive

4 November 2008

Digital preservation policy how-to

Policy development tends to confuse me rather; I’ve learned how to do it, but it’s definitely been an acquired skill. People who can do that work easily and lucidly amaze me.

If your institution is looking at digital preservation feeling lost and unsure, I can’t recommend this report from JISC highly enough. It lays out what needs to appear in an institutional digital preservation policy, how to pitch it as part of the institutional mandate, and what examples are worth following. It’s just techie enough without being incomprehensibly geeky.

It’s excellent. Take a look. Seriously.

27 October 2008

eResearch room on FriendFeed

If FriendFeed is your social-networking poison of choice, I’ve just opened a room for eResearch there. Quality links showing up already; join the fun!

21 October 2008

Meaner than I am

You think I’m hard on repositories? Check out this guy. I can’t argue with a lot of what he has to say. Wish I could, frankly. (Via Mike Lynch.)

The basic problem with his mindset is “who bells the cat?” Sure, we can focus on objects rather than repositories; it makes a lot of sense to do so. Who’s going to change researcher culture such that information producers lift a finger to ensure the durability of their products? They’ve never done it for paper. Why are they going to do it now?

29 September 2008

JISC report on data curation

Okay, okay, so I’m finally going to have to admit—reluctantly—that most data curators have domain expertise, and that that’s the most desirable situation for researchers.

However, the Swan and Brown report offers even comp-lit majors like me a little hope:

On the other side of the coin, there are data scientists who argue that it is not necessary to be a subject expert in order to do the job effectively. There are some fundamental data science skills that are generic in nature, such as dealing with confidential research, data description and metadata, software, copyright and intellectual property rights, and data storage. Although this may be so, the core issue is that of effective communication between a data scientist and their research colleagues…

From a practical perspective, as demand for competent data scientists grows, so it will become necessary to cast the net as wide as possible. Subject knowledge is important, but so too are technical skills and people skills…

We must consider also the question of technical and computing aptitude…

Our online survey of current data scientists also showed that the data science community is evenly split on whether people skills are more important than technical skills for success as a data scientist – but then people’s opinions are often predicated on their own experiences and whether their own strengths lay toward the technical or people skills end of the spectrum. It is uncommon to find people who are excellent at both. We came across several examples of instances where people whose background was primarily computing and information technology became sufficiently familiar with the subject area of their specialist institutions that they were deemed to be effective data scientists.

In my mind, I compare this to the situation of librarians who become selectors or bibliographers in subject areas where they have no formal training. Let’s not kid ourselves, it happens, especially in the sciences. I’ve known some such—and paradoxically, they tended to be toward the more effective end of the scale. Admittedly, this is because they were people with courage sufficient to dive into an unfamiliar topic head-first, and such people tend to be naturally effective at whatever they turn their hand to. Sometimes, though, being buried in the subject is a positive disadvantage. Ever had a foreign-language teacher who was a native speaker, so embedded in the language that s/he couldn’t explain it? I have. I think this happens to scientists a lot.

I’m still reading the report, which is an evenhanded and intelligent one. I quibble with the idea that rigid terminology distinctions are appropriate at this early date, but I think the lines drawn in the report are useful ways to think about the problem as long as they’re not meant to reify it. I have most of the skills of the report’s “data librarian,” but some of the skills of its “data manager” as well. This is not a bad thing. It should be encouraged!