Digital Preservation Education for NC State Government Employees

This past week, the North Carolina Department of Cultural Resources released guidelines for state employees responsible for preservation of the state’s public record. I have included the press release below. Whether or not you are an employee of the State of North Carolina, if you are interested in learning about digital preservation, I encourage you to spend some time exploring the site.

“The digital information of today is our heritage of tomorrow.” –Governor Bev Perdue

The North Carolina Department of Cultural Resources’ State Library and State Archives (Cultural Resources) are proud to announce a new website to guide local and state government employees responsible for the preservation of our state’s public record. The site has resources that can help North Carolina government employees – and those responsible for digital information in general – learn how to ensure that today’s digital information is saved so that it can become tomorrow’s heritage.

ICPSR Releases “Guidelines for Effective Data Management Plans”

The Inter-university Consortium for Political and Social Research (ICPSR) has released their Guidelines for Effective Data Management Plans.

On its website, ICPSR writes this about the guidelines:

Many federal funding agencies, including NIH and most recently NSF, are requiring that grant applications contain data management plans for projects involving data collection. To support researchers in meeting this requirement, ICPSR is providing guidance on creating such plans.

The guidelines include:

  • A List of Federal Agency Policies on Data Management and Sharing
  • Elements of a Data Management Plan
  • Data Management Plan Resources and Examples
  • Other Data Management Plan Examples
  • Depositing Data with ICPSR for Long-term Data Management
  • List of Links Related to Data Management and Data Sharing
  • A Guide to Preparing Data

The guidelines contain a lot of really great information on how to effectively manage data; the information in the ICPSR guidelines is not just relevant to Social Science data managers, but to all data managers.

The Humanities Take on Data Mining via Google Books

The Humanities are “Going Google”, according to Marc Parry of The Chronicle, in a piece he wrote a few weeks ago.

The gist of the article is that some Humanities scholars are very interested in data mining the texts scanned in for the Google Books Project.

Why do they want to use Big Data mining techniques to scan through entire corpuses of novels from a particular period? “The data are important because scholars can use these macro trends to pinpoint evolutionary mutants like Sir Walter Scott”, one scholar noted.

Some critics rightfully ask, what will this tell us that we don’t already know?

Their answer is that computers won’t destroy interpretation. They’ll ground it in a new type of evidence.

Still, sitting in his darkened office, Mr. Moretti is humble enough to admit those “cellars of culture” could contain nothing but duller, blander, stupider examples of what we already know. He throws up his hands. “It’s an interesting moment of truth for me,” he says.

(I think this is a backhanded critique of “research” in general, so I had a good laugh when I read this paragraph.)

Other takeaways: Google Books was not built for data mining; it was built to create content to sell ads against. It was built with the intention that each book will be read, one at a time, not data mined. The interfaces aren’t there for this kind of mining, and the metadata is poor to say the least. (Then again, metadata is generally inadequate; this problem is so “known” I won’t provide a citation!)

What do you think are the moral, legal, and scholarly implications (if any) of Google turning over thousands of scanned books to a handful of scholarly institutions, such as Stanford, for data mining?

A Short History of Scientific Information Services

In the following videos, the producer traces the history of scientific communication from verbal/in-person, to letters, and then to printed journals. The producer describes the work of ISI and the company’s founder, Eugene Garfield. Journals grew from a handful to thousands. This led to classification and indexing in order to find relevant journal articles via print. In the early 1960s, ISI digitized this indexing and classification system in order to aid in finding the required material. Only a small portion of literature is actually important enough to be cited often; thus, citation indexing was born.

(For those of you who are unfamiliar with citation indexing, and may be wondering why it is important — among many reasons…the founders of Google applied citation indexing to web links to create PageRank. They were not the first to apply citation indexing to web links, but they were among the first to figure out an entire business model around it by mining and selling the user generated data.)
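For readers who want to see the idea in miniature, here is a rough Python sketch of how counting citations differs from a PageRank-style score. It is only an illustration under stated assumptions: the four "papers" are hypothetical, the function names are made up for this example, and the damping value of 0.85 is simply the commonly quoted default. It is not ISI's or Google's actual implementation.

```python
def citation_counts(links):
    """Count how many times each item is cited (or linked to)."""
    counts = {}
    for source, targets in links.items():
        counts.setdefault(source, 0)
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores.

    `links` maps each item to the list of items it cites or links to.
    `damping` and `iterations` are illustrative defaults (assumptions).
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    # Start with every item equally important.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every item gets a small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # An item passes its score on to the items it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # An item that cites nothing spreads its score evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy data: A and B cite C, and C cites D.
papers = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}

print(citation_counts(papers))
for name, score in sorted(pagerank(papers).items()):
    print(name, round(score, 3))
```

In this toy graph, C has the most citations, but D ends up with the highest PageRank-style score because its single citation comes from the heavily cited C. That is the twist on plain citation counting: who cites you matters, not just how many.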

This is one video that has been split into three parts for ease of viewing online. I found them interesting to watch. The videos were made, as far as I can tell, in the early 1970s, and they are infomercials for ISI. I have embedded the three parts below. I found the first one to be more fun to watch than the latter two. Those, however, are interesting from a recent-information-services-history perspective.

Part 1

Part 2

Part 3

Thanks, L.S. for the links.

Gaiman’s “MirrorMask” Library Cleverness

This past week I watched Neil Gaiman’s “MirrorMask”. The book-as-film chronicles the dream of a teenager whose mother has become ill and is undergoing surgery. In the scene below, the teenager, Helena, goes to the library with her New Best Friend, Valentine, to find clues to a missing charm. They arrive via flying books (you can see the flying books in the first few seconds of the video). The books fly back to the library if the reader insults them. The two characters insulted some Very Large Books, both to Get Out of a Predicament and to be Taken to the Library to find clues to the missing charm.

I love this scene. The reference librarian is funny, the library has an interesting design, and the fact that the books are alive and that Helena and Valentine have to use nets to “catch” them is cute. I was also amused by the idea of books molting due to depression because they weren’t chosen to be read. I also wished I could have a copy of the Really Useful Book for myself!

I thought Gaiman showed great creativity and fun in creating new ways to store, access, and retrieve written, “analog” information.

Do you have any information retrieval favorites from fiction and/or film?