I hope all of you had a peaceful holiday season and/or winter break.
SPARC sent out a reminder email today that last week the White House issued a Request for Information (RFI) regarding public access to taxpayer-funded research. Specifically, the White House is “inviting input on ‘enhancing public access to archived publications resulting from research funded by federal science and technology agencies.’”
The RFI will be open for only 30 days, from 10 December 2009 to 7 January 2010. You may comment online through the Public Access Policy Blog. The comments will be organized around three themes spread across the 30 days: Implementation from 10-20 December 2009; Features and Technology from 21-31 December 2009; and Management from 1-7 January 2010.
For further information, please go to the blog for the White House Office of Science and Technology Policy (OSTP) or to SPARC’s Open Access site, the Alliance for Taxpayer Access.
John Markoff, in a New York Times article called “A Deluge of Data Shapes a New Era in Computing,” writes that Tony Hey, Stewart Tansley, and Kristin Tolle have edited a book that discusses the “Fourth Paradigm.” The book, The Fourth Paradigm: Data-Intensive Scientific Discovery, is in honor of Jim Gray, who argued that “computing was fundamentally transforming the practice of science.” Gray called this “The Fourth Paradigm,” with “the first three paradigms as experimental, theoretical and, more recently, computational science.” Gray was lost at sea off the California coast in 2007. The book is a tribute by his colleagues to Gray’s perspective, as outlined below.
CBS News ran this nice overview of the digital preservation/data (bit) rot problem. I have found this to be a great way to explain the problem in a way that doesn’t put the average listener to sleep.
Go ahead, send the article and video out to your friends and family without fear. You know you want to…
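For readers who want to do more than explain bit rot, a common defense is fixity checking: record a cryptographic hash of each file, then re-hash periodically and compare. The sketch below is my own minimal illustration (the function and file names are hypothetical, not from the CBS piece), using Python's standard-library `hashlib`.

```python
# Minimal sketch of fixity checking to detect bit rot.
# Record a SHA-256 digest per file; later, re-hash and compare.
import hashlib


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(manifest):
    """Compare stored digests against current file contents.

    manifest: dict mapping file path -> previously recorded digest.
    Returns the list of paths whose bytes have silently changed.
    """
    return [p for p, digest in manifest.items() if sha256_of(p) != digest]
```

Run `verify` on a schedule (and on every copy of the data); any path it returns has suffered silent corruption and should be restored from a known-good replica.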
The authors of a 2005 National Science Foundation (NSF) report defined five actors in data management: data users, authors, managers, scientists and funding agencies. Today, I will examine the data scientist vs. the data manager.
First, what are the shared goals of the five actors in data management?
- ensure that all legal obligations and community expectations for protecting privacy, security, and intellectual property are fully met;
- participate in the development of community standards for data collection, deposition, use, maintenance, and migration;
- work towards interoperability between communities and encourage cross-disciplinary data integration;
- ensure that community decisions about data collections take into account the needs of users outside the community;
- encourage free and open access wherever feasible; and
- provide incentives, rewards, and recognition for scientists who share and archive data (NSF, 2005).
In order to fulfill these goals, an organization will need one or more individuals who can fill the roles of data scientist and data manager. I say “one or more” simply because I believe that, at one time or another, a researcher may find him- or herself acting as the sole data user, author, manager, and scientist.
The data deluge refers to the increasingly large and complex data sets generated by researchers that must be managed by their creators with “industrial-scale data centres and cutting-edge networking technology” (Nature 455) in order to provide for use and re-use of the data.
The lack of standards and infrastructure to appropriately manage this (often taxpayer-funded) data means that data creators, data scientists, data managers, and data librarians must collaborate to create and acquire the technology needed to support data use and re-use.
This blog is my way of sorting through the technology, management, and research and development that have come together to address the data deluge. I will post and discuss both current and past R&D in this area. I welcome any comments.
Do you have any additional definitions of data deluge?