The Holiday Season Has Ended…

The holiday season has ended, and after a nice long mental break I will again endeavor to make sense of all of these articles about data, data management, and data preservation.

I hope all of you had a peaceful holiday season and/or winter break.

White House RFI, US Taxpayer Access to Federally Funded Research

SPARC sent a reminder email today that last week the White House issued a Request for Information (RFI) regarding public access to taxpayer-funded research. Specifically, the White House is “inviting input on ‘enhancing public access to archived publications resulting from research funded by federal science and technology agencies.’”

The RFI will only be available for 30 days, from 10 December 2009 to 7 January 2010. You may comment online through the Public Access Policy Blog. The comments will be centered on three themes spread across the 30 days: Implementation from 10-20 December 2009; Features and Technology from 21-31 December 2009; and Management from 1-7 January 2010.

For further information, please go to the blog for the White House Office of Science and Technology Policy (OSTP) or to SPARC’s Open Access site, the Alliance for Taxpayer Access.

The Fourth Paradigm Data-Intensive Scientific Discovery

John Markoff, the author of a New York Times article called “A Deluge of Data Shapes a New Era in Computing”, writes that Tony Hey, Stewart Tansley and Kristin Tolle have edited a book that discusses the “Fourth Paradigm”. The book, The Fourth Paradigm: Data-Intensive Scientific Discovery, is in honor of Jim Gray, who argued that “computing was fundamentally transforming the practice of science”. Gray called it “The Fourth Paradigm”, with “the first three paradigms as experimental, theoretical and, more recently, computational science”. Gray was lost at sea off the California coast in 2007. The book is a tribute by his colleagues to Gray’s perspective.


Data Scientist vs. Data Manager

So, what exactly is a data scientist? How does this role compare to a data manager?

The authors of a 2005 National Science Foundation (NSF) report defined five actors in data management: data users, authors, managers, scientists and funding agencies. Today, I will examine the data scientist vs. the data manager.

First, what are the shared goals of the five actors in data management?

  • ensure that all legal obligations and community expectations for protecting privacy, security, and intellectual property are fully met;
  • participate in the development of community standards for data collection, deposition, use, maintenance, and migration;
  • work towards interoperability between communities and encourage cross-disciplinary data integration;
  • ensure that community decisions about data collections take into account the needs of users outside the community;
  • encourage free and open access wherever feasible; and
  • provide incentives, rewards, and recognition for scientists who share and archive data (NSF, 2005).

In order to fulfill these goals, an organization will need one or more individuals who can fill the roles of data scientist and data manager. I say “one or more” simply because I believe that, at one time or another, a researcher may find him- or herself acting as the sole data user, author, manager, and scientist.


What is Taming (the) Data (Deluge)?

The data deluge refers to the increasingly large and complex data sets generated by researchers that must be managed by their creators with “industrial-scale data centres and cutting-edge networking technology” (Nature 455) in order to provide for use and re-use of the data.

The lack of standards and infrastructure to appropriately manage this (often taxpayer-funded) data requires data creators, data scientists, data managers, and data librarians to collaborate in order to create and acquire the technology required to provide for data use and re-use.

This blog is my way of sorting through the technology, management, research and development that are coming together to tame the data deluge. I will post and discuss both current and past R&D in this area. I welcome any comments.

Do you have any additional definitions of data deluge?

I’d like to thank the folks at the DICE Center for the inspiration for the title of this blog (p. 8).