CSCI B669 Scientific Data Management and Preservation
Spring 2013

 
Instructor: Prof. Beth Plale http://www.cs.indiana.edu/~plale
School of Informatics and Computing
Indiana University Bloomington
Times/Venue: Wed 5:30 - 8:00 p.m., 919 E. 10th St. Informatics East 130
 
Syllabus: B669DataMgt syllabus

Course Description: Environmental sensors, sequencing instruments, social media, and the Internet all contribute to fundamental changes in the nature of scientific research, suggesting data-driven research as the 4th Paradigm of Science. Digital data produced through computation is not a commodity consumed in a single use but an invaluable intellectual asset that can be used repeatedly to fuel new ideas and insights. Managing research data for the long term, and ensuring continued access to it, has emerged as a major challenge. As the well-known 2003 "Atkins report" states, "absent systematic archiving and curation of intermediate research results, data gathered at great expense will be lost". In this course we examine the full lifecycle of digital data with a focus on the challenges of Big Data.

The course covers the following topics:

  • Motivating applications in science
  • Data, metadata, and semantics
  • Big Data: Analytics
  • Big Data: Data Management
  • Data Preservation

The course uses lectures, presentations, and discussions. If student interest and background merit it, students will get hands-on experience with research tools and web services through a class project. See http://pti.iu.edu for the kinds of research tools to be explored.

Prerequisite: Moderate mastery of programming in a traditional programming language such as Java or C++, with experience in something more substantial than toy standalone programs. Interdisciplinary teams that draw on complementary skill sets are a possibility, depending on class makeup.