Research Interests & Publications

Since finishing my Ph.D. at the end of summer 2010, I have been working as a post-doctoral researcher at the Data to Insight Center (D2I), directed by Dr. Beth Plale. The D2I projects in which I am involved take an interdisciplinary approach to the preservation of scientific data through metadata and provenance.
Check out the exciting projects we are working on!

In August of 2010 I defended my dissertation, "An Adaptable Repository for Complex Scientific Metadata," which shows how the XML metadata schemata used to describe scientific data differ from general XML and how these differences can be exploited to capture, discover, and manage the metadata that describes scientific data.

Since shortly after starting my Ph.D., I have been working as a research assistant in Dr. Plale's lab. The Projects section below contains details about some of the research projects I have been working on. The Publication and Presentation sections list peer-reviewed publications and other presentations I have given on my research.

Research Interests

My research focuses on metadata capture and management, reducing the misalignment of incentives for metadata capture, applying data across domains, data provenance, provenance of network measurements, data grids, services and SOA, XML, XML-Relational storage, and RDF. My dissertation work focused on identifying the characteristics of XML-based metadata, and its differences from general XML storage, that can be exploited to provide faster query response when searching for e-Science data. The approach uses a flexible, scalable, and adaptable generic relational data structure that can be applied to varied scientific domains with different metadata schemas and data hierarchies.

XML Metadata Concept Catalog (XMC Cat)
My research has focused on identifying the characteristics of scientific metadata schemas and how those characteristics can be exploited in cataloging metadata, so that end-users can easily compose and execute complex queries over domain metadata (without needing to learn SQL, XPath, or XQuery). This would increase the ability of scientists to discover and reuse data compared with the keyword search capabilities currently used in many scientific portals. However, a second, conflicting goal is to keep a loose coupling between the domain-specific XML metadata schema and the database schema used in the metadata catalog. This loose coupling is needed for a metadata catalog framework to be deployable in a diversity of scientific domains through configuration instead of code customization.

Additionally, in XMC Cat we are looking to reduce the incentive misalignment between those who can generate metadata and those who will benefit from it. Capturing metadata during the scientific process both increases its value to the researcher generating the data and reduces the cost of capture through automation.

As a first step towards this goal of a configurable metadata catalog based on domain metadata schemas, in the Spring of 2008 I rewrote the myLEAD metadata catalog using Axis2, which makes it a lighter-weight service than it was on our previous software stack and allows greater flexibility in configuring the web service used for the metadata catalog. This ongoing effort is the XML Metadata Concept Catalog.

As the volume of scientific data increases, a number of researchers have noted the need to capture metadata automatically. This automated metadata capture needs to be done based on the metadata schema of the domain in which a metadata catalog is deployed. In XMC Cat this is addressed by allowing plugins to be registered which will do additional domain-specific harvesting of metadata from files being added to the metadata catalog. This additional harvesting can be done asynchronously to prevent a performance cost in adding files to the metadata catalog.
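The plugin idea described above can be illustrated with a small sketch. This is not the XMC Cat code or its API: the class `MetadataCatalog`, its methods, and the `name_harvester` plugin are hypothetical names chosen for illustration, and a thread pool stands in for whatever asynchronous mechanism the real service uses. It only shows the general pattern of storing the basic record right away and letting registered, domain-specific harvesters fill in more metadata in the background.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical plugin signature: takes a file path, returns harvested fields.
Harvester = Callable[[str], Dict[str, str]]


class MetadataCatalog:
    """Toy in-memory catalog (an illustration only, not the XMC Cat API)."""

    def __init__(self, workers: int = 2) -> None:
        self._plugins: Dict[str, List[Harvester]] = {}
        self._records: Dict[str, Dict[str, str]] = {}
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending: List[Future] = []

    def register_plugin(self, extension: str, plugin: Harvester) -> None:
        """Register a harvester for files with the given extension."""
        self._plugins.setdefault(extension, []).append(plugin)

    def add_file(self, path: str) -> None:
        """Store the basic record now; schedule harvesting for later."""
        with self._lock:
            self._records[path] = {"path": path}
        extension = path.rsplit(".", 1)[-1] if "." in path else ""
        for plugin in self._plugins.get(extension, []):
            # Harvesting runs on a worker thread, so add_file returns quickly.
            self._pending.append(self._pool.submit(self._harvest, plugin, path))

    def _harvest(self, plugin: Harvester, path: str) -> None:
        harvested = plugin(path)
        with self._lock:
            self._records[path].update(harvested)

    def wait_for_harvesting(self) -> None:
        """Block until all scheduled harvesting has finished."""
        pending, self._pending = self._pending, []
        for future in pending:
            future.result()  # re-raises any exception from a plugin

    def get(self, path: str) -> Dict[str, str]:
        with self._lock:
            return dict(self._records[path])


def name_harvester(path: str) -> Dict[str, str]:
    # Hypothetical plugin: derives fields from the file name instead of
    # parsing the file, so the sketch runs without any real data.
    stem = path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return {"format": "netcdf", "dataset": stem}


if __name__ == "__main__":
    catalog = MetadataCatalog()
    catalog.register_plugin("nc", name_harvester)
    catalog.add_file("runs/forecast01.nc")
    catalog.wait_for_harvesting()
    print(catalog.get("runs/forecast01.nc"))
```

Keying plugins by file extension is just one simple way to make harvesting depend on the domain; a real deployment would more likely select harvesters from the configured metadata schema.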

Link to prior version of the XMC Cat web page

Linked Environments for Atmospheric Discovery (LEAD)
LEAD is a multi-institution Large ITR research project that brings together computer scientists, meteorological researchers, and meteorology educators in a collaborative effort. Through the LEAD portal, researchers can search for data, compose complex forecasting workflows, and review their experiments.

My research in the LEAD project has focused on the myLEAD metadata catalog, which allows meteorological researchers to store metadata regarding data, ongoing experiments, and research results, and to easily create complex queries over their workspace. A hybrid XML-Relational approach is used to store the metadata, which is communicated using the LEAD Metadata Schema, a profile of the FGDC schema for spatial data.
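As a rough illustration of what a hybrid XML-Relational store can look like, the sketch below keeps each metadata document intact as XML text and also "shreds" its leaf elements into a generic relational table used for querying. This is an assumption about the general technique, not a description of the myLEAD schema: the table names, the functions `open_catalog`, `add_document`, and `find_documents`, and the sample element names are all hypothetical and are not LEAD Metadata Schema or FGDC elements.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Iterator, List, Tuple


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with one table per storage style."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE document (
            id  INTEGER PRIMARY KEY,
            xml TEXT NOT NULL              -- the document, kept whole
        );
        CREATE TABLE element (
            doc_id INTEGER NOT NULL REFERENCES document(id),
            path   TEXT NOT NULL,          -- e.g. /metadata/theme/keyword
            value  TEXT NOT NULL           -- text of the leaf element
        );
        CREATE INDEX element_path_value ON element (path, value);
        """
    )
    return conn


def _leaves(node: ET.Element, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, text) for every leaf element under node."""
    path = f"{prefix}/{node.tag}"
    children = list(node)
    if not children:
        yield path, (node.text or "").strip()
    for child in children:
        yield from _leaves(child, path)


def add_document(conn: sqlite3.Connection, xml_text: str) -> int:
    """Store the XML as-is and shred its leaves into the element table."""
    root = ET.fromstring(xml_text)
    cursor = conn.execute("INSERT INTO document (xml) VALUES (?)", (xml_text,))
    doc_id = cursor.lastrowid
    conn.executemany(
        "INSERT INTO element (doc_id, path, value) VALUES (?, ?, ?)",
        [(doc_id, path, value) for path, value in _leaves(root)],
    )
    return doc_id


def find_documents(conn: sqlite3.Connection, path: str, value: str) -> List[str]:
    """Query the relational side, return the stored XML of each match."""
    rows = conn.execute(
        """
        SELECT xml FROM document
        WHERE id IN (SELECT doc_id FROM element WHERE path = ? AND value = ?)
        ORDER BY id
        """,
        (path, value),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_catalog()
    add_document(
        conn,
        "<metadata><theme><keyword>radar</keyword></theme>"
        "<place>Oklahoma</place></metadata>",
    )
    for xml_text in find_documents(conn, "/metadata/theme/keyword", "radar"):
        print(xml_text)
```

The design choice this sketch highlights is that queries run against indexed relational rows while responses are built from the stored XML, so the table layout does not have to change when the domain schema does.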

The first Alpha release of myLEAD was in May of 2005, followed by version 1.2 in the Spring of 2006 and version 1.3 in August of 2007.

Relational Grid Resources (RGR)
In this project we developed a synthetic workload based on the GLUE schema for measuring the performance of different server platforms (relational, XML, and LDAP) for storing metadata about resources in a grid environment.

Technical Reports and Other Publications

Presentations

  • Scott Jensen, Michael Cox, David Bender, Miao Chen, Julie England, Beth Plale, and David Leake, Spatial Data in an Ontology for Research on Forest Resources, presented at "Ontology of Spatial Thinking and Reasoning: Multidisciplinary Reconciliation," COSIT'11 Workshop, Belfast, Maine, September 12, 2011, pp. 28-30.
  • Scott Jensen, Scientific Data Discovery with XMC Cat, presented in the Pervasive Technology Institute's mini-workshop, Pushing Back on the Data Deluge: Advancements in Metadata, Archival, and Workflows at SuperComputing 2010, New Orleans, November 16, 2010.
  • Scott Jensen, Beth Plale, XMC Cat: An Adaptive Catalog for Scientific Metadata, Improving Observing Network Coordination: A Cyberinformatics Forum, Boulder, Colorado, May 17-18, 2010.
  • Scott Jensen, Beth Plale, Taming Complex Scientific Metadata Schemas, Fifth Midwest Database Research Symposium, Chicago, October 4, 2008.
  • Scott Jensen, Using Metadata to Find Relevant Data in the e-Science Haystack, Student Presentation at the Indiana University Center for Data and Search Informatics, September 17, 2008.
  • Scott Jensen, Beth Plale, Domain-Friendly Metadata Management, 3rd Annual TeraGrid Conference, Las Vegas, June 2008.
  • Scott Jensen, Beth Plale, Hybrid Approach to Complex Scientific Metadata in a Grid Environment, Third Midwest Database Research Symposium, Champaign-Urbana, April 15, 2006.
  • Scott Jensen, Beth Plale, Towards a Metadata Catalog Query Language, Systems Research Seminar, Indiana University, November 3, 2005.
  • Scott Jensen, Sangmi Lee, Yiming Sun, Beth Plale, Extending Metadata Catalogs to Address Complex Queries in Context, LEAD Year-2 Site Visit Poster Session, Champaign-Urbana, July 2005.
  • Tharaka Devadithya, Scott Jensen, Thomas Reichherzer, Yiming Sun, Monitoring What We Eat Using Tangibles, 2005 Indiana University Making IT Happen Student Showcases, February 16, 2005.