Since starting my Ph.D. I have been working with Dr. Beth Plale, and since the Spring of
2004 I have been working as a research assistant in Dr. Plale's DDE lab. The Projects
section below contains details of the research projects I have been working on. The Publication and
Presentation sections provide information on peer-reviewed publications and other presentations
I have given on my research. Additional projects are also discussed on the Classes page.
My research focuses on metadata management (particularly in scientific grid environments), data grids,
services and SOA, XML, XML-Relational storage, and RDF. My dissertation work is focused on identifying the
characteristics of XML-based metadata, and its differences from general XML storage, that can be exploited to provide
faster query response for grid portals while using a flexible, scalable, and adaptable generic relational data
structure that can be applied to varied scientific domains using different metadata schemas and data hierarchies.
XML Metadata Concept Catalog (XMC Cat)
The goal of my current research is to identify characteristics of scientific metadata schemas and how
those characteristics can be exploited in cataloging metadata to provide end-users the ability to
easily compose and execute complex queries over domain metadata (without needing to learn SQL, XPath
or XQuery). This would increase the ability of scientists to discover and reuse data when contrasted with
the keyword search capabilities currently used in many scientific portals. However, a
second conflicting goal is to have a loose coupling between the domain-specific XML metadata schema
and the database schema used in the metadata catalog. This loose coupling is needed for a metadata
catalog framework to be deployable in a diversity of scientific domains through configuration instead
of code customization.
As a first step towards this goal of a configurable metadata catalog based on domain metadata schemas,
in the Spring of 2008 I rewrote the myLEAD metadata catalog using Axis2, which allows
it to be a lighter-weight service than our previous software stack and allows greater
flexibility in configuring the web service used for the metadata catalog.
This ongoing effort is the XML Metadata Concept Catalog.
As the volume of scientific data increases, a number of researchers have noted the need to capture metadata
automatically. This automated metadata capture needs to be done based on the metadata schema of the
domain in which a metadata catalog is deployed. In XMC Cat this is addressed by allowing plugins to be
registered which will do additional domain-specific harvesting of metadata from files being added to
the metadata catalog. This additional harvesting can be done asynchronously to prevent a performance cost
in adding files to the metadata catalog.
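The plugin idea described above can be illustrated with a minimal Python sketch. This is not the XMC Cat implementation or its API; the `HarvestRegistry` class, the plugin signature, the file-extension lookup, and the in-memory dictionary standing in for the catalog are all assumptions made for illustration. It only shows the general pattern: the file is recorded immediately, and any registered domain-specific harvesters run in the background so the caller does not wait for them.

```python
import os
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical plugin signature: takes a file path, returns harvested metadata.
HarvestPlugin = Callable[[str], Dict[str, str]]
# Hypothetical stand-in for the catalog: file path -> metadata attributes.
Catalog = Dict[str, Dict[str, str]]


class HarvestRegistry:
    """Illustrative registry of domain-specific metadata harvesters."""

    def __init__(self, max_workers: int = 2) -> None:
        self._plugins: Dict[str, List[HarvestPlugin]] = {}
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()

    def register(self, extension: str, plugin: HarvestPlugin) -> None:
        """Register a plugin for files with the given extension (assumed lookup key)."""
        self._plugins.setdefault(extension.lower(), []).append(plugin)

    def add_file(self, path: str, catalog: Catalog) -> List[Future]:
        """Record the file right away, then schedule harvesting asynchronously."""
        with self._lock:
            catalog[path] = {}
        extension = os.path.splitext(path)[1].lstrip(".").lower()
        return [
            self._pool.submit(self._harvest, plugin, path, catalog)
            for plugin in self._plugins.get(extension, [])
        ]

    def _harvest(self, plugin: HarvestPlugin, path: str, catalog: Catalog) -> None:
        harvested = plugin(path)  # potentially slow, runs off the caller's thread
        with self._lock:
            catalog[path].update(harvested)


if __name__ == "__main__":
    registry = HarvestRegistry()
    # Hypothetical harvester; a real one would read the file's contents.
    registry.register("nc", lambda path: {"format": "netcdf"})
    catalog: Catalog = {}
    pending = registry.add_file("run1/output.nc", catalog)
    for future in pending:
        future.result()  # wait only so the demo can print the final state
    print(catalog)
```

Returning the futures lets a caller ignore them entirely (fire and forget) or wait on them when the harvested attributes are needed before a query.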
Linked Environments for Atmospheric Discovery (LEAD)
LEAD is a multi-institution Large ITR research project that brings together computer scientists,
meteorological researchers, and meteorology educators in a collaborative effort. Through the LEAD
portal, researchers can search for data, compose complex forecasting workflows, and review their experiments.
My research in the LEAD project has focused on the myLEAD metadata catalog, which allows meteorological researchers
to store metadata regarding data, ongoing experiments, and research results, and to easily create complex
queries over their workspace. A hybrid XML-Relational approach is used to store the metadata, which is communicated
using the LEAD Metadata Schema, a profile of the FGDC schema for spatial data.
The first Alpha release of myLEAD was in May of 2005, followed by version 1.2 in the Spring of 2006 and
version 1.3 in August of 2007.
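To make the hybrid XML-Relational idea mentioned above more concrete, here is a minimal Python sketch using SQLite. It is not the myLEAD schema or code; the table names, the path-based shredding, and the sample element names are assumptions chosen for illustration. The sketch keeps each metadata document intact as text so it can be returned without reconstruction, and also "shreds" its leaf values into a generic relational table so queries can be answered with ordinary indexed SQL.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List, Tuple


def create_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with hypothetical table names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE object_xml (object_id TEXT PRIMARY KEY, xml TEXT NOT NULL);
        CREATE TABLE element_value (object_id TEXT, path TEXT, value TEXT);
        CREATE INDEX idx_element_value ON element_value (path, value);
        """
    )
    return conn


def add_metadata(conn: sqlite3.Connection, object_id: str, xml_text: str) -> None:
    """Store the XML whole and shred its leaf elements into rows."""
    root = ET.fromstring(xml_text)
    rows: List[Tuple[str, str, str]] = []

    def walk(node: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{node.tag}"
        children = list(node)
        if not children and node.text and node.text.strip():
            rows.append((object_id, path, node.text.strip()))
        for child in children:
            walk(child, path)

    walk(root, "")
    conn.execute("INSERT INTO object_xml VALUES (?, ?)", (object_id, xml_text))
    conn.executemany("INSERT INTO element_value VALUES (?, ?, ?)", rows)


def find_objects(conn: sqlite3.Connection, path: str, value: str) -> List[str]:
    """Query the shredded rows, but return the stored XML documents."""
    cursor = conn.execute(
        """
        SELECT DISTINCT x.xml
        FROM element_value AS e
        JOIN object_xml AS x ON x.object_id = e.object_id
        WHERE e.path = ? AND e.value = ?
        """,
        (path, value),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    conn = create_catalog()
    # Hypothetical element names, not taken from the LEAD Metadata Schema.
    doc = "<data><theme><keyword>radar</keyword></theme></data>"
    add_metadata(conn, "file-1", doc)
    print(find_objects(conn, "/data/theme/keyword", "radar"))
```

Because the relational side stores only generic (path, value) rows, the same tables could in principle hold metadata from a different schema without changing the database layout, which is the kind of loose coupling the text describes.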
Relational Grid Resources (RGR)
In this project we developed a synthetic workload based on the GLUE schema for measuring the performance
of different server platforms (relational, XML, and LDAP) for storing metadata about resources in a grid
environment.