Research Interests & Publications

Since starting my Ph.D. I have been working with Dr. Beth Plale, and since the Spring of 2004 I have been a research assistant in Dr. Plale's DDE lab. The Projects section below contains details of the research projects I have been working on. The Publications and Presentations sections list my peer-reviewed publications and other presentations I have given on my research. Additional projects are discussed on the Classes page.
Research Interests

My research focuses on metadata management (particularly in scientific grid environments), data grids, services and SOA, XML, XML-Relational storage, and RDF. My dissertation work focuses on identifying the characteristics of XML-based metadata, and its differences from general XML storage, that can be exploited to provide faster query response for grid portals. The aim is to do this while using a flexible, scalable, and adaptable generic relational data structure that can be applied to varied scientific domains with different metadata schemas and data hierarchies.

Projects

XML Metadata Concept Catalog (XMC Cat)
The goal of my current research is to identify characteristics of scientific metadata schemas and how those characteristics can be exploited in cataloging metadata to provide end-users the ability to easily compose and execute complex queries over domain metadata (without needing to learn SQL, XPath or XQuery). This would increase the ability of scientists to discover and reuse data when contrasted with the keyword search capabilities currently used in many scientific portals. However, a second conflicting goal is to have a loose coupling between the domain-specific XML metadata schema and the database schema used in the metadata catalog. This loose coupling is needed for a metadata catalog framework to be deployable in a diversity of scientific domains through configuration instead of code customization.

As a first step toward this goal of a configurable metadata catalog based on domain metadata schemas, in the Spring of 2008 I rewrote the myLEAD metadata catalog using Axis2, which makes it a lighter-weight service than our previous software stack and allows greater flexibility in configuring the web service used for the metadata catalog. This ongoing effort is the XML Metadata Concept Catalog.

As the volume of scientific data increases, a number of researchers have noted the need to capture metadata automatically. This automated capture needs to follow the metadata schema of the domain in which a metadata catalog is deployed. XMC Cat addresses this by allowing plugins to be registered that perform additional domain-specific harvesting of metadata from files being added to the catalog. This additional harvesting can be done asynchronously, so that it does not add a performance cost to adding files to the catalog.
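The plugin idea can be illustrated with a minimal Python sketch. It is not the XMC Cat implementation or its API: the `Catalog` class, its methods, and `extension_harvester` are hypothetical names chosen for this example, and a thread pool stands in for whatever asynchronous mechanism the real service uses. The point it shows is that adding a file returns right away while registered harvesters fill in extra metadata in the background.

```python
import os
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List

# A harvester takes a file path and returns extra metadata as key/value pairs.
Harvester = Callable[[str], Dict[str, str]]


class Catalog:
    """Toy in-memory catalog (hypothetical; for illustration only)."""

    def __init__(self, workers: int = 2) -> None:
        self._harvesters: List[Harvester] = []
        self._metadata: Dict[str, Dict[str, str]] = {}
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def register(self, harvester: Harvester) -> None:
        """Register a domain-specific harvesting plugin."""
        self._harvesters.append(harvester)

    def add_file(self, path: str, metadata: Dict[str, str]) -> List[Future]:
        """Store the supplied metadata now; run harvesters in the background."""
        with self._lock:
            self._metadata[path] = dict(metadata)
        # The caller gets futures back and does not wait for harvesting.
        return [self._pool.submit(self._harvest, h, path) for h in self._harvesters]

    def _harvest(self, harvester: Harvester, path: str) -> None:
        extra = harvester(path)
        with self._lock:
            self._metadata[path].update(extra)

    def metadata(self, path: str) -> Dict[str, str]:
        """Return a copy of the metadata currently recorded for a file."""
        with self._lock:
            return dict(self._metadata[path])


def extension_harvester(path: str) -> Dict[str, str]:
    # Hypothetical plugin: derives a "format" value from the file name only.
    return {"format": os.path.splitext(path)[1].lstrip(".").lower()}


catalog = Catalog()
catalog.register(extension_harvester)
pending = catalog.add_file("forecast_run42.NC", {"owner": "alice"})
for task in pending:
    task.result()  # wait here only so the printed result is complete
print(catalog.metadata("forecast_run42.NC"))
```

A real harvester would open the file and extract values defined by the domain's metadata schema; the file-name-based plugin above only keeps the example self-contained.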

Linked Environments for Atmospheric Discovery (LEAD)
LEAD is a multi-institution Large ITR research project that brings together computer scientists, meteorological researchers, and meteorology educators in a collaborative effort. Through the LEAD portal, researchers can search for data, compose complex forecasting workflows, and review their experiments.

My research in the LEAD project has focused on the myLEAD metadata catalog, which allows meteorological researchers to store metadata about data, ongoing experiments, and research results, and to easily create complex queries over their workspace. A hybrid XML-Relational approach is used to store the metadata, which is communicated using the LEAD Metadata Schema, a profile of the FGDC schema for spatial data.
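One common way to realize a hybrid XML-Relational store is sketched below in Python with SQLite. This is an assumption-laden illustration, not the myLEAD design: the two-table layout, the function names, and the sample element names (`record`, `title`, `place`, `name`) are all hypothetical and are not taken from the LEAD Metadata Schema. Each XML document is kept verbatim so it can be returned unchanged, while its leaf values are also copied into a generic relational table that queries can search.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List, Tuple


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory store with one XML table and one generic value table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, xml TEXT NOT NULL)")
    db.execute(
        "CREATE TABLE element ("
        " doc_id INTEGER NOT NULL REFERENCES document(id),"
        " path TEXT NOT NULL,"
        " value TEXT NOT NULL)"
    )
    return db


def store(db: sqlite3.Connection, xml_text: str) -> int:
    """Keep the XML verbatim and shred its text values into path/value rows."""
    root = ET.fromstring(xml_text)
    doc_id = db.execute("INSERT INTO document (xml) VALUES (?)", (xml_text,)).lastrowid
    rows: List[Tuple[int, str, str]] = []

    def walk(node: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{node.tag}"
        text = (node.text or "").strip()
        if text:
            rows.append((doc_id, path, text))
        for child in node:
            walk(child, path)

    walk(root, "")
    db.executemany("INSERT INTO element VALUES (?, ?, ?)", rows)
    return doc_id


def find(db: sqlite3.Connection, path: str, value: str) -> List[str]:
    """Query the relational side, return the stored XML of matching documents."""
    cur = db.execute(
        "SELECT DISTINCT d.id, d.xml FROM document d"
        " JOIN element e ON e.doc_id = d.id"
        " WHERE e.path = ? AND e.value = ? ORDER BY d.id",
        (path, value),
    )
    return [row[1] for row in cur]


db = open_catalog()
store(db, "<record><title>Storm run</title><place><name>Oklahoma</name></place></record>")
store(db, "<record><title>Other run</title><place><name>Indiana</name></place></record>")
for xml_text in find(db, "/record/place/name", "Oklahoma"):
    print(xml_text)
```

Because the value table is keyed by element path rather than by schema-specific columns, the same two tables could hold documents from different metadata schemas, which is the kind of loose coupling described for the catalog above.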

The first Alpha release of myLEAD was in May of 2005, followed by version 1.2 in the Spring of 2006 and version 1.3 in August of 2007.

Relational Grid Resources (RGR)
In this project we developed a synthetic workload based on the GLUE schema for measuring the performance of different server platforms (relational, XML, and LDAP) for storing metadata about resources in a grid environment.

Publications

  • Scott Jensen and Beth Plale, Using Characteristics of Computational Science Schemas for Workflow Metadata Management, In Proceedings of the 2008 IEEE Congress on Services, IEEE 2008 Second International Workshop on Scientific Workflows (SWF 2008), Hawaii, July 2008.
  • Dennis Gannon, Beth Plale, Marcus Christie, Yi Huang, Scott Jensen, Ning Liu, Suresh Marru, Sangmi Lee Pallickara, Srinath Perera, Satoshi Shirasuna, Yogesh Simmhan, Aleksander Slominski, Yiming Sun, Nithya Vijayakumar, Building Grid Portals for e-Science: A Service Oriented Architecture, To appear in High Performance Computing and Grids in Action, IOS Press - Amsterdam, Lucio Grandinetti editor, 2007.
  • Will Odom, Scott Jensen, Meng Li, Senior Travel Buddies: Sustainable Ride-Sharing & Socialization, In CHI '07 Extended Abstracts on Human Factors in Computing Systems (CHI '07), San Jose, May 2007.
  • Yiming Sun, Scott Jensen, Sangmi Lee Pallickara, and Beth Plale, Personal Workspace for Large-scale Data-driven Computational Experimentation, 7th IEEE/ACM International Conference on Grid Computing (Grid'06), Barcelona, September 2006.
  • Scott Jensen, Beth Plale, Sangmi Lee Pallickara, and Yiming Sun, A Hybrid XML-Relational Grid Metadata Catalog, Workshop on Web Services-based Grid Applications (WSGA'06) in association with International Conference on Parallel Processing (ICPP-06), August 2006.
  • D. Gannon, B. Plale, M. Christie, L. Fang, Y. Huang, S. Jensen, G. Kandaswamy, S. Marru, S.L. Pallickara, S. Shirasuna, Y. Simmhan, A. Slominski, and Y. Sun, "Service-oriented Architectures for Science Gateways on Grid Systems", International Conference on Service-Oriented Computing (ICSOC), Lecture Notes in Computer Science 3826, B. Benatallah, F. Casati, and P. Traverso (Eds.), Springer-Verlag, Berlin Heidelberg, pp. 21-32, 2005.
  • Sangmi Lee Pallickara, Beth Plale, Scott Jensen, Yiming Sun, Short Paper: Monitoring Access to Stateful Resources in Grid Environments, IEEE International Conference on Services Computing, Orlando, Florida, July 2005.
  • Sangmi Lee Pallickara, Beth Plale, Scott Jensen, Yiming Sun, Structure, sharing, and preservation of scientific experiment data, IEEE 3rd International Workshop on Challenges of Large Applications in Distributed Environments (CLADE), July 2005.
  • Beth Plale, Craig Jacobs, Scott Jensen, Ying Liu, Charlie Moad, Rupali Parab, and Prajakta Vaidya, Understanding Grid Resource Information Management through a Synthetic Database Benchmark/Workload, 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid2004), April 2004.
Presentations