myLEAD is a personal metadata catalog for storing and searching widely distributed, terascale, federated scientific data products. The metadata catalog service plugs into the LEAD service-oriented architecture (SOA). It is built on top of the UK OGSA-DAI OGSI R6 and Globus Toolkit 3.2.1, and uses MySQL 5.0.18. Larger data products are stored on a local file system managed by the Globus GT4.0 Data Replication Service (DRS).
Data products are stored and retrieved as XML documents defined by the LEAD Metadata Schema (LMS). The LMS is a profile of the Federal Geographic Data Committee (FGDC) metadata standard; as such, it extends and specializes the FGDC standard. XML documents are stored in the relational database using a hybrid technique that combines shredding with selective use of CLOBs (character large objects). The technique is described in "A Hybrid XML-Relational Grid Metadata Catalog," February 2006.
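The general idea of hybrid storage can be illustrated with a minimal sketch. This is not myLEAD's implementation. It assumes a hypothetical split in which a few named child elements (here `title` and `keyword`, which are illustrative names and not actual LMS elements) are shredded into (tag, text) rows suitable for indexed relational columns, while every other subtree is kept whole as a CLOB-style string. Python's standard `xml.etree.ElementTree` stands in for whatever parser the catalog actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of which child elements are shredded into queryable rows.
# Every other child subtree is kept verbatim as a CLOB-style string.
SHREDDED_TAGS = ("title", "keyword")


def hybrid_store(xml_text):
    """Split an XML metadata document into shredded rows plus CLOB fragments.

    Returns (rows, clobs): rows is a list of (tag, text) pairs that could go into
    indexed columns; clobs is a list of serialized XML fragments stored whole.
    """
    root = ET.fromstring(xml_text)
    rows = []
    clobs = []
    for child in root:
        if child.tag in SHREDDED_TAGS:
            # Shred: keep only the element's text so it can be indexed and searched.
            rows.append((child.tag, (child.text or "").strip()))
        else:
            # Keep the whole subtree as text; strip() drops whitespace-only tail
            # text that ElementTree includes when serializing an element.
            clobs.append(ET.tostring(child, encoding="unicode").strip())
    return rows, clobs


if __name__ == "__main__":
    doc = (
        "<metadata><title>Radar sweep</title>"
        "<keyword>reflectivity</keyword>"
        "<spatial><west>-98.1</west></spatial></metadata>"
    )
    rows, clobs = hybrid_store(doc)
    print(rows)   # [('title', 'Radar sweep'), ('keyword', 'reflectivity')]
    print(clobs)  # ['<spatial><west>-98.1</west></spatial>']
```

In this sketch the shredded rows can be indexed and queried like ordinary columns, while the CLOB fragments come back intact when the full document is reassembled, so nested structures that are rarely searched never need to be decomposed.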
A major strength of the myLEAD metadata catalog is its support for attribute extension: new attributes or annotations for a data product can be added to the database on the fly. As a result, very little of the relational schema is specific to the meteorology domain. The LMS has the same extensibility property, which eases adoption of both the schema and the metadata catalog by other scientific and non-scientific domains.
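Attribute extension of this kind is commonly implemented with a generic, entity-attribute-value style layout, where attribute names are stored as rows rather than columns. The sketch below assumes such a layout. The table and function names are hypothetical and are not myLEAD's actual schema. SQLite (Python's built-in `sqlite3`) stands in for MySQL so that the example is self-contained.

```python
import sqlite3


def make_catalog():
    """Create an in-memory catalog with a generic, domain-neutral schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- Attribute definitions are rows, not columns, so adding one needs no ALTER TABLE.
        CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE attr_value (
            product_id INTEGER REFERENCES product(id),
            attribute_id INTEGER REFERENCES attribute(id),
            value TEXT
        );
        """
    )
    return conn


def add_product(conn, name):
    """Insert a data product and return its generated id."""
    cur = conn.execute("INSERT INTO product (name) VALUES (?)", (name,))
    return cur.lastrowid


def annotate(conn, product_id, attr_name, value):
    """Attach an attribute to a product, defining the attribute on the fly if new."""
    conn.execute("INSERT OR IGNORE INTO attribute (name) VALUES (?)", (attr_name,))
    (attr_id,) = conn.execute(
        "SELECT id FROM attribute WHERE name = ?", (attr_name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO attr_value (product_id, attribute_id, value) VALUES (?, ?, ?)",
        (product_id, attr_id, value),
    )


def find_by_attribute(conn, attr_name, value):
    """Return the names of products whose attribute attr_name equals value."""
    rows = conn.execute(
        """
        SELECT p.name FROM product p
        JOIN attr_value v ON v.product_id = p.id
        JOIN attribute a ON a.id = v.attribute_id
        WHERE a.name = ? AND v.value = ?
        ORDER BY p.name
        """,
        (attr_name, value),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = make_catalog()
    run1 = add_product(conn, "forecast-run-1")
    run2 = add_product(conn, "forecast-run-2")
    annotate(conn, run1, "model", "WRF")
    annotate(conn, run2, "model", "ARPS")
    # A brand-new attribute: no schema change is needed.
    annotate(conn, run2, "reviewer_note", "rerun with finer grid")
    print(find_by_attribute(conn, "model", "WRF"))  # ['forecast-run-1']
```

Because `reviewer_note` becomes an ordinary row in the `attribute` table, annotating a product with it requires no `ALTER TABLE`. None of the three tables refers to meteorology.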
The project's next steps are 1.) porting myLEAD to OGSA-DAI WSRF 2.1, 2.) deploying a hardened myLEAD in a distributed configuration on the LEAD grid, and 3.) optimizing performance to minimize the number of copies of a data product that must exist.
For help with this release, please contact Yiming Sun (yimsun -at- cs.indiana.edu). Additional questions and comments can be addressed to the project director, Professor Beth Plale (plale -at- cs.indiana.edu).
Release V1.3
Source Code
- myLEAD Server, release date August 14, 2007
- myLEAD Server source code
- myLEAD Binary
- myLEAD Database Installation Scripts
- myLEAD Stored Procedure Installation Files
- myLEAD Activity Schemas
- myLEAD Activity Map
- myLEAD Schema - LEAD Metadata Schema and myLEAD Schema (myLEAD Types)
- Additional Libraries
- Log4J Properties File
- myLEAD and Additional Client Libraries
- myLEAD Client Service (Agent Service), release date July 26, 2007
- myLEAD Client service source code
- myLEAD Publisher Service, release date August 7, 2007
- myLEAD Publisher service source code
Documentation
- myLEAD Server Installation Guide
- myLEAD Developer Guide
- myLEAD Queryable Metadata
- myLEAD Query Response Schema
- myLEAD javadoc
- myLEAD Agent Installation and Developer's Guide
- myLEAD Publisher Installation and Developer's Guide
Release V1.2
Source Code
- myLEAD service, release date March 27, 2006
- Server Library
- Database Script
- XML Activity Schemas
- myLEAD Activity Map
- XML Type Schemas
- Stored Procedures
- log4j.properties File
- OGSA-DAI deliver-to-stream activity
- XPP Pull Parser Version 3-1.1.3.4.B
- myLEAD Client Service (Agent Service), release date March 24, 2006
- myLEAD Client service source code [GT3 compatible]: starter's kit and installation test are also included.
- myLEAD Client service source code [Web Service]: starter's kit and installation test are also included.
- Starter's Kit: jar files and sample code to access the agent service
- installation test suite: test software for your installation of the myLEAD server and client service
- myLEAD Client service(agent) interface jar file
Documentation
- Installation guide: server and agent service
- Developers Guide: Developing with the myLEAD client toolkit
- Developers Guide: Accessing the myLEAD Client service
Java Docs
Release V0.3alpha
Source Code
- myLEAD service, released May 27, 2005
- Server Library
- Database Script
- Activity XML Schema
- Stored Procedures
- log4j.properties File
- myLEAD Agent Service, released June 12, 2005
- Agent service source code
- Starter's Kit: jar files and sample code to access the agent service
- installation test suite
- agent interface jar file
Documentation
- Installation guide: server and agent service
- Developers Guide: Developing with the myLEAD client toolkit
- Developers Guide: Accessing the myLEAD agent service
Java Docs
Constraints of myLEAD v0.3alpha release
1.) XML schema for adding a metadata record and retrieving results: the LEAD Metadata Schema is not yet supported. The currently supported schema is the AHM05 schema; this will change when the LEAD Metadata Schema becomes available.
2.) myLEAD and Globus RLS: myLEAD provides guarantees over the data products it 'manages'. It uses the Globus Replica Location Service (RLS) as the service interface to the repository where the data products themselves reside. The current approach to adding a new data product to myLEAD is:
- Step 1: the user registers the location of the data product with the local RLS Local Replica Catalog (LRC).
- Step 2: the user invokes the myLEAD agent to add a metadata record.
- Step 3: upon receiving a request to add a metadata record, myLEAD checks that the file is indeed registered in RLS before committing the metadata record.
This lighter-weight solution gives us some interoperability with RLS in release v0.3alpha while we work out the optimal tradeoff between the autonomy of the two components and atomicity, and put a storage container in place.
3.) Replication, publishing, versioning, and sharing are not supported in the v0.3alpha release.
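The three-step add protocol described in constraint 2 can be sketched as follows. Both classes are hypothetical stubs. `ReplicaCatalogStub` is a dictionary standing in for an RLS Local Replica Catalog and is not the Globus RLS client API. `MetadataCatalogStub.add_metadata` models only the check-before-commit rule and is not myLEAD's agent interface. The logical name and GridFTP URL in the usage section are made up.

```python
class NotRegisteredError(Exception):
    """Raised when a data product has no location registered in the LRC."""


class ReplicaCatalogStub:
    """Hypothetical stand-in for an RLS Local Replica Catalog (LRC).

    Maps a logical file name to the set of physical locations registered for it.
    This is NOT the Globus RLS client API.
    """

    def __init__(self):
        self._mappings = {}

    def register(self, logical_name, physical_url):
        # Step 1: the user registers the data product's location.
        self._mappings.setdefault(logical_name, set()).add(physical_url)

    def lookup(self, logical_name):
        # Return a copy so callers cannot mutate the catalog's internal state.
        return set(self._mappings.get(logical_name, ()))


class MetadataCatalogStub:
    """Hypothetical catalog that commits a record only after the LRC check."""

    def __init__(self, lrc):
        self._lrc = lrc
        self.records = {}

    def add_metadata(self, logical_name, metadata):
        # Step 2 arrives here; Step 3: verify the file is recorded in the LRC first.
        if not self._lrc.lookup(logical_name):
            raise NotRegisteredError(logical_name)
        # Only now is the metadata record committed.
        self.records[logical_name] = metadata


if __name__ == "__main__":
    lrc = ReplicaCatalogStub()
    catalog = MetadataCatalogStub(lrc)
    # Hypothetical logical name and GridFTP URL.
    lrc.register("lead:radar-001", "gsiftp://host.example.org/data/radar-001.nc")
    catalog.add_metadata("lead:radar-001", "<metadata/>")  # accepted
    try:
        catalog.add_metadata("lead:radar-002", "<metadata/>")  # never registered
    except NotRegisteredError as err:
        print("rejected:", err)  # rejected: lead:radar-002
```

The check and the commit are two separate operations. If a location were removed from the LRC after the check, the metadata record would be left orphaned. This gap is one concrete form of the autonomy-versus-atomicity tradeoff mentioned above.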
Select Publications
See the DDE Lab publications page for the full list.
Beth Plale, Dennis Gannon, Jay Alameda, Bob Wilhelmson, Shawn Hampton, Al Rossi, and Kelvin Droegemeier, "Active Management of Scientific Data," IEEE Internet Computing, special issue on Internet Access to Scientific Data, Vol. 9, No. 1, Jan/Feb 2005, pp. 27-34.
Sangmi Lee Pallickara, Beth Plale, Scott Jensen, and Yiming Sun, "Structure, Sharing, and Preservation of Scientific Experiment Data," IEEE 3rd International Workshop on Challenges of Large Applications in Distributed Environments (CLADE), July 2005.
Sangmi Lee Pallickara, Beth Plale, Scott Jensen, and Yiming Sun, "Monitoring Access to Stateful Resources in Grid Environments" (short paper), IEEE International Conference on Services Computing, Orlando, Florida, July 2005.
Sangmi Lee Pallickara, Beth Plale, Liang Fang, and Dennis Gannon, "Trust Cell: Towards the End-to-End Trust in Data-Oriented Scientific Computing" (short paper), IEEE International Symposium on Cluster Computing and the Grid (CCGrid), May 2006.
D. Gannon, B. Plale, M. Christie, L. Fang, Y. Huang, S. Jensen, G. Kandaswamy, S. Marru, S.L. Pallickara, S. Shirasuna, Y. Simmhan, A. Slominski, and Y. Sun, "Service-oriented Architectures for Science Gateways on Grid Systems," International Conference on Service-Oriented Computing (ICSOC), Lecture Notes in Computer Science 3826, B. Benatallah, F. Casati, and P. Traverso (Eds.), Springer-Verlag, Berlin Heidelberg, pp. 21-32, 2005.
Beth Plale, Rahul Ramachandran, and Steve Tanner, "Data Management Support for Adaptive Analysis and Prediction of the Atmosphere in LEAD," 22nd Conference on Interactive Information Processing Systems for Meteorology, Oceanography, and Hydrology (IIPS), January 2006 (not peer reviewed).


