XMC Cat Metadata Concept Definitions

How metadata concepts are defined and used in XMC Cat:
In XMC Cat, a scientific metadata schema is partitioned into independent concepts, each describing one aspect of the data product being cataloged. Each of these concepts is treated as a separate property in XMC Cat, and metadata is added in "concept-sized" blocks.

The LEAD project uses the LEAD Metadata Schema (LMS), which is a profile of the metadata schema developed by the FGDC for spatial metadata. To store metadata in XMC Cat based on the LEAD schema, the schema is partitioned into concepts and the definitions of those concepts are loaded as data into XMC Cat. Because the definitions are stored as data, XMC Cat is not tightly coupled to a specific metadata schema and can be adapted to different scientific schemas.
In the LEAD project, we found that much of the metadata we wanted to capture about data products to later assist in discovery was domain-specific and not described by the LMS. However, the FGDC schema contains an Entity and Attribute section that allows for the description of domain-specific attributes of the data. The FGDC documentation illustrates this section with entities such as roads, which may then have certain attributes (keep in mind that the FGDC schema is designed for spatial metadata). In the case of the LEAD project, much of the domain-specific metadata consists of model configuration parameters used in Fortran namelist configuration files and critical notifications generated by the workflow engine about experiments as they are executed.

The definitions for these metadata properties are loaded into XMC Cat as data, stored in tables that define the metadata properties (and the broader categories they are grouped into) as well as the individual metadata elements. This page contains the files used to load the definitions into the XMC Cat database. Each of the files listed below contains the definitions for a set of properties, sub-properties, and metadata elements within those properties. A categorized list of all of the concepts used in LEAD can be found here.

Definitions For | Last Updated
Based on LMS Structure | Loaded with DB
Notifications | August 2007
Cross-Cutting Parameters | Loaded with DB
Application | Soon!
Forecasting Model Namelist Parameters | Summer 2007
Seige Workflow Metadata | Spring-Summer 2007
NetCDF Spatial and Temporal Metadata | Summer 2008

Internal IDs:
A range of internal IDs is specified for each of the categories of metadata properties listed below. These IDs have no particular significance from a user perspective (and are never displayed to the user) but are listed here for reference purposes. For each category of metadata, there are considerably more metadata elements (e.g., center latitude) than there are metadata properties, but the range of IDs used applies to both properties and the metadata elements within those properties. Generally only a small portion of the range specified is used currently for each category of metadata.

Metadata Concepts Based on the LMS Schema Structure

Within XMC Cat, each metadata property is assigned a unique internal ID, but this ID has no relevance to users of the metadata catalog. From a user's perspective, there are two values that together define a property as being unique: the name of the property and its source. The source indicates who defines the metadata property and is generally a URI. However, for those metadata properties defined by partitioning the LMS into independent concepts, the source used is "LEAD". Since metadata properties can have a complex structure (not just name/value pairs), a property can also contain sub-properties. The sub-properties are uniquely defined based on their name, source, and parent property, since the same sub-property may be applicable to multiple properties.
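
The uniqueness rules described above can be illustrated with a minimal Python sketch. This is not the XMC Cat implementation: the class names, the in-memory registry standing in for the database tables, and the property names "spatial extent", "contact", "citation", and "distribution" are all hypothetical. As a simplification, the parent property is identified by its name only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical key: (name, source, parent name or None for top-level properties).
PropertyKey = Tuple[str, str, Optional[str]]


@dataclass
class PropertyDefinition:
    name: str
    source: str                      # "LEAD" for LMS-based concepts, otherwise usually a URI
    parent: Optional[str] = None     # set only for sub-properties
    elements: List[str] = field(default_factory=list)

    @property
    def key(self) -> PropertyKey:
        # Top-level properties are unique by (name, source);
        # sub-properties are unique by (name, source, parent).
        return (self.name, self.source, self.parent)


class ConceptRegistry:
    """In-memory stand-in for the definition tables (illustrative only)."""

    def __init__(self) -> None:
        self._definitions: Dict[PropertyKey, PropertyDefinition] = {}

    def register(self, definition: PropertyDefinition) -> None:
        if definition.key in self._definitions:
            raise ValueError(f"duplicate definition: {definition.key}")
        self._definitions[definition.key] = definition

    def lookup(self, name: str, source: str,
               parent: Optional[str] = None) -> PropertyDefinition:
        return self._definitions[(name, source, parent)]


registry = ConceptRegistry()
registry.register(
    PropertyDefinition("spatial extent", "LEAD", elements=["center latitude"])
)
# The same sub-property name may appear under more than one parent property.
registry.register(PropertyDefinition("contact", "LEAD", parent="citation"))
registry.register(PropertyDefinition("contact", "LEAD", parent="distribution"))
```

Registering a second "spatial extent" property with source "LEAD" would raise an error in this sketch, while the two "contact" sub-properties coexist because their parents differ.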

For LEAD, these definitions are loaded in the database script used to create the relational database. The concepts are grouped into the following categories:
General Information
Keywords
Temporal Metadata
Spatial Metadata
Distribution Metadata
Data Quality - Provenance Metadata
internal ID range: 1 - 100000

Metadata Concepts Based on Critical Notification Messages in LEAD

As workflows are executed in LEAD, the workflow engine generates notifications about critical events in the workflow. In LEAD there is a separate component named the LEAD Agent that listens for these critical notifications and makes a request to XMC Cat to add each notification as metadata for the related experiment. The "detailed" element from the LMS based on the FGDC schema is used to record these notifications.
Notification Metadata Database Script
internal ID range: 100001 - 109999

Metadata Concepts for Cross-Cutting Model Parameters

In the forecasting models used in LEAD, there are a number of configuration parameters that apply across multiple workflow components when running an experiment and these must match across the various components. In LEAD these parameters are configured in the LEAD Portal when a user is defining an experiment, and the portal registers these parameters with the XMC Cat when a user initiates an experiment. The "detailed" element from the LMS based on the FGDC schema is used to record these configuration parameters.

For LEAD, the cross-cutting parameter definitions are loaded in the database script used to create the relational database.
Cross-Cutting Model Configuration Parameters
internal ID range: 110001 - 119999

Metadata Concepts for Applications

The forecasting models used in LEAD evolve over time as meteorological researchers refine and enhance their models. For discovery purposes, when scientists are searching their workspace, it is important to know the name and version of the forecasting model used to generate the output. We capture metadata regarding the name and version of the application used to generate experiment outputs:
Application Metadata Database Script.

Metadata Concepts for Forecasting Model Namelist Parameters

Fortran is used extensively in scientific computing, and the LEAD project has wrapped these Fortran programs as services that are used in LEAD workflows. As a workflow executes, the Fortran namelist files that are used to configure the models are parsed and the parameters are recorded as metadata of the experiment. The parameters in a namelist file are grouped into blocks of related parameters, and each block is a property in XMC Cat. The following are the properties and metadata elements recorded for each type of namelist file:

ARPS to WRF Parameters Database Script (internal ID range: 140001 - 149999)
Lateral Parameters Database Script (internal ID range: 150001 - 159999)
Terrain Parameters Database Script (internal ID range: 130001 - 139999)
WPS Parameters Database Script (internal ID range: 170001 - 179999)
WRF Parameters Database Script (internal ID range: 120001 - 129999)
WRF Static Parameters Database Script (internal ID range: 160001 - 169999)
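
To illustrate how namelist blocks map onto properties and their parameters onto metadata elements, here is a simplified Python sketch. It is not the parser used in LEAD: the function name is hypothetical, the sample values are made up, and the parser handles only one `name = value` pair per line (no multi-line values, and no string values containing `!` or `=`).

```python
from typing import Dict, Optional


def parse_namelist(text: str) -> Dict[str, Dict[str, str]]:
    """Return {block name: {parameter name: raw value}} for a simple namelist."""
    blocks: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw_line in text.splitlines():
        line = raw_line.split("!", 1)[0].strip()   # drop trailing comments
        if not line:
            continue
        if line == "/" or line.lower() == "&end":  # end of the current block
            current = None
        elif line.startswith("&"):                 # start of a new block
            current = blocks.setdefault(line[1:].strip().lower(), {})
        elif current is not None and "=" in line:
            name, value = line.split("=", 1)
            current[name.strip().lower()] = value.strip().rstrip(",").strip()
    return blocks


# Illustrative input only; the values are not taken from a LEAD experiment.
SAMPLE = """
&domains
  time_step = 60,      ! seconds
  e_we      = 74,
  dx        = 30000,
/
&physics
  mp_physics = 3,
/
"""

blocks = parse_namelist(SAMPLE)
# Each block would become one property; each parameter one metadata element.
for property_name, elements in blocks.items():
    print(property_name, elements)
```

Values are kept as raw strings here; converting them to typed values would depend on the element definitions loaded into the catalog.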

Metadata Concepts for Seige Workflows

NCSA is one of the partner institutions in the LEAD project. These properties are used to capture metadata regarding data products generated by the Seige workflow engine.
Seige Workflow Metadata Properties
internal ID range: 300001 - 309999

Metadata Concepts for NetCDF Temporal and Spatial Properties

The output generated by the WRF forecasting model is in NetCDF format and may consist of a single file or a set of files. For post-processing, the scientists using LEAD need detailed temporal and spatial metadata about each file, beyond the spatial and temporal metadata recorded at the experiment level. Going forward, this may be expanded to include additional metadata contained in the NetCDF headers of these output files.
NetCDF Temporal Metadata Properties
NetCDF Spatial Metadata Properties Database Script
internal ID range: 310001 - 319999
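
For reference, the internal ID ranges listed on this page can be collected into a small lookup table. The Python sketch below is illustrative only: the table layout and function name are hypothetical, and the category labels are shortened from the section headings above. No range is listed for application metadata, so it does not appear.

```python
from typing import Optional

# (low, high, category), copied from the ranges listed on this page.
ID_RANGES = [
    (1, 100000, "LMS schema structure"),
    (100001, 109999, "Notifications"),
    (110001, 119999, "Cross-cutting model parameters"),
    (120001, 129999, "WRF parameters"),
    (130001, 139999, "Terrain parameters"),
    (140001, 149999, "ARPS to WRF parameters"),
    (150001, 159999, "Lateral parameters"),
    (160001, 169999, "WRF static parameters"),
    (170001, 179999, "WPS parameters"),
    (300001, 309999, "Seige workflow metadata"),
    (310001, 319999, "NetCDF temporal and spatial metadata"),
]


def category_for_id(internal_id: int) -> Optional[str]:
    """Return the category whose range contains internal_id, or None."""
    for low, high, category in ID_RANGES:
        if low <= internal_id <= high:
            return category
    return None


print(category_for_id(125000))
print(category_for_id(200000))
```

IDs that fall between the listed ranges (for example 110000, or anything from 180000 to 300000) return None in this sketch, since the page assigns them to no category.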