XMC Cat Metadata Concept Definitions
How metadata concepts are defined and used in XMC Cat:
In XMC Cat, scientific metadata schemas are partitioned into independent concepts, each describing
one aspect of the data product being cataloged. Each of these
independent concepts is treated as a separate property in XMC Cat, and metadata is added
in "concept-sized" blocks.
The LEAD project uses the LEAD Metadata Schema (LMS), which is a profile of the metadata schema developed
by the FGDC for spatial metadata. To store metadata in XMC Cat based on the LEAD schema, the schema is
partitioned into concepts and these definitions are loaded as data into the XMC Cat. By partitioning the
LMS into concepts and storing the definitions of those concepts as data, XMC Cat is not tightly coupled
to a specific metadata schema but can instead be easily adapted to different scientific schemas.
In the LEAD project we found that much of the metadata we wanted to capture about data products to later
assist in discovery was domain-specific and not described by the LMS. However, the FGDC schema contains
an Entity and Attribute section that allows for the description of domain-specific attributes of the data.
The FGDC schema documentation describes entities, such as roads, which may then have certain
attributes (keep in mind that the FGDC schema is designed for spatial metadata). In the case of the LEAD project,
much of the domain-specific metadata is model configuration parameters used in Fortran namelist configuration
files and critical notifications generated by the workflow engine about experiments as they are executed.
The definitions for these metadata properties are loaded into XMC Cat as data that is stored in tables used to
define the metadata properties (and broader categories they are grouped into) as well as the individual
metadata elements. This page contains the files used to load the definitions into the XMC Cat's database. Each
of the files listed below contains the definitions for a set of properties, sub-properties, and metadata elements
within those properties. A categorized list of all of the concepts used in LEAD can be found here.
Internal IDs:
A range of internal IDs is specified for each of the categories of metadata properties listed below. These IDs have
no particular significance from a user perspective (and are never displayed to the user) but are listed here for
reference purposes. For each category of metadata, there are considerably more metadata elements (e.g., center latitude)
than there are metadata properties, but the range of IDs used applies to both properties
and the metadata elements within those properties. Generally only a small portion of the range specified is
used currently for each category of metadata.
Metadata Concepts Based on the LMS Schema Structure
Within XMC Cat each metadata property is assigned a unique internal ID, but this ID has no relevance to users
of the metadata catalog. From a user's perspective, two values together define a property as unique: the
name of the property and its source. The source indicates who defines the metadata property and is generally a URI. However, for
those metadata properties defined by partitioning the LMS into independent concepts, the source used is "LEAD". Since
metadata properties can have a complex structure (not just name/value pairs), a property can also contain sub-properties.
Sub-properties are uniquely defined by their name, source, and parent property, since the same sub-property may be applicable
to multiple properties.
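The uniqueness rules above can be illustrated with a minimal sketch. This is not the XMC Cat implementation: the class names, the `ConceptRegistry` helper, and the example property names ("citation", "distribution", "contact") are all hypothetical and serve only to show how a (name, source) pair identifies a property while a (name, source, parent) triple identifies a sub-property.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyKey:
    """User-facing identity of a top-level property: name plus source."""
    name: str
    source: str


@dataclass(frozen=True)
class SubPropertyKey:
    """Identity of a sub-property: name, source, and its parent property."""
    name: str
    source: str
    parent: PropertyKey


class ConceptRegistry:
    """Hypothetical in-memory registry that maps keys to internal IDs."""

    def __init__(self, first_id: int = 1) -> None:
        self._next_id = first_id
        self._ids = {}

    def register(self, key) -> int:
        # Registering an already-known key returns its existing internal ID.
        if key not in self._ids:
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]


registry = ConceptRegistry()

# Illustrative property names only; the source "LEAD" follows the text above.
citation = PropertyKey("citation", "LEAD")
distribution = PropertyKey("distribution", "LEAD")

# The same sub-property name and source under two different parents
# yields two distinct definitions.
contact_a = SubPropertyKey("contact", "LEAD", citation)
contact_b = SubPropertyKey("contact", "LEAD", distribution)

citation_id = registry.register(citation)
contact_a_id = registry.register(contact_a)
contact_b_id = registry.register(contact_b)
```

Frozen dataclasses are used here so that the keys are hashable and compare by value, which mirrors the idea that the internal ID is incidental and the name, source, and parent are what make a definition unique.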
For LEAD, these definitions are loaded in the database script used to create the relational database. The concepts are grouped into the
following categories:
General Information
Keywords
Temporal Metadata
Spatial Metadata
Distribution Metadata
Data Quality - Provenance Metadata
internal ID range: 1 - 100000
Metadata Concepts Based on Critical Notification Messages in LEAD
As workflows are executed in LEAD, the workflow engine generates notifications regarding critical events in the workflow. In
LEAD there is a separate component named the LEAD Agent that listens for these critical notifications and makes a
request to XMC Cat to add the notification as metadata for the related experiment. The "detailed" element from the LMS based on the
FGDC schema is used to record these notifications.
Notification Metadata Database Script
internal ID range: 100001 - 109999
Metadata Concepts for Cross-Cutting Model Parameters
In the forecasting models used in LEAD, there are a number of configuration parameters that apply across multiple workflow components
when running an experiment and these must match across the various components. In LEAD these parameters are configured in the LEAD
Portal when a user is defining an experiment, and the portal registers these parameters with the XMC Cat when a user initiates an experiment.
The "detailed" element from the LMS based on the FGDC schema is used to record these configuration parameters.
For LEAD, the cross-cutting parameter definitions are loaded in the database script used to create the relational database.
Cross-Cutting Model Configuration Parameters
internal ID range: 110001 - 119999
Metadata Concepts for Applications
The forecasting models used in LEAD evolve over time as meteorological researchers refine and enhance their models. For discovery purposes,
when scientists are searching their workspace, it is important to know the name and version of the forecasting model used to generate the
output. We capture metadata regarding the name and version of the application used to generate experiment outputs:
Application Metadata Database Script.
Metadata Concepts for Forecasting Model Namelist Parameters
Fortran is used extensively in scientific computing, and the LEAD project has wrapped these Fortran programs as services that are used in LEAD
workflows. As a workflow executes, the Fortran namelist files that are used to configure the models are parsed and the parameters are recorded
as metadata of the experiment. The parameters in a namelist file are grouped into blocks of related parameters, and each block is a property in
XMC Cat. The following are the properties and metadata elements recorded for each type of namelist file:
ARPS to WRF Parameters Database Script (internal ID range: 140001 - 149999)
Lateral Parameters Database Script (internal ID range: 150001 - 159999)
Terrain Parameters Database Script (internal ID range: 130001 - 139999)
WPS Parameters Database Script (internal ID range: 170001 - 179999)
WRF Parameters Database Script (internal ID range: 120001 - 129999)
WRF Static Parameters Database Script (internal ID range: 160001 - 169999)
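As an illustration of how namelist blocks map to properties, the following is a minimal sketch of a parser that groups parameters by block. It is not the parser used in LEAD: the function name `parse_namelist` is hypothetical, it handles only one `name = value` assignment per line, and the sample values are made up for the example.

```python
from typing import Dict, Optional


def parse_namelist(text: str) -> Dict[str, Dict[str, str]]:
    """Group namelist parameters by block (one block = one property).

    Simplified sketch: one assignment per line, values kept as strings,
    and any '!' is treated as the start of a comment (even inside quotes).
    """
    blocks: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()
        if not line:
            continue
        if line.startswith("&"):
            # '&name' opens a block of related parameters.
            current = line[1:].strip().lower()
            blocks[current] = {}
        elif line == "/":
            # '/' closes the current block.
            current = None
        elif current is not None and "=" in line:
            name, value = line.split("=", 1)
            blocks[current][name.strip().lower()] = value.strip().rstrip(",").strip()
    return blocks


# Illustrative input; the values are invented for this example.
sample = """
&domains
 time_step = 60,
 e_we = 100,   ! west-east grid points
/
&physics
 mp_physics = 3,
/
"""

properties = parse_namelist(sample)
for block_name, elements in properties.items():
    print(block_name, elements)
```

In this sketch each top-level key of the result corresponds to a property and each inner key to a metadata element within that property.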
Metadata Concepts for Seige Workflows
NCSA is one of the partner institutions in the LEAD project. These properties are used to capture metadata regarding data products
generated by the Seige workflow engine.
Seige Workflow Metadata Properties
internal ID range: 300001 - 309999
Metadata Concepts for NetCDF Temporal and Spatial Properties
The output generated by the WRF forecasting model is in NetCDF format and may consist of a single file
or a set of files. For post-processing, the scientists using LEAD need detailed temporal and spatial
metadata about each file that goes beyond the spatial and temporal metadata recorded at the experiment level.
Going forward this may be expanded to include additional metadata contained in the NetCDF headers of these
output files.
NetCDF Temporal Metadata Properties
NetCDF Spatial Metadata Properties
Database Script
internal ID range: 310001 - 319999
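For reference, the internal ID ranges listed on this page can be collected into a single lookup. The sketch below is only a convenience illustration: the `ID_RANGES` table and the `category_for_id` function are hypothetical names, and the category labels are shortened from the headings above.

```python
from typing import Optional

# (low, high, category) taken from the ID ranges listed on this page.
ID_RANGES = [
    (1, 100000, "LMS schema structure"),
    (100001, 109999, "Critical notifications"),
    (110001, 119999, "Cross-cutting model parameters"),
    (120001, 129999, "WRF parameters"),
    (130001, 139999, "Terrain parameters"),
    (140001, 149999, "ARPS to WRF parameters"),
    (150001, 159999, "Lateral parameters"),
    (160001, 169999, "WRF static parameters"),
    (170001, 179999, "WPS parameters"),
    (300001, 309999, "Seige workflows"),
    (310001, 319999, "NetCDF temporal and spatial properties"),
]


def category_for_id(internal_id: int) -> Optional[str]:
    """Return the metadata category whose range contains internal_id."""
    for low, high, category in ID_RANGES:
        if low <= internal_id <= high:
            return category
    # IDs that fall in a gap between ranges belong to no listed category.
    return None


print(category_for_id(125000))
print(category_for_id(110000))
```

Because the same range covers both properties and the metadata elements within them, a lookup like this identifies only the category, not whether an ID refers to a property or an element.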