6. FAIR Data

This section provides a self-assessment of the level of FAIRness that has been accomplished by the TOAR data infrastructure and services. The main components of the TOAR data infrastructure are a relational database housing the data together with its metadata, a REST API and a graphical user interface to access the data, and a publication service preparing data sets to be published in the B2SHARE service.

The FAIRness requirements are taken from GO FAIR (https://www.go-fair.org/fair-principles/), and the assessment is guided by the common set of core assessment criteria for FAIRness developed by the RDA FAIR Data Maturity Model Working Group (https://www.rd-alliance.org/groups/fair-data-maturity-model-wg).

6.1. Overview

FAIRness evaluates the openness and interoperability of data against four main criteria: “findable”, “accessible”, “interoperable”, and “re-usable”. The following table lists the GO FAIR requirements and summarises our self-assessment of how well the TOAR data infrastructure meets each of them.

Table 6.1 FAIRness Self Assessment

To Be Findable

F1. (Meta)data are assigned globally unique and persistent identifiers | 100%

F2. Data are described with rich metadata | 100%

F3. Metadata clearly and explicitly include the identifier of the data they describe | 100%

F4. (Meta)data are registered or indexed in a searchable resource | 75%

To Be Accessible

A1. (Meta)data are retrievable by their identifier using a standardised communication protocol | 75%

A1.1 The protocol is open, free and universally implementable | 75%

A1.2 The protocol allows for an authentication and authorisation where necessary | 75%

A2. Metadata should be accessible even when the data is no longer available | 75%

To Be Interoperable

I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation | 75%

I2. (Meta)data use vocabularies that follow the FAIR principles | 75%

I3. (Meta)data include qualified references to other (meta)data | 75%

To Be Reusable

R1. (Meta)data are richly described with a plurality of accurate and relevant attributes | 75%

R1.1. (Meta)data are released with a clear and accessible data usage license | 75%

R1.2. (Meta)data are associated with detailed provenance | 75%

R1.3. (Meta)data meet domain-relevant community standards | 75%

6.2. Discussion

In the following we discuss the FAIRness requirements one by one.

F1: (Meta)data are assigned globally unique and persistent identifiers

The database itself is registered with re3data.org and thereby has a globally unique DOI provided by DataCite (https://www.datacite.org/, TOAR: http://doi.org/10.17616/R3FZ0G). The metadata describing the database is available under the same DOI.

Data and metadata from individual data providers that are published on B2SHARE are assigned globally unique DOIs from DataCite. Every instrument time series is published as an individual data record, and all time series belonging to one station are grouped as a collection. The DOI of the collection shall be used as the primary DOI to identify and reference a dataset.
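Because these identifiers are DOIs, a record can be reached through a DOI resolver. The following is a minimal sketch, assuming the standard https://doi.org/ resolver; the function name `doi_to_url` is hypothetical and not part of any TOAR tool.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # assumption: the standard DOI resolver


def doi_to_url(doi: str) -> str:
    """Turn a DOI such as '10.17616/R3FZ0G' into a resolvable HTTPS URL."""
    doi = doi.strip()
    # Accept DOIs that are already written as URLs or carry a 'doi:' prefix.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep the slash between DOI prefix and suffix; escape unusual characters.
    return DOI_RESOLVER + quote(doi, safe="/")


# The DOI of the TOAR database entry quoted in this section.
print(doi_to_url("http://doi.org/10.17616/R3FZ0G"))
# → https://doi.org/10.17616/R3FZ0G
```

For a station collection on B2SHARE, the collection DOI would be passed in the same way.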

Currently, the data contained in the TOAR database, as well as the data published on B2SHARE, are time series data. Once other datasets (vertical profiles, satellite retrievals, model (gridded) data) are added, a similar concept will be applied.

Data retrieved from other sources, e.g. data replicated from large environmental data archives, are assigned a unique identifier within our database. These data can be unambiguously identified through a combination of human-readable metadata attributes: station_id, variable_id, resource_provider, version, data_origin, measurement_method or model_experiment_identifier, sampling height, and data_filtering_procedures (processing step 14, Criterion 14.1 - Criterion 14.9).
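The idea of identifying a series by a combination of attributes can be sketched as a composite key. This is an illustration only: the class `SeriesKey`, the helper `find_duplicates`, the field types, and the example values are assumptions and do not reproduce the actual TOAR database schema.

```python
from typing import Iterable, NamedTuple, Set


class SeriesKey(NamedTuple):
    """Hypothetical composite key built from the attributes listed above."""
    station_id: str
    variable_id: str
    resource_provider: str
    version: str
    data_origin: str
    method_or_experiment: str  # measurement_method or model_experiment_identifier
    sampling_height: float
    data_filtering_procedures: str


def find_duplicates(keys: Iterable[SeriesKey]) -> Set[SeriesKey]:
    """Return every key that occurs more than once, i.e. an ambiguous series."""
    seen: Set[SeriesKey] = set()
    duplicates: Set[SeriesKey] = set()
    for key in keys:
        if key in seen:
            duplicates.add(key)
        else:
            seen.add(key)
    return duplicates


# Illustrative values only; they are not taken from the TOAR database.
a = SeriesKey("STATION_A", "o3", "provider_x", "1.0",
              "measurement", "uv_absorption", 10.0, "")
b = a._replace(sampling_height=2.0)  # same station, different inlet height
print(find_duplicates([a, b]))       # two distinct series, so nothing is ambiguous
# → set()
```

Two series that differ in a single attribute, such as the sampling height, remain distinguishable, which is the property the attribute combination is meant to guarantee.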

The original unique identifiers of replicated datasets are preserved as metadata attributes in the TOAR database if they are available and accessible. This allows for back-referencing to the original data source.

F2: Data are described with rich metadata

The metadata describing the TOAR database in the re3data.org registry follows the re3data requirements while the metadata of data publications in B2SHARE complies with the requirements of B2SHARE and DataCite.

The data in the TOAR database has a rich metadata profile covering most aspects of provider information, location description, instrument description, data quality and version information. A highlight of the TOAR database is the ability to preserve additional metadata information from providers, which cannot be mapped to the harmonised TOAR metadata profile. For details see TOAR metadata documentation: Section 4 above and http://esde.pages.jsc.fz-juelich.de/toar-data/toardb_fastapi/docs/toardb_fastapi.html#models.

F3: Metadata clearly and explicitly include the identifier of the data they describe

The metadata provided for the TOAR database at re3data.org contains the link to the user interfaces of the database. The metadata available for data publications of the TOAR community in B2SHARE contains the links to the data sets in the data collection, in the form of the DOI of the collection and the PID of each data set.

The TOAR database’s data and metadata are never separated, ensuring a clear mapping of the metadata to the data they describe.

F4. (Meta)data are registered or indexed in a searchable resource

Through its registration in re3data.org, the TOAR database is indexed and thereby searchable. TOAR data publications on B2SHARE are indexed in b2find.eudat.eu and are therefore searchable as well.

A1: (Meta)data are retrievable by their identifier using a standardised communication protocol

We use HTTPS (with REST) for (meta)data retrieval, which is a standardised communication protocol. The REST API allows data to be accessed automatically.
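Automated access over HTTPS could look like the sketch below, which uses only the Python standard library. The base path is the one that appears in the API links in this document; the endpoint name `stationmeta` and the `limit` parameter are assumptions for illustration, so consult the API documentation for the actual names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base path as it appears in the API links cited in this document.
BASE_URL = "https://toar-data.fz-juelich.de/api/v2/"


def build_url(endpoint: str, **params) -> str:
    """Compose a request URL from an endpoint name and query parameters."""
    url = BASE_URL + endpoint.strip("/")
    query = urlencode(sorted(params.items()))  # sorted for reproducible URLs
    return f"{url}?{query}" if query else url


def fetch_json(url: str, timeout: float = 30.0):
    """Issue an HTTPS GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Hypothetical endpoint and parameter, shown without contacting the server.
print(build_url("stationmeta", limit=5))
# → https://toar-data.fz-juelich.de/api/v2/stationmeta?limit=5
```

Separating URL construction from the network call keeps the request logic easy to test and to reuse in scripts.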

A1.1 The protocol is open, free and universally implementable

HTTPS (with REST) is open, free and universally implementable.

A1.2 The protocol allows for an authentication and authorisation where necessary

HTTPS allows for authentication and authorisation where necessary.
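As an illustration of how credentials can travel with an HTTPS request, the sketch below attaches a token in the `Authorization` header. The bearer-token scheme and the function name `authorised_request` are assumptions; this section does not state which authentication scheme, if any, a TOAR service applies.

```python
from urllib.request import Request


def authorised_request(url: str, token: str) -> Request:
    """Build a GET request carrying a bearer token (assumed scheme)."""
    if not url.lower().startswith("https://"):
        # Credentials should never be sent over an unencrypted connection.
        raise ValueError("refusing to send credentials over a non-HTTPS URL")
    return Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )


# Placeholder URL and token; no request is sent here.
request = authorised_request("https://example.org/protected", "my-token")
print(request.get_header("Authorization"))
# → Bearer my-token
```

The resulting object can be passed to `urllib.request.urlopen` in place of a plain URL.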

A2. Metadata should be accessible even when the data is no longer available

Metadata of the TOAR database in re3data.org, as well as metadata of data publications in B2SHARE / B2FIND, will be kept persistently according to the respective policies of the service organisations. In the TOAR database itself, data and metadata are contained in the same physical space. Efforts are made to keep the (meta)data available persistently.

I1: (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation

B2SHARE data publications use an extension of the Dublin Core Schema for the metadata, while DataCite developed a custom metadata scheme [1].

The TOAR metadata uses

  1. commonly used controlled vocabularies (e.g. adapted from IPCC [2], MODIS CMG [3], HTAP [4], …), represented in an ontology, and

  2. a well-defined data model (a framework to describe and structure metadata).

The TOAR ontology uses OWL and SKOS and can also be provided as RDF or JSON-LD. The TOAR REST API provides data and metadata within a JSON structure that is readily usable in Python scripts.
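To show why a JSON structure is convenient in Python, the sketch below decodes a small response with the standard `json` module. The payload is hypothetical: its field names `datetime` and `value` and its numbers are made up for illustration and are not taken from the TOAR API.

```python
import json
from datetime import datetime

# Hypothetical payload; field names and values are illustrative only.
payload = """
[
  {"datetime": "2020-01-01T00:00:00+00:00", "value": 31.5},
  {"datetime": "2020-01-01T01:00:00+00:00", "value": null}
]
"""


def valid_points(text: str):
    """Decode a JSON array and keep (timestamp, value) pairs with a value."""
    records = json.loads(text)
    return [
        (datetime.fromisoformat(record["datetime"]), record["value"])
        for record in records
        if record.get("value") is not None
    ]


points = valid_points(payload)
print(len(points), points[0][1])
# → 1 31.5
```

JSON objects map directly onto Python dictionaries and lists, so no dedicated parser is needed before analysis.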

I2: (Meta)data use vocabularies that follow the FAIR principles

The TOAR metadata scheme has been built from existing standards (e.g. ISO 19115 “Geographic information - Metadata”) and is accessible at http://esde.pages.jsc.fz-juelich.de/toar-data/toardb_fastapi/docs/toardb_fastapi.html. The ontology can be browsed at https://toar-data.fz-juelich.de/api/v2/ontology.

The controlled vocabulary used in the metadata fields has been defined and is covered by the ontology. For example, the type of area in which a station is located takes one of the terms urban, suburban, rural, or unknown. These terms are not yet published under globally unique identifiers, but they are accessible from the webpage given above. The identifiers of the metadata fields have been defined with the TOAR metadata scheme at http://esde.pages.jsc.fz-juelich.de/toar-data/toardb_fastapi/docs/toardb_fastapi.html.
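A controlled vocabulary restricts a metadata field to a fixed set of terms. The sketch below models the four station area terms named above as a Python enumeration; the class name `StationAreaType` and the helper `parse_area_type` are hypothetical and are not the identifiers used in the TOAR code base.

```python
from enum import Enum


class StationAreaType(Enum):
    """The four terms named in the text; the names here are hypothetical."""
    URBAN = "urban"
    SUBURBAN = "suburban"
    RURAL = "rural"
    UNKNOWN = "unknown"


def parse_area_type(raw: str) -> StationAreaType:
    """Map provider input onto the vocabulary and reject unlisted terms."""
    try:
        return StationAreaType(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(term.value for term in StationAreaType)
        raise ValueError(f"{raw!r} is not one of: {allowed}") from None


print(parse_area_type(" Urban "))
# → StationAreaType.URBAN
```

Rejecting unlisted terms at ingestion is what keeps a harmonised field comparable across data providers.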

I3: (Meta)data include qualified references to other (meta)data

Within the TOAR data publications on B2SHARE, metadata on individual time series are linked to the respective collections and vice versa, given their unique DOI.

We plan to link contact persons in the TOAR metadata to their ORCID and organisations to their web link; this development is ongoing. The ontology already links term definitions to their source. Where data are replicated from other repositories, the metadata includes a reference to the original data repository, pointing specifically to the original metadata. Further links can be stored in the auxiliary metadata.

R1. (Meta)data are richly described with a plurality of accurate and relevant attributes

Besides the general metadata provided with re3data.org for the TOAR database, the database has a rich metadata profile covering most aspects of provider information, location description, instrument description, data quality and versioning information. A highlight of the TOAR database is the ability to preserve additional metadata information from providers, which cannot be mapped to the harmonised TOAR metadata profile. The metadata profile is available at http://esde.pages.jsc.fz-juelich.de/toar-data/toardb_fastapi/docs/toardb_fastapi.html.

R1.1. (Meta)data are released with a clear and accessible data usage license

TOAR data publications on B2SHARE always come with a CC-BY (4.0) license. Clear display of and easy access to this license are features of B2SHARE.

Replicated data (or other datasets which are not published on B2SHARE) from TOAR data providers are also available under the CC-BY license.

R1.2. (Meta)data are associated with detailed provenance

The TOAR data ingestion and data publication workflow is clearly documented (refer to TOAR Data Processing). The source of the data is part of the metadata as detailed in Section 4.3 above.

All processing steps from receipt of the original data to the data publication in the TOAR database and/or as B2SHARE record are documented and could be made available on request. Changes to the data in the TOAR database are automatically logged in the changelog which is part of the metadata.
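The principle of a changelog that travels with the metadata can be sketched as follows. This is not the TOAR implementation: the classes `ChangelogEntry` and `Metadata`, their fields, and the example values are assumptions chosen to illustrate automatic logging of changes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class ChangelogEntry:
    """One recorded change; the field names are hypothetical."""
    timestamp: datetime
    attribute: str
    old_value: Any
    new_value: Any
    author: str


@dataclass
class Metadata:
    """Metadata record that carries its own changelog."""
    attributes: Dict[str, Any]
    changelog: List[ChangelogEntry] = field(default_factory=list)

    def update(self, attribute: str, value: Any, author: str) -> None:
        """Change an attribute and record the change automatically."""
        old = self.attributes.get(attribute)
        if old == value:
            return  # nothing changed, so nothing is logged
        self.changelog.append(
            ChangelogEntry(datetime.now(timezone.utc), attribute, old, value, author)
        )
        self.attributes[attribute] = value


# Illustrative values only.
meta = Metadata({"sampling_height": 10.0})
meta.update("sampling_height", 2.0, author="data curator")
print(len(meta.changelog), meta.changelog[0].old_value)
# → 1 10.0
```

Because every modification passes through `update`, the previous value and the time of the change stay available as provenance.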

R1.3: (Meta)data meet domain-relevant community standards

As discussed above (I1 and I2), we use ontologies and controlled vocabularies based on ISO 19115 and the WIGOS standard wherever possible. A standard that covers all necessary aspects of the TOAR-II activity does not exist yet. The TOAR data team follows the development and refinement of community metadata standards as undertaken, for example, by the German national research data infrastructure (NFDI) initiative or the European ENVRI-FAIR project.

The data is provided in CSV, HTML, and JSON format; a NetCDF output format will also be available soon.

Footnotes

[1] http://schema.datacite.org/meta/kernel-4/doc/DataCite-MetadataKernel_v4.4.pdf

[2] Intergovernmental Panel on Climate Change

[3] Moderate Resolution Imaging Spectroradiometer (MODIS) Land Cover Climate Modeling Grid (CMG) (MCD12C1) Version 6 data product (https://lpdaac.usgs.gov/products/mcd12q1v006/)

[4] Task Force on Hemispheric Transport of Air Pollution (TF HTAP)