3. TOAR Data Submission Format

3.1. Filename

Please provide data files with a name containing the parameter (species), the station_id, and the period of the data record in the format:

{parameter}_{station_id}_{startyear}{startmonth}_{endyear}{endmonth}_{special}.{extension}

The parameter name should be written in lower case. startyear and endyear consist of 4 digits, startmonth and endmonth of 2 digits. The special tag is optional and can be used to identify, for example, wind-sector-filtered data or data from different sampling heights.

There is no need to break your data into individual years, but if it is more convenient for you to submit annual files, then please do so.

Examples:
1)   o3_UN4058943_201901_201912.dat
2)   wspeed_DEHR0003_200001_202012_cosmomodel.csv
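
The naming pattern above can be sketched in Python as follows. This is only an illustration, not the validation code used by the TOAR data team: the regular expression, the helper names build_filename and parse_filename, and the assumption that parameter names contain no underscores are all made up for this example.

```python
import re

# Hypothetical pattern for
# {parameter}_{station_id}_{startyear}{startmonth}_{endyear}{endmonth}_{special}.{extension}
# Assumption: parameter and station_id contain no underscores.
FILENAME_RE = re.compile(
    r"^(?P<parameter>[a-z0-9]+)_"             # lower-case parameter name
    r"(?P<station_id>[A-Za-z0-9-]+)_"         # station code, '-' allowed
    r"(?P<start>\d{4}(?:0[1-9]|1[0-2]))_"     # 4-digit year + 2-digit month
    r"(?P<end>\d{4}(?:0[1-9]|1[0-2]))"
    r"(?:_(?P<special>[^.]+))?"               # optional special tag
    r"\.(?P<extension>[A-Za-z0-9]+)$"
)


def build_filename(parameter, station_id, start, end, special=None, extension="csv"):
    """Compose a file name; start and end are (year, month) tuples."""
    name = (
        f"{parameter.lower()}_{station_id}_"
        f"{start[0]:04d}{start[1]:02d}_{end[0]:04d}{end[1]:02d}"
    )
    if special:
        name += f"_{special}"
    return f"{name}.{extension}"


def parse_filename(name):
    """Return the name components as a dict, or None if the name does not match."""
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None


print(build_filename("o3", "UN4058943", (2019, 1), (2019, 12), extension="dat"))
print(parse_filename("wspeed_DEHR0003_200001_202012_cosmomodel.csv"))
```

Checking a name with parse_filename before submission is a cheap way to catch a missing month digit or an upper-case parameter name.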

3.2. File Header and Metadata Rules

Please provide as much of the following metadata information as possible. Metadata keys that are marked with ‘*’ are mandatory; we will not be able to process your data if any of these elements is missing. It is possible, however, to provide the mandatory station metadata in a separate stations.csv file if this is easier for you.

Metadata shall be formatted as key-value pairs separated by a colon (example: Station_id: USH54S). Line breaks in the metadata values are not allowed. You can start header lines containing metadata with a comment symbol (‘#’, ‘*’ or ‘!’) or simply begin the line with the name of the metadata key. Starting a header line with any other character will make it invalid and prevent processing. Parsing of metadata keys is case-insensitive, so it doesn’t matter if you use lowercase, uppercase or mixed-case characters. String-formatted metadata values, however, will be stored exactly as you write them. The order of the metadata elements does not matter, but we suggest that you stick to the order from Table 3.2 below. We suggest that you copy the template metadata header from the Annex of this document and edit the content of the values. If you cannot provide a given piece of information, simply delete the line.

You can also add additional metadata key-value pairs to your files. These will be preserved in the TOAR database as additional_metadata in the station or timeseries records. Additional metadata can be retrieved from the database but is not available for data set searches. Some suggestions for recommended additional metadata are listed in Table 3.3. Please help us by starting any additional metadata key that carries station information with ‘station_’. Please also avoid lines starting with ‘Time,’ as we use this keyword to identify the start of the data section.

Empty lines in the file header are allowed and will be ignored.

Table 3.1 Valid header line examples

| Header line | Comment |
|---|---|
| Station_name: Niederzier, Treibbachstraße | Valid key, separator ‘:’, valid text, all on one line |
| # Station_name: Niederzier, Treibbachstraße | Alternative to the line above; it starts with a comment symbol |
| !station_lon : 6.469312 | Correctly formatted, longitude given as decimal degrees east |
| Station_geographic_context: mountain range | Valid key-value pair. As there is no metadata element “station_geographic_context” in the TOAR database scheme, this metadata element will be saved as additional_metadata. |
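
To make the header rules more concrete, here is a minimal Python sketch of how such lines could be read. It is not the parser used by the TOAR data team; the function names parse_header_line and parse_header are hypothetical, and the check that a key must begin with a letter is an assumption made for illustration.

```python
COMMENT_SYMBOLS = "#*!"  # comment symbols allowed at the start of a header line


def parse_header_line(line):
    """Return (key, value) for one header line, or None for an empty line."""
    text = line.strip()
    if not text:
        return None  # empty header lines are allowed and ignored
    if text[0] in COMMENT_SYMBOLS:
        text = text[1:].lstrip()
        if not text:
            return None  # a bare comment symbol carries no metadata
    # Split at the first colon only, so values may themselves contain colons.
    key, sep, value = text.partition(":")
    key = key.strip().lower()  # keys are case-insensitive
    # Assumption: a valid key starts with a letter.
    if not sep or not key or not key[0].isalpha():
        raise ValueError(f"invalid header line: {line!r}")
    return key, value.strip()  # values keep their original case


def parse_header(lines):
    """Collect metadata until the data section (a line starting with 'Time,')."""
    metadata = {}
    for line in lines:
        if line.startswith("Time,"):
            break
        item = parse_header_line(line)
        if item is not None:
            key, value = item
            metadata[key] = value
    return metadata


print(parse_header_line("!station_lon : 6.469312"))
```

Keys that do not correspond to an element of the TOAR database scheme would then be the ones stored as additional_metadata.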

Table 3.2 TOAR file header and description of all metadata elements [1]

| Metadata key | Data type / allowed values | Description |
|---|---|---|
| *Station_id (or station_code) | string | The station code under which the station is registered in your network or shall be registered in the TOAR database. Don’t use blanks or special characters in station codes. The only exception is ‘-’ (used in US AQS codes, for example). Example: fr05237 |

Additional information on “role codes” (Dataset_PI, Contributor, Collaborator, PointOfContact):

These terms are a subset of role codes that have been defined by ISO 19115 to standardise information processing. The explanations given for these role codes are rather vague. In the context of the TOAR data processing, we define these roles as follows:

Dataset_PI

the principal investigator of a measurement. This is the person who is responsible for making the measurements and securing the quality of the data. In general, there should be exactly one Dataset_PI associated with every measurement. The Dataset_PI may delegate responsibilities, for example to technicians or postdoctoral researchers, and yet remain PI as the person overseeing the measurements and data distribution.

Collaborator

a person who has been involved in making the measurements or processing the data, but who is either not part of the institution responsible for the measurement or who has “contributed” only temporarily. One situation we have encountered in TOAR where the nomination of collaborators makes sense is when university researchers assist government agencies in preparing their data for submission to the TOAR database.

Contributor

this role applies to any person who is involved in making the measurements or processing the data. Normally, the Dataset_PI will decide who shall be listed as contributor. The distinction between contributor and collaborator is not very clear, but if you wish to distinguish between people who were involved on a project level (collaborator) and those who work with you more permanently (contributor), then you can make use of these different roles.

PointOfContact

one person dedicated to answering questions related to the dataset, either from the TOAR data team or from data users. The declaration as PointOfContact is independent of the role as Dataset_PI, Contributor, or Collaborator.

The TOAR database can distinguish between roles concerning the measurement station and roles concerning the measurement itself, but this is not reflected in the file header template. If you wish to provide us with information about the persons responsible for operating the station, then either send this information by email (for individual sites) or create a stations.csv file where you collect all the metadata concerning the site(s) of your measurements. In the stations.csv file you can use the same format to define roles as in the data file headers.

Table 3.3 Recommended key names for additional metadata [3]

| Metadata key | Data type / suggested values | Description |
|---|---|---|
| Sampling_type | string (one of: continuous, filter, flask) | Describes the sampling mode of your measurement device |

3.3. Data Format

The data section of your data files shall always start with the title line Time, [Variable name], Flag. If your data doesn’t contain data quality flags, we also accept the header Time, [Variable name]. Of course you should replace Variable name with the actual name of the variable you send to us. Please take the correct spelling of variable names from the REST API at https://toar-data.fz-juelich.de/api/v2/variables/ (e.g. ‘o3’ instead of ‘ozone’). We recommend that you insert an empty line between the file header and the beginning of the data section to increase readability (see the example in the Annex Section 5).

Data should always be provided in chronological order. The actual data section should contain one line per “possible hour” in the year, i.e. a year with 365 days shall have 8760 data lines and a leap year 8784 data lines. Missing values should be coded with -9999. If it is easier for you not to report missing data at all, you can simply omit the lines with missing values, but then please do so consistently and don’t include any lines with missing values at all.
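
The “one line per possible hour” rule can be illustrated with a short Python sketch. It assumes that your measurements are available as a dictionary mapping the start of each averaging hour to a value and that the columns are separated by commas as in the title line; the function name hourly_rows is hypothetical.

```python
from datetime import datetime, timedelta

MISSING = -9999  # suggested code for missing values


def hourly_rows(year, values):
    """Return one 'Time, Value' line per hour of the given year.

    values maps datetime objects (start of the averaging hour) to floats;
    hours without a measurement are filled with the missing-value code.
    """
    rows = []
    t = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    while t < end:
        value = values.get(t, MISSING)
        rows.append(f"{t:%Y-%m-%d %H:%M}, {value:.2f}")
        t += timedelta(hours=1)
    return rows


measurements = {datetime(2019, 1, 1, 1): 41.5}  # made-up example value
rows = hourly_rows(2019, measurements)
print(len(rows))   # 8760 lines for a 365-day year
print(rows[:2])
```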

As stated in Section 3.1, you can either send individual files per year or combine the data from multiple years in one file.

Please stick to the formats described in Table 3.4 below.

Table 3.4 Formatting instructions for the data section in TOAR data files

| Data | Format | Comments |
|---|---|---|
| Time | YYYY-MM-DD hh:mm | e.g. 2010-01-01 00:00. Hours start at 00:00, and the time stamp denotes the beginning of the 1-hour averaging period. |
| Value | Floating-point number with at least 2 decimals | Make sure that the unit corresponds to the “Original_units” you specified in the header! Missing data should be labelled with a large negative number consisting of at least four ‘9’s. We suggest using -9999. |
| Flag | 1-digit integer or other numeric value | The “flag” column is optional. We suggest using the following flag values from WMO code table 0 33 020: 0 (OK), 2 (doubtful), 3 (wrong), 7 (missing value). |

If you use a different flagging scheme, please let us know the meaning of the flag values. We will generally be able to translate them.

If you don’t provide flag values, we will assume that all data except for values of -9999 are valid measurements.
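
As an illustration of the formatting rules in Table 3.4, the following Python sketch reads a single data line and reports how it would be interpreted. It assumes comma-separated columns as in the title line and treats every value less than or equal to -9999 as missing; both points, as well as the function names parse_data_line and status, are assumptions made for this example and not the behaviour of the TOAR ingestion software.

```python
from datetime import datetime

MISSING = -9999
# Suggested flag values from WMO code table 0 33 020, as listed in Table 3.4
FLAG_MEANINGS = {0: "OK", 2: "doubtful", 3: "wrong", 7: "missing value"}


def parse_data_line(line):
    """Split a data line into (time, value, flag); flag is None if absent."""
    parts = [p.strip() for p in line.split(",")]
    time = datetime.strptime(parts[0], "%Y-%m-%d %H:%M")
    value = float(parts[1])
    # The flag column is optional; accept "0" as well as "0.0".
    flag = int(float(parts[2])) if len(parts) > 2 and parts[2] else None
    return time, value, flag


def status(value, flag=None):
    """Describe a record using the missing-value code and the flag, if any."""
    if value <= MISSING:  # assumption: any value <= -9999 marks missing data
        return "missing value"
    if flag is None:
        return "OK"  # without flags, all non-missing values count as valid
    return FLAG_MEANINGS.get(flag, "unknown flag")


time, value, flag = parse_data_line("2010-01-01 00:00, 31.25, 0")
print(time, value, status(value, flag))
```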

Footnotes

[1] Please see the Annex for a template header which you can copy into your data files and edit the values. We recommend that you also read the metadata-reference to better understand the meaning of the various metadata attributes in this table. For an explanation of the meanings of Dataset_PI, Contributor, Collaborator, and PointOfContact, please see the text below the table.

[2] Note that the TOAR database uses its own versioning scheme as described in subsection-timeseries-version. We will, however, preserve your version labels as informational metadata.

[3] These metadata elements will not be available for data search operations and they will be stored as is (except for conversion of key names to lowercase). All of these elements are optional. You can provide other information in different key-value pairs as well.

[4] A. G. Hearn, 1961, Proc. Phys. Soc. 78, 932.