1. Introduction

The Tropospheric Ozone Assessment Report (TOAR) activity of the International Global Atmospheric Chemistry (IGAC) organisation (see https://igacproject.org/activities/TOAR) is collecting surface ozone measurements and related data from all over the world in a central database at Forschungszentrum Jülich, Germany. It is currently in its second phase, TOAR-II (2020-2024, https://igacproject.org/activities/TOAR/TOAR-II), which we refer to as TOAR V2 (see footnote 1). The purpose of collecting these data is to provide globally consistent metrics for analyses of the health, vegetation, and climate impacts of ozone air pollution. The database is exposed via a REST API and graphical web services, which allow users to visualise the data and compare them with other data sets and model simulations. We collect data from cooperating data centres, harmonise them, and check their quality before adding them to the TOAR database.

The TOAR database infrastructure aims to provide complete coverage of surface ozone and related measurements worldwide. This means that data from several dozen sources (government agencies, research institutions, NGOs) must be assembled. Technically, we distinguish between individually contributed data, which are submitted to us by email or via a shared folder (usually as a small number of files), and harvested data, which we obtain from publicly accessible sources (web services, data downloads, or database dumps made available to us). In a few cases during TOAR-I, we also received large contributions (e.g. ~1000 files of Japanese ozone monitoring data). Such large contributions are treated like harvested data in the workflows described below.
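The routing rule described above (email or shared-folder submissions go to the individual workflow unless they are very large, while everything obtained from web services, downloads or dumps is harvested) can be sketched as follows. This is a minimal illustration only, not the TOAR implementation: the names `Workflow`, `Submission`, `choose_workflow`, the channel strings, and the cut-off `LARGE_CONTRIBUTION_FILES` are all hypothetical. The guide gives no numeric threshold, only "a small number of files" versus "~1000 files", so the value of 100 is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Workflow(Enum):
    """Hypothetical labels for the two processing workflows."""
    INDIVIDUAL = "individually contributed"
    HARVESTED = "harvested"


# Hypothetical cut-off: the guide only contrasts "a small number of files"
# with "~1000 files"; 100 is an illustrative assumption, not a TOAR rule.
LARGE_CONTRIBUTION_FILES = 100

# Hypothetical channel names for how data reach the data centre.
CONTRIBUTED_CHANNELS = {"email", "shared_folder"}
HARVEST_CHANNELS = {"web_service", "download", "db_dump"}


@dataclass
class Submission:
    channel: str   # how the data arrived, e.g. "email" or "web_service"
    n_files: int   # number of files in the delivery


def choose_workflow(sub: Submission) -> Workflow:
    """Route a submission to a workflow (illustrative sketch)."""
    if sub.channel in HARVEST_CHANNELS:
        return Workflow.HARVESTED
    if sub.channel in CONTRIBUTED_CHANNELS:
        # Very large contributions are processed like harvested data.
        if sub.n_files >= LARGE_CONTRIBUTION_FILES:
            return Workflow.HARVESTED
        return Workflow.INDIVIDUAL
    raise ValueError(f"unknown submission channel: {sub.channel!r}")


if __name__ == "__main__":
    print(choose_workflow(Submission("email", 3)).value)
    print(choose_workflow(Submission("shared_folder", 1000)).value)
```

Keeping the threshold as a single named constant makes it easy to replace with whatever criterion the data centre actually applies, which in practice is likely a case-by-case judgement rather than a fixed file count.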

This technical guide describes the processing steps applied to incoming data for the TOAR database. Not every processing detail can be described here, because harmonising so many different data sources inevitably requires many individual decisions to be made on a day-to-day basis. While other data centres often enforce relatively strict rules for data providers and only accept data processed according to those rules, a core objective of TOAR is to also collect data from world regions with low data coverage and limited data processing capabilities. This implies that, especially for individually contributed data, a lot of communication takes place between the data providers and the TOAR data centre. Responsible persons and data formats may change, metadata profiles can be altered over time, and very specific questions often need to be resolved with data providers before new data can go online. Nevertheless, we have tried to structure, organise and automate our data processing workflow as much as possible, not least to meet high standards of data documentation and reproducibility.

Footnotes

1: TOAR phase I ran from 2014 to 2019; in our context it is denoted TOAR V1.