3. TOAR database users

3.1. What is the TOAR database about and what are TOAR data services?

The TOAR database is a central data repository for global data from surface ozone and ozone precursor measurements. It supports the Tropospheric Ozone Assessment Report by enabling scientists around the world to perform standardized analyses of ozone-related data. The TOAR database is operated by the Jülich Supercomputing Centre at Forschungszentrum Jülich, Germany. Users cannot access the database directly, but we offer various web services (APIs) that allow everyone to search for, download, and visualize data, or to perform online analyses such as data aggregation or trend analysis. Trend analysis uses the quantile regression approach that has been agreed upon by the TOAR science community. All TOAR database-related services and extensive documentation can be accessed through our home page.

3.2. From where do you get the TOAR data?

The TOAR database team collected data from 18 large air quality monitoring networks and from many individual data providers. Some of these datasets are available through public data services; others were sent to us directly by the data providers. In all cases, we took great care to ensure that the data are properly licensed and that we are allowed to redistribute them under the permissive Creative Commons CC BY 4.0 license. All data in the TOAR database are harmonized and quality controlled, but the level of quality control differs (see the questions on data curation and data quality). Only data from research-grade instruments are accepted. To enable analyses of the causes of ozone changes, we also extracted time series from the global ERA5 reanalysis of the European Centre for Medium-Range Weather Forecasts (ECMWF) at the locations of all stations in the database.

3.3. What is special about the TOAR database?

The TOAR database is one of the largest collections of ozone-related surface measurements worldwide. It is fully committed to Open Data and the FAIR principles, and it is accessible through a variety of modern, user-friendly, and performant web services. Another unique selling point of the TOAR database is its very rich set of metadata, which in particular allows a globally harmonized characterisation of measurement sites based on a set of pre-processed Earth Observation datasets. For details, please consult the metadata reference.

3.4. How much data is in the TOAR database?

The TOAR database contains more than 430,000 time series from almost 24,000 stations. There are about 65 billion observation records, ranging from the 1970s to the recent past (most data records end in 2022 or 2023, as this is the common analysis period of the TOAR-II assessment). For German stations, we collect and provide near-realtime data, updated four times daily. The total data volume of the TOAR database is close to 10 terabytes.

3.5. How can I find data in the TOAR database?

All our services are offered through our home page. The easiest way to identify the most suitable data for your purpose is probably our dashboard. We also offer a search endpoint in our REST API, which you can use in your browser or from your own programs. Example Python programs showing how to use the search endpoint are provided in our TOAR tools repository.
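As a rough illustration of calling the search endpoint from your own program, the following Python sketch builds a query URL with the standard library and can fetch the JSON result. The base URL is the one used elsewhere on this page; the `search/` path and the parameter names (`variable_id`, `limit`) are assumptions made for illustration, so please check the REST API documentation and the TOAR tools examples for the actual names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the TOAR REST API (version 2), as given on this page.
BASE_URL = "https://toar-data.fz-juelich.de/api/v2/"


def build_search_url(endpoint="search/", **params):
    """Build a GET URL for a TOAR REST API endpoint.

    The default endpoint path "search/" and any parameter names passed in
    (e.g. variable_id, limit) are illustrative assumptions; check the REST
    API documentation for the names the service actually accepts.
    """
    # Leave out parameters that were passed as None.
    query = {key: value for key, value in params.items() if value is not None}
    url = BASE_URL + endpoint
    if query:
        # Sort the parameters so that the same query always yields the same URL.
        url += "?" + urlencode(sorted(query.items()))
    return url


def fetch_json(url, timeout=60):
    """Send the request and decode the JSON response (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical query: 10 results for a variable with id 5.
    url = build_search_url(variable_id=5, limit=10)
    print(url)
    # results = fetch_json(url)  # uncomment to actually contact the server
```

Keeping URL construction separate from the network call makes it easy to inspect a query in the browser before running it from a script.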

3.6. What do you do in terms of data curation?

Data curation begins with a manual inspection to check whether the data can be processed at all and whether they adhere to our metadata definitions and data format description (see The TOAR Data Processing Workflow for details). The subsequent automated processing workflow contains several checks and quality tests, including statistical tests to detect large discrepancies from expected values. Once a dataset passes these tests, it is inserted into the database, where providers and the TOAR data team can visualize and inspect the data again. Other curation steps include the harmonization of metadata and the augmentation of metadata with processed Earth Observation information (see the question on global metadata). Finally, data errors are often found when data from the TOAR database are visualized or analyzed. Users can point us to obvious data errors, and we will correct them as quickly as possible. All changes to data and metadata are logged and can therefore be retraced if needed. Major data changes also result in a new dataset version number.
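The exact statistical tests of the TOAR workflow are not described here. As a generic illustration of what a test for "large discrepancies from expected values" can look like, the sketch below flags values whose robust z-score (based on the median and the median absolute deviation, MAD) exceeds a threshold. Both the method and the threshold are assumptions for illustration, not the TOAR quality control algorithm.

```python
import statistics


def flag_outliers(values, threshold=5.0):
    """Return one boolean per value: True if the value is a suspected outlier.

    Uses a robust z-score, |v - median| / (1.4826 * MAD). The threshold of
    5.0 is an arbitrary illustrative choice, not a TOAR setting.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # At least half of the values equal the median: flag every value that differs from it.
        return [v != median for v in values]
    # 1.4826 * MAD estimates the standard deviation for normally distributed data.
    scale = 1.4826 * mad
    return [abs(v - median) / scale > threshold for v in values]
```

A median/MAD score is used here instead of a mean/standard-deviation score because a single extreme value inflates the standard deviation and can thereby mask itself.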

3.7. What is the quality of the data?

TOAR only accepts data from research-grade instruments and relies on the quality control exerted by the data provider (monitoring agency, scientific institution, or other). Nevertheless, data processing errors and other factors can lead to errors in the data stored in the database. We try to identify such errors through an automated quality control tool and, in some cases, through manual inspection. Furthermore, preliminary analyses of TOAR data for the scientific papers produced in TOAR-II will identify data errors, and we have implemented a feedback function so that users can alert us to obvious or likely data errors. It is impossible to guarantee the correctness of all TOAR data, but we take data quality seriously and do our best to achieve the highest possible quality of the data in the TOAR database. Since the focus of TOAR is on tropospheric ozone, the quality of the ozone data is likely better than that of the ozone precursor data. The quality of the meteorological data from ERA5 can be assessed through the ERA5 validation report.

3.8. Can I filter data of specific quality levels?

Yes. The data flagging scheme of the TOAR database makes it possible to distinguish between quality flags set by the data provider and quality flags assigned by our automated quality control tool or through visual inspection by the TOAR data team. Details on how to specify the desired data quality level can be found in TOAR Near Realtime Data Processing.
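The actual flag names and the way to request a quality level from the services are documented in TOAR Near Realtime Data Processing. The sketch below only illustrates the idea of combining both flag sources when filtering already downloaded records on the client side; the field names `provider_flag` and `toar_flag` and their values are hypothetical.

```python
from typing import Dict, Iterable, List


def filter_by_quality(
    records: Iterable[Dict],
    provider_ok=("valid",),
    toar_ok=("ok",),
) -> List[Dict]:
    """Keep records accepted both by the data provider and by TOAR quality control.

    The keys 'provider_flag' and 'toar_flag' and the flag values are
    hypothetical placeholders for the real TOAR flagging scheme.
    """
    return [
        record
        for record in records
        if record.get("provider_flag") in provider_ok
        and record.get("toar_flag") in toar_ok
    ]
```

Passing wider tuples of accepted values (for example, also accepting a "questionable" TOAR flag) relaxes the filter without changing the code.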

3.9. What is this thing about “global metadata”?

Version 2 of the TOAR database has an extensive metadata schema, which includes a lot of information about the measurement location (station) and the measurement itself (timeseries). In addition, we generate globally uniform metadata through a set of queries to our Geospatial Point Extraction and Aggregation Service (GEO-PEAS). GEO-PEAS holds copies of several Earth Observation datasets with a spatial resolution on the order of 100 m. These include, for example, the Human Settlement Layer database from JRC Ispra, the NOAA stable nightlight dataset, and the ESA CCI land cover product. GEO-PEAS can then calculate aggregated quantities from these data within a user-defined radius around a location, e.g., a measurement site. A predefined set of such aggregates (for example, the maximum population density within a 25 km radius) is stored together with the station metadata in the TOAR database and can be used to filter the stations that you want to include in your analysis. A detailed description of the global metadata elements can be found at https://esde.pages.jsc.fz-juelich.de/toar-data/toardb_fastapi/docs/toardb_fastapi.html#stationmetaglobal.
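Once station metadata have been downloaded, a global metadata aggregate can serve as a filter, for example to select stations in sparsely populated surroundings. In the sketch below, the nesting under `globalmeta`, the key `max_population_density_25km`, its unit, and the threshold of 100 are all hypothetical; the real element names are listed in the global metadata documentation.

```python
def select_low_population_stations(stations, max_density=100.0):
    """Keep stations whose (hypothetical) global-metadata value
    'max_population_density_25km' is below max_density.

    Stations without this value are skipped rather than guessed.
    """
    selected = []
    for station in stations:
        value = station.get("globalmeta", {}).get("max_population_density_25km")
        if value is not None and value < max_density:
            selected.append(station)
    return selected
```

Skipping stations with missing values is a deliberate choice: treating a missing aggregate as "low" would silently admit stations whose surroundings are unknown.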

3.10. What is the time resolution of the data in the TOAR database?

The TOAR database stores timeseries data at hourly or finer resolution. Most data have been collected as hourly data, but the database can also handle half-hourly, 15-minute, or 10-minute data. Note that all TOAR analyses are based on hourly data, and our statistics and aggregation service has been specifically developed for this time resolution.
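If you work with sub-hourly data yourself, they need to be aggregated to hourly values before they can be compared with the hourly data used in TOAR analyses. The sketch below averages 15-minute samples into hourly means and discards hours with insufficient data coverage. The 75 % coverage threshold and the convention that a timestamp marks the start of its sampling interval are assumptions for illustration, not the documented rules of the TOAR aggregation service.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(samples, samples_per_hour=4, min_coverage=0.75):
    """Average sub-hourly (datetime, value) samples into hourly means.

    An hour is only reported if at least min_coverage of the expected
    samples_per_hour values are present. Both defaults are illustrative
    assumptions.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        if value is None:
            continue  # skip missing values
        # Assign the sample to the hour in which its timestamp falls.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    result = {}
    for hour in sorted(buckets):
        values = buckets[hour]
        if len(values) / samples_per_hour >= min_coverage:
            result[hour] = sum(values) / len(values)
    return result


if __name__ == "__main__":
    data = [(datetime(2023, 1, 1, 10, m), 40.0 + m / 7.5) for m in (0, 15, 30, 45)]
    print(hourly_means(data))
```

If your timestamps mark the end of each sampling interval instead, shift them back by one interval before aggregating.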

3.11. Does the database also contain data from mobile platforms?

No. This would require a different data model. The TOAR working group on marine and polar ozone has, however, established a collection of measurements from mobile platforms, which is available here. Please make sure to cite the associated publication when using this data collection.

3.12. How do I retrieve data?

We offer different web services through which you can download hourly values, aggregated statistics (means, minima/maxima, percentiles, and several ozone-specific metrics), or trend estimates. There is also software available to generate gridded data products, e.g., for the evaluation of chemistry transport models. Due to technical limitations, the amount of data that can be retrieved is limited. Registered users (see below) can issue larger download requests than anonymous users, and for special needs we can increase your quota upon request.
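The exact size limits are not stated here. If a request covering a long period is too large, one generic workaround is to split the period into shorter windows and issue one request per window. The sketch below splits a date range into calendar-year windows; whether one year is small enough for your query, and how a date range is passed to a particular service, are assumptions you should check against the service documentation.

```python
from datetime import date


def yearly_windows(start, end):
    """Split the half-open range [start, end) into windows of at most one calendar year."""
    windows = []
    current = start
    while current < end:
        # Each window ends at the next 1 January or at the overall end, whichever comes first.
        next_year = date(current.year + 1, 1, 1)
        window_end = min(next_year, end)
        windows.append((current, window_end))
        current = window_end
    return windows


if __name__ == "__main__":
    for window_start, window_end in yearly_windows(date(2020, 6, 1), date(2022, 3, 1)):
        print(window_start.isoformat(), window_end.isoformat())
```

Each (start, end) pair can then be formatted with `isoformat()` and used for one request; half-open windows ensure that no day is requested twice.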

3.13. Who can become a registered user?

The TOAR database is open to everyone, with or without registration. Anonymous use of our web services is possible, but the amount of data that you can process and retrieve is limited. If you register (use the button “Register as a TOAR user” in your profile after logging in via the dashboard), you will be able to save your preferences, keep a history of your data processing requests, process larger requests, and download more data. Under the CC BY 4.0 license, there are no use restrictions on our data, but you are obliged to properly acknowledge the data sources. Please see citations and attributions to learn how we support you with this.

3.14. What is the data format of data downloads?

Any query to the TOAR database REST API returns a JSON structure by default. Data query results can also be returned as CSV files. Retrieved data can therefore easily be processed with, for example, a Python script. Since JSON is a standardized, human-readable format, you can of course process the data with any tool of your choice. To speed up processing, some of our services allow you to specify the fields that you want to include in the metadata output. Choosing these wisely can significantly reduce processing time and data volume. The TOAR gridding software generates netCDF files from your data queries.
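Both output formats can be read with the Python standard library alone. The sketch below parses a JSON response body and a CSV response body. The sample content and column names are made up, and skipping lines that start with `#` is only an assumption, in case metadata are written as comment lines above the CSV header.

```python
import csv
import io
import json


def parse_json_response(text):
    """Parse a JSON response body into Python objects (dicts and lists)."""
    return json.loads(text)


def parse_csv_response(text, comment_prefix="#"):
    """Parse a CSV response body into a list of dicts, one per data row.

    Lines starting with comment_prefix are dropped first (an assumption
    about how metadata might be embedded). All values stay strings.
    """
    lines = [line for line in text.splitlines() if not line.startswith(comment_prefix)]
    return list(csv.DictReader(io.StringIO("\n".join(lines))))


if __name__ == "__main__":
    sample_csv = "# hypothetical metadata line\ndatetime,value\n2023-01-01T00:00:00,31.5\n"
    rows = parse_csv_response(sample_csv)
    # CSV values are strings; convert them explicitly before computing with them.
    print([float(row["value"]) for row in rows])
```

Note that JSON preserves numeric types, whereas CSV values must be converted explicitly, for example with `float()`.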

3.15. Under which conditions may I use the data?

All TOAR-II data are provided without restrictions under a CC BY 4.0 license. This license requires that you acknowledge the data source. Each response to a TOAR data query via the REST API contains a citation and an acknowledgement metadata element. It is your responsibility as a data user to make sure that proper acknowledgements are given. See also our data use policy. We have developed a specific data service (https://toar-data.fz-juelich.de/api/v2/#citations-and-attributions) that assists you in finding the right information for your data citations.
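When a study combines many query responses, the citation and acknowledgement elements can be collected automatically and de-duplicated before they are copied into a publication. In the sketch below, the responses are assumed to be already parsed into Python dictionaries, and the key names `citation` and `acknowledgement` at the top level of each response are assumptions; adjust them to the actual response structure.

```python
def collect_attributions(responses, citation_key="citation", ack_key="acknowledgement"):
    """Gather unique citation and acknowledgement strings from parsed responses.

    The key names are assumptions. Strings are kept in first-seen order so
    that the resulting lists are reproducible.
    """
    citations, acknowledgements = [], []
    for response in responses:
        citation = response.get(citation_key)
        if citation and citation not in citations:
            citations.append(citation)
        acknowledgement = response.get(ack_key)
        if acknowledgement and acknowledgement not in acknowledgements:
            acknowledgements.append(acknowledgement)
    return citations, acknowledgements
```

Lists are used instead of sets so that the order of the collected strings stays stable between runs.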

3.16. How can I cite the TOAR database?

The TOAR database should be cited as: Schröder et al., The TOAR-II database (in preparation).

For individual data series or data collections, the original data sources should be cited. A recommended citation is provided with the metadata when data are downloaded and can be obtained by sending a query to the citation endpoint. Please make sure to at least acknowledge our data providers. You will also find a suggested acknowledgement string in the query responses.

3.17. Do you make use of controlled vocabulary?

Yes. You can retrieve the ontology of TOAR data as XML from https://toar-data.fz-juelich.de/api/v2/ontology, or you can view it online as an OWL document at https://toar-data.fz-juelich.de/documentation/ontologies/v1.0/.
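Assuming the XML returned by the ontology endpoint is an OWL ontology serialized as RDF/XML, the classes it defines can be listed with Python's built-in ElementTree module and the standard W3C namespaces. The sample document below uses a placeholder IRI (example.org), not an actual TOAR term.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespaces used by OWL ontologies serialized as RDF/XML.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# Placeholder ontology for illustration only.
SAMPLE_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/toar#Station">
    <rdfs:label>Station</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/toar#Timeseries"/>
</rdf:RDF>"""


def list_owl_classes(xml_text):
    """Return (IRI, label) pairs for every owl:Class that has an rdf:about attribute.

    Only the first rdfs:label is used; the label is None if there is none.
    """
    root = ET.fromstring(xml_text)
    classes = []
    for cls in root.iter(f"{{{NS['owl']}}}Class"):
        iri = cls.get(f"{{{NS['rdf']}}}about")
        if iri is None:
            continue  # anonymous class expressions have no IRI
        label = cls.find("rdfs:label", NS)
        classes.append((iri, label.text if label is not None else None))
    return classes


if __name__ == "__main__":
    for iri, label in list_owl_classes(SAMPLE_XML):
        print(iri, label)
```

For richer queries over the ontology (properties, subclass relations), a dedicated RDF library is usually more convenient than raw XML parsing.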

3.18. What is the database schema underlying the TOAR database?

Please consult our git pages documentation for details.

3.19. Where can I get further support?

For further questions please send an email to support@toar-data.org.