Documentation of the TOARDB Analysis FastAPI REST interface

A Representational State Transfer (REST) service that allows retrieval of analysis products from the Tropospheric Ozone Assessment Report (TOAR) database of surface ozone observations.

This documentation describes the URL architecture and query options of the TOAR analysis REST interface. For general information on REST, please consult other resources.

1. General

1.1 Base URL

https://toar-data.fz-juelich.de/api/v2/analysis/

Response: Description and documentation of available REST services (this document).

1.2 Services

The following analysis services are available and described individually below. Each service is invoked by appending its name and possible query arguments to the base URL.

  • data: get hourly data from the database

    • timeseries: get hourly time series data

    • map: get a snapshot of one variable at a single point in time

  • statistics: get aggregated data from the database

    • map: get a snapshot of aggregated values of one variable

  • trends: get trends of aggregated data from the database

  • status: check the current status of your query

  • result: get the query result

1.3 Query arguments

To control the database queries, and hence the response of the TOAR analysis REST service, you can add arguments to the service URL. These arguments must follow the format <argument_name>=<value>. The first argument is preceded by a ? character; all further arguments are separated by & characters. List values are given as comma-separated strings (e.g. daterange=2010-01-01T00:00:00,2020-12-31T23:59:59).
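As an illustration of this format, the following minimal Python sketch assembles a query URL. The helper name build_query_url and the rule of joining list values with commas are assumptions modeled on the examples in this document; they are not part of the service itself.

```python
from urllib.parse import urlencode

# Base URL from section 1.1; build_query_url is a hypothetical convenience helper.
BASE_URL = "https://toar-data.fz-juelich.de/api/v2/analysis/"


def build_query_url(service: str, **params: object) -> str:
    """Return BASE_URL + service + "/?" + arguments in <argument_name>=<value> form.

    List and tuple values are joined with commas, as in the examples of this
    document (daterange, statistics, bounding_box, ...).
    """
    flat = {
        name: ",".join(str(item) for item in value) if isinstance(value, (list, tuple)) else value
        for name, value in params.items()
    }
    # urlencode joins the arguments with "&"; the "?" before the first one is added here.
    # Commas and colons are left unescaped so that the URL stays readable.
    return f"{BASE_URL}{service}/?{urlencode(flat, safe=',:')}"


# Rebuilds the statistics example URL from section 2.2.
url = build_query_url(
    "statistics",
    country="DE",
    variable_id=5,
    limit=3,
    daterange=["2010-01-01T00:00:00", "2020-12-31T23:59:59"],
    flags="AllOK",
    sampling="annual",
    statistics=["mean", "median", "min", "max"],
)
```

Nested services work the same way, e.g. build_query_url("data/timeseries", ...) or build_query_url("statistics/map", ...).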

1.4 Response format

The response can be either synchronous or asynchronous. A synchronous response contains the requested result directly. An asynchronous response does not contain the result; instead, it contains a unique task identifier for your request together with a link to its status. You can use this link to check the status of your request, and once the result is ready, the status link redirects you to the result. The asynchronous approach is used for queries that are expected to take longer to process.
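A client for the asynchronous case can poll the status link until it is redirected to the result. The following is a minimal sketch using only the Python standard library; the function name wait_for_result, the polling interval and the timeout are assumptions, and the check for "/analysis/result/" in the final URL relies on the URL layout shown in 2.4 and 2.5.

```python
import json
import time
import urllib.request


def wait_for_result(status_url, opener=urllib.request.urlopen, interval=10.0, timeout=3600.0):
    """Poll a status URL until the service redirects to the result; return the result bytes.

    While the task is running, the status endpoint is assumed to answer with JSON
    {"task_id": ..., "status": ...}; once it is done, it redirects to
    .../analysis/result/<task_id>. urllib.request.urlopen follows that redirect
    itself, and response.url then holds the final URL.
    """
    deadline = time.monotonic() + timeout
    while True:
        with opener(status_url) as response:
            final_url = response.url
            body = response.read()
        if "/analysis/result/" in final_url:
            return body  # e.g. the zip archive with the query result
        info = json.loads(body)  # not ready yet: task id and status URL again
        status_url = info.get("status", status_url)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {info.get('task_id')} not finished after {timeout} s")
        time.sleep(interval)
```

Typical use is archive = wait_for_result(task["status"]), where task is the parsed JSON of the asynchronous response. The opener parameter only exists so that the network call can be replaced, for example in tests.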

2. Description of services

2.1 Data

The services below grant access to the hourly data of the TOAR database.

2.1.1 Data - Time Series

https://toar-data.fz-juelich.de/api/v2/analysis/data/timeseries/[?QUERY-OPTIONS]

where QUERY-OPTIONS are:

any combination of query options from both TOARDB REST interface - 2.4 Stationmeta and TOARDB REST interface - 2.5 Timeseries

daterange = <list of two datetimes: date range for which to extract data>

flags = <list of strings: only select data points with the specified quality flags> (for a description of flags and all available flag names see User Guide - 5.2 Data Quality Flags)

format = <string> (json|csv) (default: json)

Response: The query will return a unique task identifier and a link to check the status of your query.

Example: https://toar-data.fz-juelich.de/api/v2/analysis/data/timeseries/?country=DE&variable_id=5&limit=3&daterange=2010-01-01T00:00:00,2020-12-31T23:59:59&flags=AllOK&format=csv

Result: {"task_id":"94e3888a-33f8-4adf-a6d6-4d8627c9ecc0","status":"https://toar-data.fz-juelich.de/api/v2/analysis/status/94e3888a-33f8-4adf-a6d6-4d8627c9ecc0"}

To retrieve the result, send a request to the status endpoint with your task identifier (see 2.4 Status). Once the result is ready, you will be redirected to it. The result is a zip archive containing one file per time series in the format you have chosen.
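Once downloaded, the archive can be unpacked in memory. A minimal sketch, assuming the files are UTF-8 encoded text (csv or json); read_result_archive is a hypothetical helper, and file names are returned exactly as stored in the archive.

```python
import io
import zipfile


def read_result_archive(archive_bytes: bytes) -> dict:
    """Return {file name: text content} for every file in a result zip archive.

    Assumes the files are UTF-8 encoded text (csv or json).
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {
            name: archive.read(name).decode("utf-8")
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries, if present
        }
```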

2.1.2 Data - Map

https://toar-data.fz-juelich.de/api/v2/analysis/data/map/[?QUERY-OPTIONS]

where QUERY-OPTIONS are:

datetime = <datetime: date and time for which to extract data>

variable_id = <integer: variable to extract>

bounding_box = <list of four numbers: bounding_box (min_lat,min_lon,max_lat,max_lon) in degrees_north/degrees_east to define a geographical rectangle (do not set anything for global extraction)> (default: None)

format = <string> (json|csv) (default: json)

Response: The query directly returns the latitude, longitude and value for each location, in the specified format.

Example: https://toar-data.fz-juelich.de/api/v2/analysis/data/map/?datetime=2020-07-23T13:00:00&variable_id=5&bounding_box=47.5,6.5,54.5,14.5

Result:

[{"lat":47.81564999946587,"lon":13.03488,"value":68.0628783549876},
{"lat":47.8055555994659,"lon":13.043333,"value":65.96268275498761},
...
{"lat":53.2465,"lon":6.60894,"value":43.978797594987604},
{"lat":52.0918,"lon":6.60537,"value":54.1289075949876}]
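Because this endpoint answers directly, the JSON can be processed right away. A minimal sketch, assuming the layout shown in the example above (a list of objects with lat, lon and value keys); the helper points_in_box is hypothetical and only illustrates a client-side sub-selection using the same (min_lat,min_lon,max_lat,max_lon) order as the bounding_box option.

```python
import json


def points_in_box(map_json, bounding_box):
    """Parse a data/map JSON response and keep the points inside a bounding box.

    bounding_box uses the query option order (min_lat, min_lon, max_lat, max_lon).
    Returns a list of (lat, lon, value) tuples.
    """
    min_lat, min_lon, max_lat, max_lon = bounding_box
    points = [(p["lat"], p["lon"], p["value"]) for p in json.loads(map_json)]
    return [
        (lat, lon, value)
        for lat, lon, value in points
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    ]
```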

2.2 Statistics

All statistics are calculated and reported in the local standard time of the station where the data originated, i.e. without shifts for daylight saving time.
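To compare your own UTC timestamps with these statistics, convert them with a fixed offset rather than a named time zone (a named zone such as Europe/Berlin would apply daylight saving time). A minimal sketch; the helper name and the assumption that the station's standard UTC offset is known to you are not part of the service.

```python
from datetime import datetime, timedelta, timezone


def to_local_standard_time(utc_time: datetime, standard_offset_hours: float) -> datetime:
    """Convert an aware UTC datetime to a station's local standard time (no DST shift).

    standard_offset_hours is the station's offset from UTC outside daylight
    saving time, e.g. 1 for central Europe. The fixed-offset zone gives the same
    shift in summer and in winter.
    """
    if utc_time.tzinfo is None:
        raise ValueError("utc_time must be timezone-aware")
    fixed_zone = timezone(timedelta(hours=standard_offset_hours))
    return utc_time.astimezone(fixed_zone)
```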

https://toar-data.fz-juelich.de/api/v2/analysis/statistics/[?QUERY-OPTIONS]

where QUERY-OPTIONS are:

any combination of query options from both TOARDB REST interface - 2.4 Stationmeta and TOARDB REST interface - 2.5 Timeseries

daterange = <list of two datetimes: date range for which to extract data>

flags = <list of strings: only select data points with the specified quality flags> (for a description of flags and all available flag names see User Guide - 5.2 Data Quality Flags)

sampling = <string: temporal aggregation to use> (for available values see ALLOWED_SAMPLING_VALUES)

statistics = <list of strings: statistics to calculate> (for available values and details see 3. Available Statistics)

seasons = <list of strings: seasons to use for seasonal aggregations> (for available values see SEASON_DICT) (default: "DJF,MAM,JJA,SON")

crops = <list of strings: crops to use for vegseason aggregations> (for available values see ALLOWED_CROPS_VALUES) (default: "wheat,rice")

min_data_capture = <number: minimal fraction of available hourly values in the aggregation interval to report an aggregated value, must be between 0 and 1> (default: 0.75)

metadata_scheme = <string: select how much metadata is returned> (basic|extended|full) (default: full)

format = <string> (raw|by_statistic) (for details on the formats see 4. Aggregated Output Formats) (default: raw)

Response: The query will return a unique task identifier and a link to check the status of your query.

Example: https://toar-data.fz-juelich.de/api/v2/analysis/statistics/?country=DE&variable_id=5&limit=3&daterange=2010-01-01T00:00:00,2020-12-31T23:59:59&flags=AllOK&sampling=annual&statistics=mean,median,min,max

Result: {"task_id":"e2b17c39-6f80-4083-9bb8-f90cd72812b9","status":"https://toar-data.fz-juelich.de/api/v2/analysis/status/e2b17c39-6f80-4083-9bb8-f90cd72812b9"}

To retrieve the result, send a request to the status endpoint with your task identifier (see 2.4 Status). Once the result is ready, you will be redirected to it. The result is a zip archive containing files in the format you have chosen.

2.2.1 Statistics - Map

https://toar-data.fz-juelich.de/api/v2/analysis/statistics/map/[?QUERY-OPTIONS]

where QUERY-OPTIONS are:

daterange = <list of two datetimes: comma-separated start and end date and time for which to extract data>

variable_id = <integer: variable to extract>

bounding_box = <list of four numbers: bounding_box (min_lat,min_lon,max_lat,max_lon) in degrees_north/degrees_east to define a geographical rectangle (do not set anything for global extraction)> (default: None)

statistics = <list of strings: statistics to calculate> (for available values and details see 3. Available Statistics)

format = <string> (json|csv) (default: json)

Response: The query will return a unique task identifier and a link to check the status of your query.

Example: https://toar-data.fz-juelich.de/api/v2/analysis/statistics/map/?daterange=2010-01-01T00:00:00,2020-12-31T23:59:59&variable_id=5&bounding_box=50,6,52,8&statistics=avgdma8epax&format=csv

Result: {"task_id":"5a0beddf-a1c4-4584-9fb7-d5e98bafcd46","status":"https://toar-data.fz-juelich.de/api/v2/analysis/status/5a0beddf-a1c4-4584-9fb7-d5e98bafcd46"}

To retrieve the result, send a request to the status endpoint with your task identifier (see 2.4 Status). Once the result is ready, you will be redirected to it. The result is a zip archive containing files in the format you have chosen.

2.4 Status

https://toar-data.fz-juelich.de/api/v2/analysis/status/[task_id]

Response: If the result is not ready yet, the response contains the task id and the status URL again. If the result is ready, you are redirected to the result endpoint.
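In the examples, the result URL has the same form as the status URL with "status" replaced by "result", so a client that has stored a status URL can build the result URL itself. A sketch under that assumption, inferred from the examples in 2.4 and 2.5; result_url_for is a hypothetical helper. What the result endpoint returns before the task has finished is not described here, so polling the status endpoint remains the documented way to wait.

```python
STATUS_PREFIX = "https://toar-data.fz-juelich.de/api/v2/analysis/status/"
RESULT_PREFIX = "https://toar-data.fz-juelich.de/api/v2/analysis/result/"


def result_url_for(status_url: str) -> str:
    """Derive .../analysis/result/<task_id> from .../analysis/status/<task_id>."""
    if not status_url.startswith(STATUS_PREFIX):
        raise ValueError(f"not a status URL: {status_url}")
    task_id = status_url[len(STATUS_PREFIX):].strip("/")
    return RESULT_PREFIX + task_id
```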

Example: https://toar-data.fz-juelich.de/api/v2/analysis/status/e2b17c39-6f80-4083-9bb8-f90cd72812b9

Result:

{"task_id":"e2b17c39-6f80-4083-9bb8-f90cd72812b9","status":"https://toar-data.fz-juelich.de/api/v2/analysis/status/e2b17c39-6f80-4083-9bb8-f90cd72812b9"}
or
redirect to https://toar-data.fz-juelich.de/api/v2/analysis/result/e2b17c39-6f80-4083-9bb8-f90cd72812b9

2.5 Result

https://toar-data.fz-juelich.de/api/v2/analysis/result/[task_id]

Response: A zip archive containing the query result in the format you requested.

Example: https://toar-data.fz-juelich.de/api/v2/analysis/result/e2b17c39-6f80-4083-9bb8-f90cd72812b9

Result: zip archive

3. Available Statistics

In the descriptions below, the minimal fraction of available hourly data is given as 75%, the default. If you set a different min_data_capture, that value is used instead.

For more details, see supplement 1 of Schultz et al. (2017).
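The data capture rule used throughout the descriptions below can be illustrated as follows: with the default of 0.75, at least 9 of 12 hourly values (or 6 of 8) must be valid. This is a minimal sketch; passes_data_capture is a hypothetical helper, not the service's own implementation, and it treats None or NaN as missing.

```python
import math


def passes_data_capture(values, n_expected, min_data_capture=0.75):
    """Return True if enough valid values are present to report an aggregate.

    values holds the values of one aggregation interval, with None or NaN
    marking missing data; n_expected is the number of values the interval
    should contain (e.g. 12 for the 08:00h to 19:59h window).
    """
    if not 0 <= min_data_capture <= 1:
        raise ValueError("min_data_capture must be between 0 and 1")
    n_valid = sum(1 for v in values if v is not None and not math.isnan(v))
    return n_valid / n_expected >= min_data_capture
```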

Name Description
aot40 Daily 12-h AOT40 values are accumulated using hourly values for the 12-h period from 08:00h until 19:59h. AOT40 is defined as the cumulative exceedance of hourly ozone mixing ratios above 40 ppb. If fewer than 75% of hourly values (i.e. fewer than 9 out of 12 hours) are present, the daily AOT40 is considered missing. If data capture in the daily 12-h window is 75% or greater, the sum is scaled by the inverse fractional data capture (ntotal/nvalid). For monthly, seasonal, summer, or annual statistics, the daily AOT40 values are accumulated over the aggregation period and scaled by the ratio of total to valid days (ntotal/nvalid). If less than 75% of days are valid, the value is considered missing.
avgdma8epax Average value of the daily dma8epax statistics during the aggregation period.
count Number of available values in the aggregation period.
dark_aot40 As aot40, but using solar elevation <= 5 degrees to identify "dark" hours.
dark_avg As mean, but using solar elevation <= 5 degrees to identify "dark" hours.
data_capture Fraction of valid (hourly) values available in the aggregation period.
daylight_aot40 As aot40, but using solar elevation > 5 degrees to identify "daytime" hours.
daylight_avg As mean, but using solar elevation > 5 degrees to identify "daytime" hours.
daytime_avg Daytime average is defined as average of hourly values for the 12-h period from 08:00h to 19:59h. All hourly values in the aggregation period are averaged, and the resulting value is valid if at least 75% of hourly values are present.
diurnal_cycle Diurnal cycle (must be requested on its own, without any other statistics).
dma8epa Daily maximum 8-hour average according to the US EPA definition. 8-hour averages are calculated for 24 bins starting at 0 h local time. The 8-h running mean for a particular hour is calculated from the concentration for that hour plus the following 7 hours. If less than 75% of data are present (i.e. fewer than 6 hours), the average is considered missing. When the aggregation period is "seasonal", "summer", or "annual", the 4th highest daily 8-hour maximum of the aggregation period is computed. Note that, in contrast to the official EPA definition, a daily value is considered valid if at least one 8-hour average is valid.
dma8epa_strict As dma8epa, but additionally, a daily 8-hour maximum is only reported if at least 18 out of the 24 8-hour averages are valid. This is the official dma8epa definition.
dma8epax As dma8epa, but using the new US EPA definition of the daily 8-hour windows, starting from 7 h to 23 h local time.
dma8epax_strict As dma8epax, but additionally, a daily 8-hour maximum is only reported if at least 13 out of the 17 8-hour averages are valid. This is the official dma8epax definition.
dma8eu As dma8epa, but using the EU definition of the daily 8-hour windows, starting from 17 h of the previous day. When the aggregation period is "seasonal", "summer", or "annual", the 26th highest daily 8-hour maximum of the aggregation period is computed.
dma8eu_strict As dma8eu, but additionally, a daily 8-hour maximum is only reported if at least 18 out of the 24 8-hour averages are valid. This is the official dma8eu definition.
drmdmax1h Maximum of the 3-month running mean of daily maximum 1-hour mixing ratios during the aggregation period.
m7_avg Daytime mean values (9-16h).
max Maximum in the aggregation period.
max1h Daily maximum hourly value.
mean Average value in the aggregation period.
median Median value in the aggregation period.
min Minimum in the aggregation period.
nighttime_avg As daytime_avg, but averaged over the daily interval from 20:00 h to 07:59 h.
nvgt050 Number of days on which the dma8epax value exceeds 50 ppb. The value is marked as missing if less than 75% of days contain valid data.
nvgt060 Number of days on which the dma8epax value exceeds 60 ppb. The value is marked as missing if less than 75% of days contain valid data.
nvgt070 Number of days on which the dma8epax value exceeds 70 ppb. The value is marked as missing if less than 75% of days contain valid data.
nvgt080 Number of days on which the dma8epax value exceeds 80 ppb. The value is marked as missing if less than 75% of days contain valid data.
nvgt090 Number of days on which the daily max1h value exceeds 90 ppb. The value is marked as missing if less than 75% of days contain valid data.
nvgt100 Number of days on which the daily max1h value exceeds 100 ppb. The value is marked as missing if less than 75% of days contain valid data.
nvgt120 Number of days on which the daily max1h value exceeds 120 ppb. The value is marked as missing if less than 75% of days contain valid data.
nvgtall nvgt050+nvgt060+nvgt080+nvgt090+nvgt100+nvgt120.
p05 5th percentile of hourly values in the aggregation period.
p10 As p05, but for the 10th percentile.
p25 As p05, but for the 25th percentile.
p75 As p05, but for the 75th percentile.
p90 As p05, but for the 90th percentile.
p95 As p05, but for the 95th percentile.
p98 As p05, but for the 98th percentile.
p99 As p05, but for the 99th percentile.
percentiles1 p25+p50+p75.
percentiles2 p05+p10+p25+p50+p75+p90+p95 (+p98+p99 if the aggregation period is "summer" or "annual").
somo10 Sum of excess of daily maximum 8-h means (EU Airbase standard with relaxed criterion: dma8eu) over the cut-off of 10 ppb (i.e. 20 µg/m3), calculated for all days in the aggregation period. SOMO10 is set to missing if less than 75% of days are available. The sum is weighted by the ratio of the number of theoretical days to the number of available days.
somo10_strict As somo10, but using dma8eu_strict for data capture.
somo35 As somo10, but accumulating ozone values above 35 ppb.
somo35_strict As somo10_strict, but accumulating ozone values above 35 ppb.
stddev Standard deviation in the aggregation period.
w126 Daily W126 index, accumulated from hourly values for the 12-h period from 08:00h until 19:59h. W126 = SUM(wi*Ci) with weight wi = 1/[1 + M*exp(-A*Ci/1000)], where M = 4403, A = 126, and Ci is the hourly average O3 mixing ratio in units of ppb. If there are fewer than 9 valid hourly values in the 12-hour window, the daily value is considered missing. If data capture in the daily 12-h window is 75% or greater, the sum is scaled by the inverse fractional data capture (ntotal/nvalid). Seasonal, summer, or annual statistics are calculated as the sum over the daily W126 values. Results are marked as missing if less than 75% of daily values are valid.
w126_24h As w126, but using all 24 hours of a day.
w90 Daily maximum W90 5-h Experimental Exposure Index: EI = SUM(wi*Ci) with weight wi = 1/[1 + M*exp(-A*Ci/1000)], where M = 1400, A = 90, and Ci is the hourly average O3 mixing ratio in units of ppb (Lefohn et al., 2010). For each day, 24 W90 indices are computed as 5-hour sums, requiring at least 4 of the 5 hours to be valid (i.e. at least 75% data capture). If a sample consists of only 4 data points, a fifth value is constructed by averaging the 4 valid mixing ratios. For the aggregation periods "monthly", "seasonal", "summer", or "annual", the 4th highest W90 value is computed, but only if at least 75% of days in this period have valid W90 values.
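The daily steps of the aot40 and w126 definitions above can be sketched for one station-day as follows. Assumptions: hourly values are in ppb and already in the station's local standard time, passed as a dict from hour (0 to 23) to value with None for missing hours; the function names are hypothetical, and this is an illustration of the stated formulas, not the service's own implementation. Aggregation over months, seasons or years follows the rules in the table.

```python
import math

WINDOW_HOURS = range(8, 20)  # 08:00h to 19:59h local standard time


def _window_values(hourly):
    """Valid values in the 12-h window; hourly maps hour (0-23) to a value or None."""
    return [hourly[h] for h in WINDOW_HOURS if hourly.get(h) is not None]


def daily_aot40(hourly, min_data_capture=0.75):
    """Daily 12-h AOT40 in ppb h, scaled by ntotal/nvalid; None if capture is too low."""
    valid = _window_values(hourly)
    n_total = len(WINDOW_HOURS)
    if not valid or len(valid) / n_total < min_data_capture:
        return None
    exceedance = sum(max(c - 40.0, 0.0) for c in valid)  # only the part above 40 ppb counts
    return exceedance * n_total / len(valid)


def daily_w126(hourly, min_data_capture=0.75):
    """Daily 12-h W126: SUM(wi*Ci), wi = 1/(1 + M*exp(-A*Ci/1000)), M = 4403, A = 126."""
    valid = _window_values(hourly)
    n_total = len(WINDOW_HOURS)
    if not valid or len(valid) / n_total < min_data_capture:
        return None
    weighted = sum(c / (1.0 + 4403.0 * math.exp(-126.0 * c / 1000.0)) for c in valid)
    return weighted * n_total / len(valid)
```

With 50 ppb in every hour, the daily AOT40 is 12 × 10 = 120 ppb h; with only 9 valid hours at 50 ppb, the raw sum of 90 is scaled by 12/9 to the same 120.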

4. Aggregated Output Formats

The following output formats are available for aggregated time series data. To find the format that best suits your needs, run your query with a low limit (e.g. limit=3) and compare the outputs.

4.1 Raw

This output format creates csv files in the same way as TOARDB REST interface - 2.7 Data. The zip archive contains one csv file per time series, named "<time_series_id>.csv".

4.2 By_statistic

This output format will create one csv file per requested statistic and one additional csv file with all the metadata. Each row in all the files will contain the information (either metadata or aggregated values) for one time series. All files have the same number and order of rows so that you can match the metadata and different aggregates for each time series via the row position. The metadata file is called "metadata.csv" and the files for the aggregated values are called "<statistic>.csv".
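Since rows are matched by position, a client can pair them up as in the following sketch. It assumes each csv file starts with a header row and has already been extracted from the zip archive as text; the column names are not specified here, so each row is kept as a dictionary keyed by whatever header the file has. The helper join_by_row is hypothetical.

```python
import csv
import io


def join_by_row(metadata_csv, statistic_csvs):
    """Pair each metadata row with the rows of every statistic file by position.

    metadata_csv is the text of metadata.csv; statistic_csvs maps a statistic
    name (e.g. "mean") to the text of "<statistic>.csv".
    """
    metadata_rows = list(csv.DictReader(io.StringIO(metadata_csv)))
    per_statistic = {
        name: list(csv.DictReader(io.StringIO(text)))
        for name, text in statistic_csvs.items()
    }
    # The format promises equal row counts and order; fail loudly if that does not hold.
    for name, rows in per_statistic.items():
        if len(rows) != len(metadata_rows):
            raise ValueError(f"{name}.csv has {len(rows)} rows, expected {len(metadata_rows)}")
    return [
        {"metadata": meta, **{name: rows[i] for name, rows in per_statistic.items()}}
        for i, meta in enumerate(metadata_rows)
    ]
```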

4.3 Json_simple

This output format will create one JSON file per time series, statistic and quantile (if using quantile regression). Each JSON file will contain a dictionary with one key which holds all the metadata and a second key holding the calculated trend, uncertainty and p-value. The files are called "<time_series_id>_<statistic>_<quantile>.json".
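Several statistic names contain underscores themselves (for example dma8epax_strict or dark_aot40), so splitting these file names on every underscore would go wrong. A minimal sketch that splits off the time series id at the first underscore and the quantile at the last one; parse_trend_filename is a hypothetical helper, and it assumes that neither the time series id nor the quantile contains an underscore.

```python
from pathlib import PurePosixPath


def parse_trend_filename(filename):
    """Split "<time_series_id>_<statistic>_<quantile>.json" into its three parts (as strings)."""
    stem = PurePosixPath(filename).stem  # drops any directory part and the ".json" suffix
    head, quantile = stem.rsplit("_", 1)  # quantile after the last underscore
    time_series_id, statistic = head.split("_", 1)  # id before the first underscore
    return time_series_id, statistic, quantile
```

For a hypothetical file "123_dma8epax_strict_0.5.json", this returns ("123", "dma8epax_strict", "0.5").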

4.4 By_stat_quant

This output format will create one csv file per requested statistic and quantile and one additional csv file with all the metadata. Each row in all the files will contain the information (either metadata or trend values) for one time series. All files have the same number and order of rows so that you can match the metadata and different trends for each time series via the row position. The metadata file is called "metadata.csv" and the files for the trend values are called "<statistic>_<quantile>.csv".