How will CHARMe benefit users of atmospheric and climate data?
By Alison Pamment, STFC
At CEDA (Centre for Environmental Data Archival) we run four data centres: the British Atmospheric Data Centre (BADC), the NERC Earth Observation Data Centre (NEODC), the IPCC Data Distribution Centre (IPCC DDC) and the UK Solar System Data Centre (UKSSDC). Our data scientists work with data producers in atmospheric and climate science to ensure that all our data holdings are made available with metadata that are as accurate and complete as possible. We do this by adhering to accepted file formats and metadata conventions, such as CF-netCDF for large datasets (e.g. climate model output) and BADC-CSV for smaller amounts of data (e.g. single-site observations of a small number of atmospheric parameters). So what will CHARMe add to all this existing metadata?
The key point to bear in mind is that all the ‘discovery’ metadata within the CEDA dataset catalogue and all the ‘usage’ metadata stored inside the individual data files are written at the time the data themselves are produced and published. As such, they contain the information that the data producers have available at the time of writing, such as the instrument or model used to obtain the data, the measurement technique, and the geospatial and temporal extent of the data. Of course, these are all necessary and important pieces of information. However, it is almost impossible to provide, ahead of time, all the information that might be needed by every potential user of a given dataset, such as a statement of its suitability for a particular scientific purpose or details of how it compares to older datasets. This additional information may only become available long after the dataset was originally created, and providing it is where the great potential of the CHARMe system lies.
Let’s take an example: suppose a scientist wishes to compare some global atmospheric model data with satellite observations of Earth’s radiation budget. In the process, the scientist discovers a systematic bias in the satellite data in regions of deep tropical convection (something very similar to this happened during a research project I took part in a number of years ago). Clearly, other users of the same dataset would benefit from knowing about the bias in the satellite observations. The result may be published in a peer-reviewed journal, presented at a conference, or simply recorded in the scientist’s notes. CHARMe provides a way of attaching this information to a data centre’s catalogue entry for the original dataset, either by linking to an external source, such as the DOI or URL of a published paper, or by adding a simple text annotation. Annotations can be added quickly and easily using the CHARMe plugin, which is accessed by clicking the appropriate icon next to the dataset name.
Furthermore, CHARMe allows multiple annotations of a dataset and even allows the annotations themselves to be annotated, thus providing a mechanism for drawing together the collective experience of all the data users. Subsequent users can then benefit directly from the accrued knowledge: for example, they will be able to gauge the dataset’s suitability for their own work and interpret the data values correctly.
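
To make the idea of annotating annotations a little more concrete, here is a minimal Python sketch of how such a structure might be represented. It is purely illustrative and is not CHARMe's actual data model or API: the `Annotation` class, the `collect` helper, the identifier `dataset-A`, the user names and the placeholder DOI are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, List

# Hypothetical counter used only to give each annotation a unique id.
_ids = count(1)


@dataclass
class Annotation:
    """Illustrative annotation record (not the CHARMe data model)."""
    target: str   # id of the dataset, or of another annotation
    body: str     # free text, or the DOI/URL of an external source
    author: str
    id: str = field(default_factory=lambda: f"annotation-{next(_ids)}")
    replies: List["Annotation"] = field(default_factory=list)

    def annotate(self, body: str, author: str) -> "Annotation":
        """Attach a new annotation to this annotation."""
        reply = Annotation(target=self.id, body=body, author=author)
        self.replies.append(reply)
        return reply

    def walk(self) -> Iterator["Annotation"]:
        """Yield this annotation followed by all nested replies."""
        yield self
        for reply in self.replies:
            yield from reply.walk()


def collect(annotations: List[Annotation], dataset_id: str) -> List[Annotation]:
    """Gather every annotation, including nested ones, for one dataset."""
    found: List[Annotation] = []
    for annotation in annotations:
        if annotation.target == dataset_id:
            found.extend(annotation.walk())
    return found


# Invented example data: a text note, a link to a paper, and a follow-up.
bias = Annotation("dataset-A", "Systematic bias in regions of deep tropical convection.", "user1")
paper = Annotation("dataset-A", "doi:10.0000/placeholder", "user2")
follow_up = bias.annotate("Worth checking before comparing with model output.", "user3")

for note in collect([bias, paper], "dataset-A"):
    print(f"{note.id} -> {note.target}: {note.body} ({note.author})")
```

A later user looking up `dataset-A` would see both the original note about the bias and the follow-up comment attached to it, which is the kind of accrued knowledge described above.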