At CEDA (Centre for Environmental Data Archival) we run four data centres: the British Atmospheric Data Centre (BADC), the NERC Earth Observation Data Centre (NEODC), the IPCC Data Distribution Centre (IPCC DDC) and the UK Solar System Data Centre (UKSSDC). Our data scientists work with data producers in atmospheric and climate science to ensure that all our data holdings are made available with metadata that are as accurate and complete as possible. This is done by adhering to accepted file formats and metadata conventions, such as CF-netCDF for large datasets (e.g., climate model output) and BADC-CSV for smaller amounts of data (e.g., single-site observations of a small number of atmospheric parameters). So what will CHARMe add to all this existing metadata?
The key point to bear in mind is that all the ‘discovery’ metadata within the CEDA dataset catalogue and all the ‘usage’ metadata stored inside the individual data files are written at the time the data themselves are produced and published. As such, they contain the information that the data producers have available at the time of writing, such as the instrument or model used to obtain the data, the measurement technique, and the geospatial and temporal extent of the data. Of course, these are all necessary and important pieces of information. However, it is almost impossible to provide ahead of time all the information that might be needed by every potential user of a given dataset, for example a statement of its suitability for a particular scientific purpose or details of how it compares with older datasets. It is the provision of this additional information, which may become available long after the dataset was originally created, that shows the great potential of the CHARMe system.
Let’s take an example: suppose a scientist wishes to compare some global atmospheric model data with satellite observations of Earth’s radiation budget. In the process, it is discovered that a systematic bias exists in the satellite data in regions of deep tropical convection (something very similar to this happened during a research project I took part in a number of years ago). Clearly, other users of the same dataset would benefit from knowing about the bias in the satellite observations. The result may be published in a peer-reviewed journal, presented at a conference, or simply recorded in the scientist’s notes. CHARMe provides a way of attaching this information to a data centre’s catalogue entry for the original dataset, either as a link to an external source, such as the DOI or URL of a published paper, or as a simple text annotation. Annotations can be added quickly and easily using the CHARMe plugin, which is accessed by clicking the appropriate icon next to the dataset name.
Furthermore, CHARMe allows multiple annotations of a dataset and even allows the annotations themselves to be annotated, thus providing a mechanism for drawing together the collective experience of all the data users. Subsequent users can then benefit directly from this accrued knowledge; for example, they will be able to gauge the dataset’s suitability for their own work and correctly interpret the data values.
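To illustrate the idea of annotating annotations, here is a minimal sketch of such a structure as a simple tree. This is not CHARMe’s actual data model or API: the class names `Dataset` and `Annotation`, their fields, and the helper `count_annotations` are all hypothetical, chosen only to show how comments on a dataset and comments on those comments can hang together.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """Hypothetical comment attached to a dataset or to another annotation."""
    author: str
    body: str                   # free-text comment
    link: Optional[str] = None  # optional DOI or URL of an external source
    replies: List["Annotation"] = field(default_factory=list)

    def annotate(self, reply: "Annotation") -> "Annotation":
        """Attach an annotation to this annotation and return it."""
        self.replies.append(reply)
        return reply


@dataclass
class Dataset:
    """Hypothetical catalogue entry that can carry annotations."""
    name: str
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, note: Annotation) -> Annotation:
        """Attach a top-level annotation to the dataset and return it."""
        self.annotations.append(note)
        return note


def count_annotations(notes: List[Annotation]) -> int:
    """Count every annotation in the tree, including annotations on annotations."""
    return sum(1 + count_annotations(n.replies) for n in notes)


if __name__ == "__main__":
    ds = Dataset("satellite radiation budget observations")
    bias = ds.annotate(Annotation(
        "scientist A",
        "Systematic bias in regions of deep tropical convection"))
    bias.annotate(Annotation("scientist B", "We see the same bias in our comparison"))
    print(count_annotations(ds.annotations))
```

Running the example prints `2`: one annotation on the dataset and one annotation on that annotation. A tree is used here simply because it mirrors the nesting described above; a real system would also need identifiers, timestamps and persistent storage.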
By Ge Verver @KNMI
Past weather observations are at the core of climate research. A statistical analysis of recorded measurements provides evidence of past and current trends in temperature, precipitation, wind, etc. Detecting changing weather patterns and changes in the frequency of extremes requires careful analysis of long time series of observations. And if we want to say anything sensible about our future climate, the tools used for this, climate models, need to be ‘sharpened’ using observations from the past. It is essential to have a good understanding of the quality of the measurements used for these purposes. After all, a statement on climate change is only as good as the quality of the dataset on which it is based.
The European Climate Assessment and Dataset (www.ecad.eu) contains a large (and growing) number of observations from weather stations, covering over 60 years of historical measurements over Europe. Most of these data are directly available to non-commercial users. However, the real value of these data is unlocked in the time series of user-driven climate indices, and in the maps showing the daily values on a grid over Europe, including the uncertainties involved.
Almost 40,000 records are stored in the ECA&D database, together with a lot of meta-information describing, for instance, the stations at which the measurements were made, the definition of the indices, the gridding methods, references, data policies, etc. The CHARMe project adds commentary metadata to this information: valuable comments that guide users to select and use the data properly, by pointing out deficiencies and sharing best practices. This information can be provided by both users and providers of the data. A fully functional CHARMe system, integrated into ECA&D, would turn ECA&D into a kind of ‘marketplace’ for climatological data, in some respects similar to an internet shop with customer reviews and product rankings. The communities of ECA&D users and of those using the gridded dataset E-OBS are large enough to expect some interesting interaction between users and feedback on quality issues. For ECA&D, CHARMe may be the tool to ‘harvest’ user comments that pinpoint quality issues with ECA&D.
The idea of stronger feedback from users looked tempting on paper, but would it work in the operational environment of ECA&D? The real-life implementation serves as a reality check of the ‘paper’ ideas. Consequently, the functionality implemented during the two years of the project will be limited and will cover only part of the database. Perhaps the most important results are the lessons learned, which are needed for further development in a next phase.
Nevertheless, by the end of the project, users of the ECA&D website may find a ‘CHARMe button’ next to the link to the gridded data, providing access to additional metadata relevant to that particular dataset. Behind this small button, which at first may not look like much, lies the start of new functionality serving the rapidly growing ‘market’ for climate data.