How will CHARMe benefit users of atmospheric and climate data? by Alison Pamment, STFC

At CEDA (Centre for Environmental Data Archival) we run four data centres: the British Atmospheric Data Centre (BADC), the NERC Earth Observation Data Centre (NEODC), the IPCC Data Distribution Centre (IPCC DDC) and the UK Solar System Data Centre (UKSSDC). Our data scientists work with data producers in atmospheric and climate science to ensure that all our data holdings are made available with metadata that are as accurate and complete as possible. This is done by adhering to accepted file formats and metadata conventions such as CF-netCDF for large datasets, e.g. climate model output, and BADC-CSV for smaller amounts of data, e.g. single-site observations of a small number of atmospheric parameters. So what will CHARMe add to all this existing metadata?

The key point to bear in mind is that all the ‘discovery’ metadata within the CEDA dataset catalogue and all the ‘usage’ metadata stored inside the individual data files are written at the time the data themselves are produced and published. As such, they contain the information that the data producers have available at the time of writing, such as the instrument or model used to obtain the data, the measurement technique, and the geospatial and temporal extent of the data. Of course these are all necessary and important pieces of information. However, it is almost impossible to provide, ahead of time, all the information that might be needed by every potential user of a given dataset, for example a statement of its suitability for a particular scientific purpose or details of how it compares to older datasets. It is the provision of this additional information, which may become available long after the dataset was originally created, that shows the great potential of the CHARMe system.

Let’s take an example: suppose a scientist wishes to compare some global atmospheric model data with satellite observations of Earth’s radiation budget. In the process it is discovered that a systematic bias exists in the satellite data in regions of deep tropical convection (something very similar to this happened during a research project I took part in a number of years ago). Clearly, other users of the same data set would benefit from knowing about the bias in the satellite observations. The result may be published in a peer reviewed journal, presented at a conference, or may simply be recorded in the notes of the scientist. CHARMe provides a way of linking external sources, such as the DOI or URL of a published paper, or adding a simple text annotation to a data centre’s catalogue entry for the original data set. Annotations can be added quickly and easily using the CHARMe plugin which is accessed by clicking the appropriate icon next to the data set name.
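For illustration, in the Open Annotation model on which CHARMe is built, such a link between a paper and a dataset could be expressed roughly as follows in Turtle (all identifiers here are hypothetical; real URIs are assigned by the CHARMe node and the data centre catalogue):

```turtle
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix ex: <http://example.org/> .

# Hypothetical annotation: a published paper (the body) commenting
# on a dataset catalogue entry (the target).
ex:anno1 a oa:Annotation ;
    oa:hasBody <http://dx.doi.org/10.1234/hypothetical-paper> ;
    oa:hasTarget <http://example.org/catalogue/satellite-radiation-dataset> ;
    oa:motivatedBy oa:linking .
```

A plain free-text note instead of a DOI would simply use a textual body in place of the paper’s URI.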

Furthermore, CHARMe allows multiple annotations of a dataset and even allows the annotations themselves to be annotated, thus providing a mechanism for drawing together the collective experience of all the data users. This will then allow subsequent users to benefit directly from the accrued knowledge, for example, they will be able to gauge the dataset’s suitability for their own work and correctly interpret the data values.

The CHARMe project in the Hellenic Congress of Rural & Surveying Engineering

The 4th Hellenic Congress of Geoinformatics & Rural-Surveying Engineering took place in Thessaloniki from 26 to 28 September 2014. The conference covered the fields of R&D in Geoinformatics, Mapping & Urban Development, and Environment & Climate Change.

 


 

During the conference, Terra Spatium SA presented the CHARMe project to the participants, many of whom are potential users. The presentation focused on the importance of using proper climate data: being certain that the dataset used is the most suitable one for the specific task, knowing whether it has been validated, and whether it has been influenced by external factors such as a malfunction in the measuring instrument or an extreme weather phenomenon.
Notably, the conference included key representatives of the Ministry of the Environment, Energy and Climate Change, exactly the kind of end users the CHARMe concept addresses. Also present was the Greek authority responsible for civil protection, under the umbrella of the Region of Central Macedonia. This region of Greece faces daily potential threats from extreme weather phenomena (which may be linked to climate change effects), so their interest was raised by the nature of the CHARMe concept, which is strongly related to their areas of concern.
Furthermore, there were participants from Greek public authorities such as the National Cadastre & Mapping Agency S.A. (www.ktimatologio.gr) and the Greek Payment Authority of Common Agricultural Policy (C.A.P.) Aid Schemes (www.opekepe.gr). Matters of climate change data and metadata are of high value to them, especially in the frame of the INSPIRE directive, with which they aim to become fully compliant.
Positive feedback emerged during the follow-up discussions in the exhibition area where the CHARMe brochures were distributed, mostly from research institutes and companies wanting to learn more about the CHARMe tools. They found it really interesting that one can search for metadata via a site that has installed the CHARMe plug-in, using the interactive CHARMe button presented alongside each search result. A point-by-point presentation of the user guidelines is therefore scheduled after the project’s finalisation.
This positive response from stakeholders across different sectors at the national level shows the momentum of the CHARMe concept and the value of developing this project to help users find the right data more efficiently.

Integration of CHARMe with the Copernicus CQC Service

Introduction

Airbus Defence and Space operates the Copernicus Coordinated Quality Control (CQC) service (http://gmesdata.esa.int/web/gsc/CQC) on behalf of the European Space Agency (ESA). This is in support of the space component of the European Union (EU) Copernicus programme, which provides the Copernicus Service Projects (CSPs) with access to datasets from a range of contributing satellite Earth Observation missions.

The Copernicus CQC service is concerned with the monitoring and storage of all quality reports related to the missions, datasets and products involved in the Copernicus programme.

The provision of EO data to the Copernicus services by the Copernicus Space Component Data Access (CSCDA), now in its third phase of operations, is via a set of datasets defined in a Data Access Portfolio (DAP). The assessment of the quality of the DAP datasets and of the contributing missions is the responsibility of the CQC service.

CQC Service Tasks

While the data quality of each delivered dataset remains the responsibility of the contributing mission, the CQC role is to perform further independent quality analysis, respond to and coordinate anomaly investigations and provide harmonisation and traceability of the quality information.

The service provided by CQC is divided into 10 separate tasks. Those related to the core quality control work are highlighted in Figure 1 in terms of two viewpoints:

Figure 1 - CQC Service Tasks


The analysis of representative data products (Task 3), handling of user feedback (Task 5) and generation of quality control synthesis reports when a dataset is closed (Task 6) are all related to Copernicus Datasets. However, the support of Copernicus Contributing Mission (CCM) integration (Task 9), CCM harmonisation (Task 7), sample data analysis (Task 4) and user feedback (Task 5) all provide information relevant to the CCM perspective.

A series of Python tools has been developed to support the automation of the image quality checks; these produce structured information that is stored in, and viewed through, an Oracle database.

However, the image quality checking process also yields unstructured, ad-hoc information, which might reveal patterns and trends that need to be addressed. This CQC commentary metadata can now be processed using the Open Annotation linked data model developed in the FP7 CHARMe project. To this end, a new dedicated CHARMe server has been set up and configured for the CQC service.

CHARMe Adaptations for CQC

For deployment in the CQC, the standard Web page plug-in has been replaced with a custom client, written in Python. The basic Open Annotation model has also been extended to support multiple targets (see Figure 2). Specifically, each commentary body is linked to two targets:

  • the associated Data Access Portfolio dataset – to support the generation of synthesis reports when a Dataset is closed;
  • the source Copernicus Contributing Mission – to facilitate the collation of quality issues and detection of patterns.
Figure 2 - Multiple Target Annotation Model for CQC Commentary

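Expressed in Turtle (with hypothetical URIs; the real targets are the DAP dataset and CCM pages served by the CQC Web server), such a multiple-target annotation looks roughly like:

```turtle
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix ex: <http://example.org/cqc/> .

# One commentary body (an image analysis report) attached to two targets.
ex:anno42 a oa:Annotation ;
    oa:hasBody <http://example.org/cqc/reports/quality-issue-17.html> ;
    oa:hasTarget <http://example.org/cqc/datasets/dap-dataset-3> ,
                 <http://example.org/cqc/missions/mission-x> .
```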

For the CQC, two virtual machines have been configured: one hosts a deployment of the CQC CHARMe node and the other hosts a Web server. Figure 3 illustrates the organisation of the Web server which provides the URLs for both the annotation targets and bodies:

Figure 3 - CQC CHARMe Web Server


The index for the Web pages links to two separate pages: one hosts external links to Copernicus Contributing Mission websites, the other links to internally held Copernicus Dataset definition pages. Both of these pages also embed the CHARMe JavaScript plugin. The Web server additionally serves the CHARMe plugin itself and the CQC image analysis reports containing the ad hoc commentary metadata about detected anomalies. These reports, along with URL links to background documents and user-complaint investigations, provide the bodies of the CQC commentary annotations.

At present the CQC CHARMe node operates on an internal network within Airbus DS. However, as commentary metadata accumulates, the information resource should become increasingly useful to both data providers and the contributing missions themselves. A future evolution might therefore be to allow external access to the CQC CHARMe node.

Climate Symposium in Darmstadt

The CHARMe project was presented at the Climate Symposium in Darmstadt from 13 to 17 October 2014.

The CHARMe project stand at the Climate Symposium was neighboured by those of the EUMETSAT Climate Monitoring Satellite Application Facility (CM SAF) and ESA. The location was well placed to attract symposium participants during the coffee and lunch breaks throughout the week. The CHARMe project team was represented by Rhona Phipps (UoR), Alan O’Neill (UoR), Phil Harwood (CGI) and Frank Kratzenstein (DWD). With the demo version of the CHARMe plugin at hand, many detailed and interesting discussions were initiated on the use of commentary metadata and the data models used in climate physics applications. Among others, questions like “What requirements, from a data provider’s point of view, do we have to meet in order to use the CHARMe service?”, “How can I run my own CHARMe node?” and “How will the development of CHARMe be maintained after the project’s end?” came up during those lively conversations.

To answer the last question first: the British Science and Technology Facilities Council (STFC) has committed itself to administering and maintaining the public CHARMe node. In addition, the CHARMe node source code is available under an open source licence, as are the sources of the JavaScript-based CHARMe plugin and the Google Web Toolkit-based CHARMe Maps tool. Besides STFC, at least one other CHARMe consortium partner is going to run its own “private” CHARMe node, and the project will therefore deliver an Installation Document and an Interface Control Document for the node. And to answer the first question: according to the project guidelines there will be a publicly available deliverable document entitled “Implementation in archives”. This document will describe in detail the steps that the data providers in the CHARMe consortium, such as KNMI, DWD and Airbus, have undertaken to “CHARMe-enable” their data archives. In addition to this best-practice document there is an installation guide for the CHARMe plugin. One of the major “soft” targets of the project has been to make the adoption of CHARMe as easy as possible for data providers. Nevertheless, you have to run through the following major steps:

– registering at a CHARMe node and requesting a CHARMe client ID

– downloading, extracting and configuring the JavaScript CHARMe plugin

– editing/adjusting the web page(s) to be CHARMe-enabled

– annotating your datasets, or having others annotate them
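As a rough sketch of the third step (hypothetical markup only: the actual script name, element conventions and initialisation parameters are those defined in the plugin installation guide), embedding the plugin in a catalogue page amounts to something like:

```html
<!-- Hypothetical example; consult the CHARMe plugin installation guide
     for the real script name and markup conventions. -->
<script src="/charme-plugin/charme.js"></script>

<!-- A catalogue entry: the plugin renders its interactive CHARMe
     icon next to elements that identify a dataset URI. -->
<div class="charme-dataset" data-uri="http://example.org/dataset/xyz">
  Example dataset
</div>
```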


One aspect was clear: scientists and data providers are looking forward to using the CHARMe tools and are keen to see their launch.

CHARMe from a Hydrometeorologist’s POV


CHARMe (CHARacterisation of Metadata) is an online system for collecting and sharing metadata and user feedback on climate and earth science datasets. It will help users of climate data judge how suitable the data are for their intended application and enable them to add commentary on specific benefits, features and limitations discovered when using the data.

The CHARMe project is now close to going live (www.charme.org.uk), with a beta version planned for December. Data providers signed up so far include ECMWF, CGI, BADC and KNMI. In addition to metadata provided at source, CHARMe will enable users to share rich commentary data and user experience through an annotation user interface. Users’ annotations will be stored on an independent CHARMe server and updated centrally, so that users always see the most up-to-date information.

CHARMe will benefit working scientists and researchers. As a hydrometeorologist, I’m interested in precipitation data from models, and observations including rain gauges, weather radar, and satellites. Modelled rainfall has known data quality issues, especially the underestimation of high-intensity events and extremes. CHARMe will enable users to highlight spatial and temporal anomalies, missing data, or other unusual trends, biases, or discontinuities. It would be valuable if such issues were highlighted in CHARMe by previous researchers, saving the time and resources spent reassessing the data.

Remotely sensed satellite data are a valuable source of data for earth scientists working in data-scarce regions. These data often require a degree of specialist knowledge to interpret, including knowledge of the observation systems, data and processing. Datasets may include anomalous data, for example TRMM satellite precipitation maxima, and CHARMe will allow specialists and users to add commentary on issues found when applying the data, saving time for users and avoiding duplication of errors. It could also help data providers prioritise development effort.

Similarly, rainfall observations from weather radar are routinely reprocessed to account for ground clutter, bright band and other phenomena that affect data quality, and rain-gauge and other synoptic data may suffer from instrument error, missing data, or communications issues. CHARMe is a useful tool for data providers to flag changes in processing algorithms, significant data outages and the like.

Good quality commentary will document actual user experience and provide an empirical indication of data quality to counter anecdotal information and hearsay of uncertain quality. Importantly, a qualitative indicator of data quality and model uncertainty will complement quantitative estimates of uncertainty provided with the model data.

In summary, CHARMe will benefit a range of existing and future data users in climate and environmental science by adding confidence in the source, provenance, and quality of model and observation datasets. It will provide metadata and commentary that add value to research programmes and encourage intelligent applications. A CHARMe user forum and blog at charme.org.uk will enable users to share results and experience with a wider audience.

CHARMe is being publicised at the 10th European Conference on Applied Climatology (ECAC), Prague, 6-10 October; the Climate Symposium in Darmstadt, 13-17 October; the CCI Co-location meeting at ESRIN, Italy, 20-24 October; the ESA Big Data from Space conference, 12-14 November; and the AGU, San Francisco, 15-19 December.

CHARMe and the WMO Information System

The CHARMe project will provide ECA&D at KNMI with the ability to handle commentary data. This is good for ECA&D climate data, but ECA&D is only one regional climate centre among many worldwide. How can CHARMe be made available to all these centres, and hence achieve more impact? Can the WMO Information System be used for this purpose? Let’s have a closer look.

In 2009, the World Climate Conference-3 decided to establish the Global Framework for Climate Services (GFCS) “to strengthen the production, availability, delivery and application of science-based climate prediction and services”. The GFCS is a major initiative for WMO, to which the Copernicus programme will be a fundamental European contribution.


The Climate Services Information System (CSIS) is the ‘operational centre’ of the GFCS. The regional centres, including ECA&D, will form the backbone of the CSIS. It is the GFCS that promotes effective CSIS-wide use of the WMO Information System (WIS).

What is the WMO Information System (WIS)?  WIS is defined as the global infrastructure for telecommunications and data management functions, suitable for all WMO Programmes, for routine collection and automated dissemination of observed data and products, as well as data discovery, access and retrieval services for all data produced by centres and members within the framework of any WMO Programme.  An important aspect of WIS is that it is open to “other users” (public, institutes) outside the WMO community.


WIS distinguishes three types of centres forming the core infrastructure of WIS: Global Information System Centres (GISCs), Data Collection or Production Centres (DCPCs), and National Centres (NCs). GISCs offer connectivity within their area of responsibility using dedicated and public networks, and together form the “Core Network”. The DCPCs and NCs are the data centres with either national focus (NC) or  regional/global focus (DCPC).  A WMO Regional Climate Centre like ECA&D is typically a DCPC. Other examples of DCPCs are ECMWF and EUMETSAT. 

One required function for a DCPC is the ability “to describe products according to an agreed WMO standard and provide access to this catalogue of products and provide this information as appropriate to other centres, in particular a GISC”. This concerns the creation and discovery of metadata using the WMO Core Metadata Profile of ISO 19115 and a catalogue server. What if this (discovery) metadata could be extended with commentary metadata using CHARMe?
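For illustration, a minimal fragment of such a WMO Core Profile record (in the ISO 19139 XML encoding, with a hypothetical identifier) looks roughly like this; a CHARMe annotation could then target the URL at which the catalogue record is published:

```xml
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <!-- WIS discovery metadata records are identified by a WMO URN -->
  <gmd:fileIdentifier>
    <gco:CharacterString>urn:x-wmo:md:int.wmo.wis::hypothetical-dataset</gco:CharacterString>
  </gmd:fileIdentifier>
  <!-- ... title, abstract, keywords, distribution information ... -->
</gmd:MD_Metadata>
```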

There are basically two WIS-compliant solutions offering DCPC functionality: OpenWIS® and Discover Weather. KNMI has selected OpenWIS® because it is open source and the most popular solution. If the CHARMe plug-in could become part of the OpenWIS® package, substantial impact could be achieved. This requires support for CHARMe from the OpenWIS® Steering Committee (UK Met Office, Météo-France, BOM Australia, etc.). Perhaps a feasible opportunity to integrate CHARMe with WIS…

Nevertheless, before integrating CHARMe with any WIS-compliant solution, practical experience with systems like ECA&D is needed first.

CHARMe as bibliographic database, part 2: SKOS

In one of our last posts I introduced the idea of using CHARMe as a bibliographic database. In this post I want to dive deeper into this topic, focusing on how to capture various specific types of documents.
First, let’s have a look at how our local bibliographic database is currently organised and what queries we want to run against it. It turns out that our bibliographic database is a simple Excel sheet with some informally defined constraints restricting the allowed values of certain columns. The most interesting constraints for this post are defined on the columns DocType and Type; the column ‘Type’ specifies the kind of DocType.document in more detail.


A common query for the bibliographic database looks like:
‘Please give me all items related to project documentation of a specific project/dataset.’ or ‘Please give me all Algorithm Theoretical Basis Documents of a specific project/dataset.’
In addition to these common queries, a new user requirement is to add a further level of categorisation to the ‘Types’, in order to enable queries like: ‘Please give me all technical items related to project documentation of a specific project/dataset.’
That leads to the questions: what are technical items, and what other kinds of items are to be captured? Thinking this through, we conclude that we need a hierarchical ordering of these items. Additionally, it would be valuable to make these classifications and definitions publicly available, at least to our department as a kind of department vocabulary, rather than hiding them inside an application.


After capturing these fundamental requirements, let’s have a look at how CHARMe may help us on our way.
The CHARMe project has adopted the FRBR-aligned Bibliographic Ontology, fabio, a publicly available and widely used ontology. We should therefore look for fabio classes that we can map to our DocType and Type entries. And indeed, except for DocType.document, we find the following mappings:

docType:article maps to fabio:Article

docType:presentation maps to fabio:Presentation

docType:poster maps to fabio:ConferencePoster

The nearest match to docType:document might be fabio:Report.

As we can see, fabio:Report already has some subclasses, but a closer look at fabio:TechnicalReport shows that it is a leaf node in the fabio hierarchy. So we need a way to extend the fabio ontology in order to cover our technical items, like the ATBD.
While finding an existing, publicly available and widespread ontology that defines the term ATBD for us would be the preferred option, in this post we start to create our own definition/vocabulary and extend fabio to our needs by using SKOS.

SKOS, which stands for Simple Knowledge Organization System, is a W3C standard, based on other Semantic Web standards (RDF and OWL), that provides a way to represent controlled vocabularies, taxonomies and thesauri.

“The fundamental element of the SKOS vocabulary is the concept. Concepts are the units of thought —ideas, meanings, or (categories of) objects and events—which underlie many knowledge organization systems. As such, concepts exist in the mind as abstract entities which are independent of the terms used to label them.” (SKOS Primer)

So we start by expressing (using Turtle) our ATBD as a SKOS concept, adding some labelling information and a definition:

cmsaf_vocab:ATBD     rdf:type     skos:Concept ;
    skos:prefLabel     "ATBD" ;
    skos:altLabel     "Algorithm Theoretical Basis Document" ;
    skos:definition     "The Algorithm Theoretical Basis Documents (ATBD) are intended to provide the physical and mathematical description of the algorithms to be used in the generation of data products. The ATBD include a description of variance and uncertainty estimates and considerations of calibration and validation, exception control, and diagnostics. In some cases, internal and external data product flows are required." .

And to build the bridge to the fabio ontology we just need to add a semantic relation between our ATBD concept and a fabio class. Since our ATBD concept is the narrower one, it points to fabio:TechnicalReport via skos:broader:

cmsaf_vocab:ATBD     skos:broader     fabio:TechnicalReport .

That means that by using SKOS we can build up our own classification scheme in a common data model for knowledge organisation systems, with the benefit of being able to easily incorporate or extend an existing KOS.


But by migrating from Excel to CHARMe we also switch from relational data to graph data, and from SQL to SPARQL. That means the question ‘Please give me all technical items related to project documentation of a specific project.’ becomes, in pseudo SPARQL:
SELECT *
WHERE {
    ?technical_item oa:hasTarget <uri of our project> .
    ?technical_item rdf:type ?technical_item_type .
    ?technical_item_type skos:broader+ fabio:TechnicalReport .
}

So there are some more steps ahead of us before we can migrate our Excel bibliographic database to CHARMe, but by using fabio and SKOS one of the big chunks can be tackled.