CHARMe: Could it make our job easier?

At CEDA, the Centre for Environmental Data Archival, we operate data archives and access services for the environmental research community, particularly for atmospheric and Earth Observation (EO) science. Part of this work includes the helpdesk, where our team handle a wide range of queries from our users; in the last quarter, we received almost 400 queries.

The questions we’re asked range from technical matters such as broken links and lost passwords to in-depth and complex queries about the nature of the datasets we hold. Given that we hold over 300 different datasets, including observations from space, aircraft and in situ, as well as models and analyses, in a variety of formats and structures, it’s not surprising that some of our users are not sure where to start.

I wondered whether the CHARMe system could be useful for our helpdesk staff, as well as our users and data providers, so I have just had a chat with Alison Waterfall, our EO specialist on the helpdesk. Alison estimates we get a couple of queries per week asking which data to use for a particular application, and a similar number asking about issues users have found in the data they’re using. Most of these questions relate to our meteorology and climate in situ observation datasets and it seems that CHARMe might be able to provide answers, if the data producers and helpdesk staff themselves upload relevant comments to the system. In our data catalogue we are lacking any real information about the datasets’ quality, so CHARMe will also help us fill that gap.

A couple more real-life examples:

  • A data provider reported an important caveat about the data in a certain period, after it was published from our archives. The warning was added to our catalogue records, so it’s there now for new users to see, but how many users check the catalogue in detail before downloading the data? Probably even fewer check back after they’ve downloaded what they need. If CHARMe allows users to subscribe to relevant updates, it would a) allow the data producers themselves to put in the health warnings where necessary, which saves work for our staff, and b) allow users to be notified that new information is available. [I don’t know if the subscription capability is planned in the first two years. If not, perhaps in future!]
  • Helpful Users (HU) sometimes write in to the helpdesk offering us the code they have written to read particular formats. Puzzled Users (PU) write in to the helpdesk asking how to read particular formats. If HU are able to put CHARMe comments into the system with tips and links to their handy code, PU will readily find the tools they need to get started with their netCDF or ASCII data. Similarly, the helpdesk team themselves could annotate the frequently queried datasets with links to the tools we already make available via the data centres’ websites.

So, in summary, my rather incomplete survey of just one member of the helpdesk team indicates that yes, CHARMe will be a pretty useful addition to our current systems and would give us another way to communicate with, and assist, our users.

Getting specific: “fine-grained commentary” in the CHARMe project

In the CHARMe project, we are aiming to improve the amount and quality of information that can be discovered about climate data, to help users decide whether a dataset will meet their needs. There has been a great deal of work done on helping data providers to describe their datasets better, but CHARMe is focusing on a different dimension to the problem – how can users find out knowledge and opinions from other users? We call this user-supplied information “commentary metadata”, or simply “commentary”, and it may be scattered across multiple locations, including databases of publications, technical reports, websites and blogs.

In order to link this information, we are harnessing the power of the Semantic Web and Linked Data, which enables us to publish commentary metadata widely in a way that can be interpreted both by humans and by automated software. Fortunately, there is a very useful overarching specification that we can adopt to structure and publish our data – the Open Annotation Data Model from the World Wide Web Consortium. This enables us to model items of commentary as “annotations”, which simply attach new information (the piece of commentary, or body) to existing resources (targets), such as climate datasets. We can annotate anything that has a unique identifier (for example, a Digital Object Identifier (DOI) or persistent URL):

Basic Open Annotation data model
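
As a rough illustration of the body/target structure described above, an annotation can be written down as a small JSON-LD-style record. This is only a sketch: `oa:Annotation`, `oa:hasBody` and `oa:hasTarget` are terms from the Open Annotation vocabulary, but the helper name `make_annotation`, the dataset DOI and the body URL are invented for the example and do not represent the CHARMe system's actual interface.

```python
import json

OA_NS = "http://www.w3.org/ns/oa#"  # Open Annotation vocabulary namespace


def make_annotation(body_uri, target_uri):
    """Return a minimal annotation linking one body to one target."""
    return {
        "@context": {"oa": OA_NS},
        "@type": "oa:Annotation",
        "oa:hasBody": {"@id": body_uri},      # the piece of commentary
        "oa:hasTarget": {"@id": target_uri},  # the thing being commented on
    }


# Hypothetical identifiers, for illustration only.
annotation = make_annotation(
    body_uri="http://example.org/reports/validation-report-1",
    target_uri="http://dx.doi.org/10.9999/example-dataset",
)
print(json.dumps(annotation, indent=2))
```

Because both ends of the link are plain identifiers, the same structure works whether the body is a journal paper, a technical report or a blog post.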

However, what if we want to say something about a specific part of a dataset? For example, we might want to highlight an interesting feature, such as a dust storm or volcanic ash cloud, or flag up a potential problem with a processing algorithm or sensor, which may affect all data in a certain geographic region. In the CHARMe project we call this fine-grained commentary.

These subsets of datasets will not, in general, have their own unique and persistent identifiers that are known in advance – there are potentially infinitely many of them! Once again, however, Open Annotation can help us out. The OA model defines the concept of a Specific Resource, which is defined as a subset of an existing resource. The definition of the subset itself is given by a Selector. The figure below (taken from the OA documentation) illustrates this:

Use of Open Annotation Specific Resources and Selectors to annotate subsets of a dataset

Open Annotation defines Selectors for common use cases, such as defining a section of text in a book, or a region of an image. Our job in CHARMe is to define a special kind of Selector that can unambiguously define a specific region of a climate dataset. Broadly speaking, this will include:

  1. the variable(s) that are selected in the subset (e.g. the commentary may only concern the temperature variable in a much larger multivariate dataset);
  2. the region in time and space to which the commentary pertains. This may be a four-dimensional region, including constraints in the horizontal dimensions, the vertical dimension and time.
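
To make the two ingredients above more concrete, here is a minimal sketch of what such a selector might hold and how it could be wrapped as an Open Annotation Specific Resource. The class `SubsetSelector`, its field names and the `ex:SubsetSelector` type are hypothetical and are not the selector CHARMe will eventually define; only `oa:SpecificResource`, `oa:hasSource` and `oa:hasSelector` come from the OA vocabulary.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple


@dataclass(frozen=True)
class SubsetSelector:
    """Hypothetical selector: variables plus a box in space and time."""
    variables: Tuple[str, ...]
    lon_range: Tuple[float, float]   # degrees east, (min, max)
    lat_range: Tuple[float, float]   # degrees north, (min, max)
    time_range: Tuple[str, str]      # ISO 8601 dates, (start, end)
    vertical_range: Optional[Tuple[float, float]] = None  # None = all levels

    def covers(self, variable, lon, lat, time):
        """True if a data point lies inside the selected subset.

        Times are compared as strings, which is only valid when they all
        use the same ISO 8601 layout (e.g. YYYY-MM-DD).
        """
        return (
            variable in self.variables
            and self.lon_range[0] <= lon <= self.lon_range[1]
            and self.lat_range[0] <= lat <= self.lat_range[1]
            and self.time_range[0] <= time <= self.time_range[1]
        )


def as_specific_resource(dataset_uri, selector):
    """Wrap a dataset and a selector as an OA Specific Resource."""
    selector_fields = asdict(selector)
    selector_fields["@type"] = "ex:SubsetSelector"  # hypothetical type name
    return {
        "@type": "oa:SpecificResource",
        "oa:hasSource": {"@id": dataset_uri},
        "oa:hasSelector": selector_fields,
    }


# Temperature only, over a made-up box and period.
selector = SubsetSelector(
    variables=("air_temperature",),
    lon_range=(-70.0, 20.0),
    lat_range=(-60.0, 0.0),
    time_range=("1991-06-01", "1991-12-31"),
)
target = as_specific_resource("http://example.org/datasets/demo", selector)
print(selector.covers("air_temperature", -30.0, -40.0, "1991-07-15"))
```

The resulting `target` could then take the place of a plain dataset identifier as the target of an annotation.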

Sounds simple? Unfortunately there are many potential snags, which we are gradually working through, including:

  1. Regions in space can be identified by geographic coordinates (e.g. as polygons) but this may not be very convenient for users, who may prefer to use names (e.g. “South Atlantic”, “Europe”). The same applies for the vertical and time dimensions, in which we could employ coordinates or named regions (“stratosphere”, “boundary layer”, etc.)
  2. Having stored this geographic information, how can it be searched efficiently? The integration of GIS techniques and Linked Data is a hot topic of recent discussion and we are participating actively, whilst experimenting with technologies such as Jena Spatial and Strabon.
  3. Climate data has many peculiarities; for example, climate models often employ idiosyncratic calendar systems, meaning that we can’t automatically adopt existing standards for recording information such as time.
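
The calendar snag can be shown with a small sketch. It assumes, purely for illustration, a 360-day calendar in which every month has 30 days (the post does not name a specific calendar); the function names are made up. A date such as 30 February is perfectly valid in that calendar, yet Python's standard `datetime.date`, which follows the Gregorian calendar, rejects it.

```python
from datetime import date


def is_valid_360_day(year, month, day):
    """In a 360-day calendar every month has exactly 30 days."""
    return 1 <= month <= 12 and 1 <= day <= 30


def is_valid_gregorian(year, month, day):
    """Ask the standard library, which only knows the Gregorian calendar."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False


def day_of_year_360(month, day):
    """Day number within the year for a 360-day calendar (1 to 360)."""
    return (month - 1) * 30 + day


# 30 February: fine for the model calendar, impossible in Gregorian.
print(is_valid_360_day(2000, 2, 30), is_valid_gregorian(2000, 2, 30))
# 31 January: the other way round.
print(is_valid_360_day(2000, 1, 31), is_valid_gregorian(2000, 1, 31))
```

Any scheme for recording the time extent of a commentary therefore needs to carry the calendar alongside the date values, rather than assume that a standard date type can represent them.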

Here is a mock-up of our “CHARMe Maps” demonstration tool, which will show the concept of “fine-grained commentary” in action:

Mock-up of the CHARMe Maps demonstrator tool

We will be developing this tool over the coming months (building on previous work in projects such as BlogMyData) and working on the above issues. Watch this space for our progress and please contact us if you’d like to discuss further.

Implementation of CHARMe in the European Climate Assessment and Dataset ECA&D

By Ge Verver @KNMI

Past weather observations are at the core of climate research. A statistical analysis of recorded measurements provides evidence of past and current trends in temperature, precipitation, wind, etc. Changing weather patterns and the frequency of occurrence of extremes require careful analyses of long time series of observations. And if we want to say sensible things about our future climate, the tools used for this, climate models, need to be ‘sharpened’ using observations from the past. It is essential to have a good understanding of the quality of the measurements used for these purposes. After all, a statement on climate change is only as good as the quality of the dataset on which it is based.

The European Climate Assessment and Dataset (www.ecad.eu) contains a large (and growing) number of observations from weather stations, covering over 60 years of historical measurements over Europe. Most of these data are directly available to non-commercial users. However, the real value of these data is unlocked in the time series of user-driven climate indices, and in the maps showing the daily values on a grid over Europe, including the uncertainties involved.

Almost 40 thousand records are stored in the ECA&D database, together with a lot of meta-information describing, for instance, the stations at which the measurements were made, the definition of the indices, the gridding methods, references, data policies, etc. The CHARMe project adds commentary metadata to this information: valuable comments that guide users to properly select and use the data, by pointing at deficiencies and sharing best practices. This information can be provided by both users and providers of the data. A fully functional CHARMe system, integrated in ECA&D, would turn ECA&D into a kind of ‘marketplace’ for climatological data, in some aspects similar to an internet shop with customer reviews and product rankings. The community of ECA&D users, and of those using the gridded dataset E-OBS, is large enough to expect some interesting interaction between users and feedback on quality issues. For ECA&D, CHARMe may be the tool to ‘harvest’ user comments pinpointing any quality issues with ECA&D.

The idea of stronger feedback from users looked tempting on paper, but the question is whether it would work in the operational environment of ECA&D. The real-life implementation serves as a reality check of the ‘paper’ ideas. As a consequence, the functionality implemented during the two years of the project will be limited and will cover only part of the database. Perhaps the most important results are the lessons learned, which are necessary for further development in a next phase.

Nevertheless, by the end of the project users of the ECA&D website may find a ‘CHARMe-button’ next to the link to the gridded data, providing access to additional metadata relevant for that particular dataset. Behind this small button, which at first may not look like much, is the start of a new functionality serving the rapidly growing ‘market’ of climate data.

 

CHARMe as a bibliographic database

In the last post we read about CHARMe as an online collaboration tool. Today I want to show you another application area of CHARMe. In the context of the CM SAF at DWD, the German weather service, we are thinking of transferring our internal dataset-related bibliographic database to CHARMe. In doing so, we hope to achieve a tighter linking of the gathered primary and secondary literature of a referenced dataset to its primary entity and to place that information at a more prominent location than the local intranet. Once this bibliographic database has been incorporated into CHARMe, the CM SAF users should have access to this information directly next to the dataset itself. But what migration steps do we have to take in order to achieve this aim? I won’t discuss those steps in detail but want to focus on one major task: the classification of information.

As you already know, CHARMe follows the principles of the Linked Data approach. One building block of this approach is applying a vocabulary, or more precisely an ontology, to the things we want to talk about. By choosing a common vocabulary or a well-known and well-defined ontology, we can express our available bibliographic data in a Semantic Web fashion, and in the same step they become more or less CHARMe-enabled. And, surprise, one of the major ontologies driving CHARMe is FaBiO, the FRBR (Functional Requirements for Bibliographic Records)-aligned Bibliographic Ontology. FaBiO gives us well-explained, hierarchically ordered terms like technical report, conference paper, web content or even blog and blog post, among many others. By exploiting FaBiO we can classify our already gathered bibliographic information and make it understandable for humans and for machines like CHARMe. Of course there are some more steps to take to CHARMe-enable our internal bibliographic database, but those might be covered in another post.
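
The classification step might be sketched as follows. This is an assumption-laden illustration rather than our actual migration code: the internal category labels, the record URI and the function name `classify` are invented, and the FaBiO class names are written as they are commonly spelled in the ontology but should be checked against the published version before use.

```python
FABIO = "http://purl.org/spar/fabio/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Internal category label -> FaBiO class name.
# The labels on the left are hypothetical; the class names on the right
# should be verified against the published FaBiO ontology.
CATEGORY_TO_FABIO = {
    "tech-report": "TechnicalReport",
    "conference": "ConferencePaper",
    "web": "WebContent",
    "blog": "Blog",
    "blog-post": "BlogPost",
}


def classify(record_uri, category):
    """Return one N-Triples line typing the record with a FaBiO class."""
    local_name = CATEGORY_TO_FABIO[category]  # KeyError for unknown labels
    return f"<{record_uri}> <{RDF_TYPE}> <{FABIO}{local_name}> ."


# A made-up record from the internal database.
print(classify("http://example.org/bib/42", "tech-report"))
```

Each record then carries a type statement that both people and software can interpret without knowing anything about the original database layout.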

The message-triples of this post are:

CHARMe might be used as an online collaboration tool.

CHARMe might be used as a bibliographic database.

CHARMe uses FaBiO.

With these three pieces of information, especially the last one, a non-CHARMe machine or application gets some very useful hints about what and how to query and/or link to our CHARMe-enabled bibliographic database.

Welcome to the world of the semantic web and linked data.

CHARMe – beyond climate data

The strap-line for the CHARMe project is “sharing knowledge about climate data”; however, the linked data model developed in the project is generic and has potential uses well beyond the climate domain. I want to look at just such a situation from my perspective working for Airbus Defence and Space on the application of Earth Observation (EO) data for environmental monitoring.

To an extent, environmental monitoring is inextricably bound to the climate sciences: it encompasses climate observations themselves, and many other aspects of the environment are also affected by the climate and how it is changing. So we have reached one important conclusion already – the issues we deal with in the environmental and natural sciences cross disciplines and need to be approached in ways that link disparate data, lowering the barriers to information discovery and transfer between different communities and means of communication.

But beyond this general point, let’s look at CHARMe’s utility in the context of the launch of a new EO satellite. With the advent of any new EO mission there is an intense period of in-orbit system checks, analysis, calibration, error characterisation and algorithm development. This activity cascades down from well defined systems checklists by engineers to broader disciplines of EO specialists characterising sensor performance, testing and fine tuning the basic radiance products, developing, calibrating and validating atmospheric correction algorithms and geophysical parameter retrievals. Once the higher level products are available, an even broader range of scientists and application developers come into play to utilise the observations and derived geophysical parameters for specialist research and operational services.

As a concrete example, consider the Sentinel-2 mission. A core part of the space component of the European Union’s Copernicus programme for global environmental monitoring, the initial two Sentinel-2 satellites will boast super-spectral instruments capable of observing the Earth’s surface at resolutions of 10 to 20 m, every 5 days. Even as the first satellite was undergoing systems integration at Airbus’s Friedrichshafen facility, EO specialists and domain experts gathered at a Sentinel-2 workshop last month to discuss instrument performance, cross-calibration with other missions, data product development and a range of applications from land cover, phenology, agriculture and forestry to inland water quality, coastal processes and glaciology.

The intense preparations, both pre- and post-launch, are generating a wide range of analysis and development, much of it third party, carried out across the user community, and with it commentary on the mission and its derived datasets. This commentary has a high currency and changes rapidly, whilst the traditional peer-reviewed literature can be months out of date, scattered and sometimes difficult to find. And although much information is, in principle, accessible on the Web, it is distributed across technical documents, blogs, conference proceedings, on-line results and data. This reduces both the efficiency of knowledge transfer and the ability of different users (EO specialists, domain researchers and application developers) to judge if data are ‘fit-for-purpose’.


Simple illustration of some of the interrelationships linking a satellite mission to the on-board instrument, data, technical articles and researchers / developers


This is where I see the CHARMe linked-data model and system having a significant role to play. User communities need a system to facilitate search through, and add to, the commentary metadata building up around missions such as Sentinel-2. Moving forward, the CHARMe project needs to engage with mission planners such as the European Commission, ESA, NASA and NOAA, to deploy CHARMe-like systems for upcoming satellite missions to:

  • provide linkage across communities and information outlets;
  • facilitate the discovery of up-to-date commentary information;
  • enable the submission and distribution of new commentary information in a timely manner.

Thomas Lankester, Airbus Defence and Space.

EGU meeting in Vienna

The CHARMe project was presented at the EGU meeting in Vienna on 28 April – 2 May 2014.

Raquel Alegre delivered a talk on “Contextualizing the Visualization of Climate Data: The CHARMe Project” where the tools developed by the CHARMe consortium were described, and Iryna Rozum presented a poster entitled “Studying variability in climate data: a significant event viewer tool”. Both presentations received lots of questions and initiated discussions on the use of metadata and data models used in climate physics applications, and on the features these tools will provide.

Questions in Raquel’s talk included “What are significant events? Do you use the Virtual Observatory (VO) event standard?” The answer was “Significant events are not annotations. Depending on the angle we look at them from, they can be either a target or a body within the Open Annotation data model. Currently we do not use a VO event or other standards for significant events, but we will look into the VO standards and, if this concept is applicable, we will use it for our events.”

Questions during Iryna’s poster presentation included “When can we try your significant event viewer tool?”, to which the reply was “We are hoping to put a first working version online this summer, hopefully in June”. Another question was “Can you add more filtering to significant events?” The answer was “We can look into this, but it may be impractical as different categories of events will require different filters. We try to keep the selector form simple and user friendly, and adding different filters will add extra complexity. The concept is that the user should decide which events are important for particular features on the plot or for a particular climate product.”

These discussions at EGU were very useful in the sense that they provided an insight into what potential CHARMe users want and also gave us extra ideas, for example the use of VO concepts for significant events, whose applicability to CHARMe we are now looking into.

One aspect was clear: scientists are looking forward to using CHARMe tools!

Building strategic links with a visit to the US

One of the goals of the CHARMe project is to broaden research partnerships by building strategic links with parties outside Europe. Since the beginning of the project, we have found that the work of the US National Oceanic and Atmospheric Administration (NOAA) and the US National Climate Predictions & Projections (NCPP) Platform aligns well with our own project aims: the NCPP’s mission is to “support the collaborative development of climate information for adaptation by a community of climate scientists.” They aim to do this by developing their own climate data products and providing tools and metadata to assist users in their decision making. We have had several teleconferences during the last year to discuss how CHARMe can be integrated with their efforts and to keep each other up to date on progress.

Raquel Alegre (University of Reading) was lucky enough to be offered the opportunity to travel to the US after learning about a few events relevant to the project taking place near NOAA’s premises in Boulder, Colorado. There was a workshop by W3C on Open Annotation (the standard chosen to represent CHARMe’s data model), a conference on state-of-the-art use of annotations, and a 48-hour hackathon on annotation systems with the participants of the workshop and the conference. This was a great opportunity to present the current status of CHARMe and future development ideas to the authors of the W3C Open Annotation (OA) data model. The CHARMe project is developing solutions for the annotation of temporal and fine-grained data, which OA has not yet tackled.

After the conferences, Raquel attended a two-day meeting with the NCPP group. On the first day she had the pleasure of putting faces to the familiar names from the previous teleconferences. They were extremely enthusiastic and very interested in installing the CHARMe Plugin in their climate data service site, which is part of the CoG system, and gave very valuable suggestions for the advanced tools.

All in all it was a great trip and an excellent opportunity to share knowledge and build partnerships with parties across the pond in the US.