In one of our recent posts I introduced the idea of using CHARMe as a bibliographic database. In this post I want to dive deeper into this topic, focusing on how to capture specific types of documents.
First, let’s look at how our local bibliographic database is currently organized and which queries we want to run against it. It turns out that our bibliographic database is a simple Excel sheet with some informally defined constraints restricting the allowed values of certain columns. The most interesting constraints for this post are defined on the columns DocType and Type. The column ‘Type’ specifies in more detail what kind of DocType.document an entry is.
A common query for the bibliographic database looks like this:
‘Please give me all items related to the project documentation of a specific project/dataset.’ or ‘Please give me all Algorithm Theoretical Basis Documents of a specific project/dataset.’
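To make the current setup concrete, here is a minimal Python sketch (standard library only) of the spreadsheet as a list of rows, with the informal constraints checked in code and the two common queries written as filters. The columns project and title, the Type values other than ATBD, and reading ‘project documentation’ as DocType ‘document’ are assumptions for illustration, not a description of the actual sheet.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed value constraints; in the real sheet they are only informally defined.
ALLOWED_DOC_TYPES = {"article", "presentation", "poster", "document"}
# 'Type' refines rows whose DocType is 'document'. Only "ATBD" comes from the
# post; the other values are hypothetical.
ALLOWED_TYPES = {"ATBD", "user manual", "validation report"}


@dataclass(frozen=True)
class Row:
    project: str                 # hypothetical column: project/dataset name
    title: str                   # hypothetical column
    doc_type: str                # column 'DocType'
    type_: Optional[str] = None  # column 'Type' (only set for DocType 'document')


def validate(row: Row) -> None:
    """Raise ValueError if a row breaks the assumed column constraints."""
    if row.doc_type not in ALLOWED_DOC_TYPES:
        raise ValueError(f"unknown DocType: {row.doc_type!r}")
    if row.doc_type == "document" and row.type_ not in ALLOWED_TYPES:
        raise ValueError(f"unknown Type: {row.type_!r}")
    if row.doc_type != "document" and row.type_ is not None:
        raise ValueError("Type is only used for DocType 'document'")


def project_documentation(rows: List[Row], project: str) -> List[Row]:
    """'All items related to the project documentation of a specific project.'"""
    return [r for r in rows if r.project == project and r.doc_type == "document"]


def atbds(rows: List[Row], project: str) -> List[Row]:
    """'All Algorithm Theoretical Basis Documents of a specific project.'"""
    return [r for r in project_documentation(rows, project) if r.type_ == "ATBD"]


if __name__ == "__main__":
    rows = [
        Row("project-A", "Retrieval ATBD", "document", "ATBD"),
        Row("project-A", "User manual", "document", "user manual"),
        Row("project-A", "Conference poster", "poster"),
        Row("project-B", "Another ATBD", "document", "ATBD"),
    ]
    for r in rows:
        validate(r)
    print([r.title for r in project_documentation(rows, "project-A")])
    print([r.title for r in atbds(rows, "project-A")])
```

Because the Type values form one flat list, a question such as ‘all technical items’ cannot be answered without hard-coding which types count as technical.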
In addition to these common queries, a new user requirement is to add a further level of categorization to the ‘Types‘, in order to enable queries like: ‘Please give me all technical items related to the project documentation of a specific project/dataset.’
That leads us to the question of what technical items are and which other kinds of items need to be captured. Thinking about it, we conclude that we need a hierarchical order of these items. Additionally, it would be very valuable to make these classifications and definitions publicly available, at least to our department as a kind of departmental vocabulary, rather than hiding them inside an application.
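A hierarchical order can be sketched by letting every type point to its more general parent, so that ‘all technical items’ becomes ‘all items whose type is technical or has technical among its ancestors’. The minimal Python sketch below assumes a hypothetical category tree; apart from ATBD, the category and type names are invented for illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical hierarchy: each entry maps a type to its more general parent.
# None marks a top-level category. Only "ATBD" is taken from the post.
PARENT: Dict[str, Optional[str]] = {
    "technical": None,
    "ATBD": "technical",
    "validation report": "technical",
    "user-oriented": None,
    "user manual": "user-oriented",
}


def ancestors_or_self(type_name: str) -> Iterator[str]:
    """Yield the type itself, then each more general parent up to the root."""
    current: Optional[str] = type_name
    while current is not None:
        yield current
        current = PARENT[current]


def is_under(type_name: str, category: str) -> bool:
    """True if `type_name` equals `category` or lies somewhere below it."""
    return category in ancestors_or_self(type_name)


def items_in_category(
    items: List[Tuple[str, str, str]], project: str, category: str
) -> List[str]:
    """Titles of the (project, title, type) items of `project` under `category`."""
    return [
        title
        for (proj, title, type_name) in items
        if proj == project and is_under(type_name, category)
    ]
```

With this shape, adding a new technical subtype only needs one new entry in PARENT while the query stays unchanged. A cycle in the parent map would make ancestors_or_self loop forever, so a real vocabulary would have to rule cycles out.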
Having captured these fundamental requirements, let’s look at how CHARMe can help us get started.
In the CHARMe project we decided to use the FRBR-aligned Bibliographic Ontology, fabio, a publicly available and widely used ontology. So we should look for fabio classes that we can map to our DocType and Type entries. And indeed, except for DocType.document, we find the following mappings:
docType:article to :
docType:presentation to :
docType:poster to :
The nearest match to docType:document might be:
As we can see, fabio:Report already has some subclasses, but when we take a closer look at fabio:TechnicalReport we notice that it is a leaf node in the fabio hierarchy. So we need a way to extend the fabio ontology in order to cover our technical items, such as the ATBD.
Finding an existing, public and widely used ontology that defines the term ATBD for us would be the preferred option. In this post, however, we start creating our own definition/vocabulary and extend fabio to our needs using SKOS.
SKOS, which stands for Simple Knowledge Organization System, is a W3C standard built on other Semantic Web standards (RDF and OWL) that provides a way to represent controlled vocabularies, taxonomies and thesauri.
“The fundamental element of the SKOS vocabulary is the concept. Concepts are the units of thought —ideas, meanings, or (categories of) objects and events—which underlie many knowledge organization systems. As such, concepts exist in the mind as abstract entities which are independent of the terms used to label them.” SKOS Primer
So we start by expressing our ATBD as a SKOS concept (in Turtle) and adding some labeling information and its definition:
cmsaf_vocab:ATBD rdf:type skos:Concept ;
    skos:prefLabel "ATBD" ;
    skos:altLabel "Algorithm Theoretical Basis Document" ;
    skos:definition "The Algorithm Theoretical Basis Documents (ATBD) are intended to describe the physical and mathematical description of the algorithms to be used in the generation of data products. The ATBD include a description of variance and uncertainty estimates and considerations of calibration and validation, exception control, and diagnostics. In some cases, internal and external data product flows are required." .
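If the vocabulary is maintained somewhere else, for example exported from the spreadsheet, the Turtle for each concept can be generated rather than typed by hand. The minimal Python sketch below uses hypothetical function names and writes a concept in Turtle’s predicate-list form: the subject once, predicate-object pairs separated by ‘;’, a closing ‘.’, and string literals in straight double quotes with Turtle escapes applied. It assumes the rdf, skos and cmsaf_vocab prefixes are declared elsewhere in the file.

```python
from typing import Iterable, List, Optional, Tuple


def turtle_literal(text: str) -> str:
    """Quote a Python string as a short Turtle string literal ("...")."""
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def concept_to_turtle(
    subject: str,
    pref_label: str,
    alt_labels: Iterable[str] = (),
    definition: Optional[str] = None,
) -> str:
    """Serialize one skos:Concept; `subject` is a prefixed name such as cmsaf_vocab:ATBD."""
    pairs: List[Tuple[str, str]] = [
        ("rdf:type", "skos:Concept"),
        ("skos:prefLabel", turtle_literal(pref_label)),
    ]
    pairs += [("skos:altLabel", turtle_literal(a)) for a in alt_labels]
    if definition is not None:
        pairs.append(("skos:definition", turtle_literal(definition)))
    # Subject once, predicate-object pairs joined by ' ;', statement closed by ' .'
    body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
    return f"{subject} {body} .\n"


if __name__ == "__main__":
    print(concept_to_turtle(
        "cmsaf_vocab:ATBD",
        "ATBD",
        alt_labels=["Algorithm Theoretical Basis Document"],
    ))
```

Generating the statements this way keeps the syntax consistent across many concepts, which matters once the departmental vocabulary grows beyond a handful of entries.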
To build the bridge to the fabio ontology, we just need to add a semantic relation between our ATBD concept and a fabio class:
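cmsaf_vocab:ATBD skos:broader fabio:TechnicalReport .
Note that the direction matters: in SKOS, <A> skos:broader <B> states that B is the more general concept, so this triple places our ATBD below fabio:TechnicalReport (equivalently, fabio:TechnicalReport skos:narrower cmsaf_vocab:ATBD).
That means that by using SKOS we can build up our own classification scheme in a common data model for knowledge organization systems, with the benefit of being able to easily incorporate or extend an existing KOS.
But by migrating from Excel to CHARMe, we also switch from relational data to graph data and from SQL to SPARQL. Since CHARMe stores each link as an annotation with the project/dataset as its target and the document as its body, the question ‘Please give me all technical items related to the project documentation of a specific project.’ becomes, in pseudo SPARQL:
?annotation oa:hasTarget <uri of our project> .
?annotation oa:hasBody ?technical_item .
?technical_item rdf:type ?technical_item_type .
?technical_item_type skos:broader* fabio:TechnicalReport .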
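To see what the last triple pattern has to do, here is a minimal, stdlib-only Python sketch that answers the question over an in-memory set of triples. It follows the SKOS Reference convention that <A> skos:broader <B> means B is the more general concept, walks that relation zero or more times (what the SPARQL 1.1 property path skos:broader* expresses), and assumes the annotation pattern in which the target is the project/dataset and the body is the document. All ex: identifiers are hypothetical sample data.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Prefixed names stand in for full URIs; every ex: resource is hypothetical.
TRIPLES: Set[Triple] = {
    ("cmsaf_vocab:ATBD", "skos:broader", "fabio:TechnicalReport"),
    ("ex:anno1", "oa:hasTarget", "ex:projectA"),
    ("ex:anno1", "oa:hasBody", "ex:doc1"),
    ("ex:doc1", "rdf:type", "cmsaf_vocab:ATBD"),
    ("ex:anno2", "oa:hasTarget", "ex:projectA"),
    ("ex:anno2", "oa:hasBody", "ex:doc2"),
    ("ex:doc2", "rdf:type", "fabio:TechnicalReport"),
    ("ex:anno3", "oa:hasTarget", "ex:projectA"),
    ("ex:anno3", "oa:hasBody", "ex:doc3"),
    ("ex:doc3", "rdf:type", "fabio:JournalArticle"),
    ("ex:anno4", "oa:hasTarget", "ex:projectB"),
    ("ex:anno4", "oa:hasBody", "ex:doc4"),
    ("ex:doc4", "rdf:type", "cmsaf_vocab:ATBD"),
}


def objects(triples: Set[Triple], s: str, p: str) -> Set[str]:
    """All ?o with (s, p, ?o) in the graph."""
    return {o for (s2, p2, o) in triples if s2 == s and p2 == p}


def subjects(triples: Set[Triple], p: str, o: str) -> Set[str]:
    """All ?s with (?s, p, o) in the graph."""
    return {s for (s, p2, o2) in triples if p2 == p and o2 == o}


def broader_star(triples: Set[Triple], concept: str) -> Set[str]:
    """Concepts reachable via skos:broader zero or more times (like skos:broader*)."""
    seen = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in objects(triples, current, "skos:broader"):
            if parent not in seen:  # `seen` also stops the walk on cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def technical_items(
    triples: Set[Triple], project: str, category: str = "fabio:TechnicalReport"
) -> Set[str]:
    """Bodies of annotations targeting `project` whose type lies at or below `category`."""
    result = set()
    for annotation in subjects(triples, "oa:hasTarget", project):
        for item in objects(triples, annotation, "oa:hasBody"):
            for item_type in objects(triples, item, "rdf:type"):
                if category in broader_star(triples, item_type):
                    result.add(item)
    return result


if __name__ == "__main__":
    print(sorted(technical_items(TRIPLES, "ex:projectA")))
```

Walking the relation zero or more times returns documents typed directly as fabio:TechnicalReport as well as those typed with a narrower concept such as cmsaf_vocab:ATBD; a one-or-more walk would miss the former. In a triple store the property path performs this traversal for you.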
So there are still a few more steps ahead of us before we can migrate our Excel bibliographic database to CHARMe, but by using fabio and SKOS one of the big chunks can be tackled.