KIK-Starter relies on [RMLmapper](https://github.com/RMLio/rmlmapper-java) (version 5.0.0+) for data transformation.

[[_TOC_]]

## R2RML

RMLmapper uses a dialect of [R2RML](https://www.w3.org/TR/r2rml/) called [RML](https://rml.io/specs/rml/). RML is almost, but not entirely, compatible with R2RML; the main differences are in how data sources are specified. KIK-Starter can transform R2RML data source specifications into RML, so any standards-compliant R2RML mapping should work.

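
As an illustration of the difference, the sketch below describes the same source first in R2RML and then in RML. The table name, file name, column names, `<#...>` identifiers, and `example.org` template are hypothetical, and the exact RML that KIK-Starter generates from an R2RML mapping may differ from this.

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.

# R2RML: the source is a logical table and values are read with rr:column.
<#KwalificatieR2RML> a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "kwalificatie" ] ;
    rr:subjectMap [ rr:template "http://example.org/kwalificatie/{id}" ] ;
    rr:predicateObjectMap [
        rr:predicate rdfs:label ;
        rr:objectMap [ rr:column "naam" ]
    ] .

# RML: the source is a logical source with a reference formulation,
# and values are read with rml:reference.
<#KwalificatieRML> a rr:TriplesMap ;
    rml:logicalSource [
        rml:source "kwalificatie.csv" ;
        rml:referenceFormulation ql:CSV
    ] ;
    rr:subjectMap [ rr:template "http://example.org/kwalificatie/{id}" ] ;
    rr:predicateObjectMap [
        rr:predicate rdfs:label ;
        rr:objectMap [ rml:reference "naam" ]
    ] .
```
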
## Data source specification

KIK-Starter has only been tested with relational data sources (CSV, Excel, ODS), so there may be issues with tree-based ones (JSON, YAML, XML).

KIK-Starter can make use of extra information on data sources when generating a UI.

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.

<#BeaufortKwalificatieMapping> a rr:TriplesMap;
    rdfs:comment "Kwalificatie duiding" ;
    rml:logicalSource [
        # Label shown in the UI instead of the file name
        rdfs:label "Kwalificatie tabel" ;
        rml:source "kwalificatie.csv" ;
        rml:referenceFormulation ql:CSV
    ];
```

The data source file name `kwalificatie.csv` is shown in the UI if nothing else is specified. If an `rdfs:label` or `rdfs:comment` is specified, it is used instead. Only one is needed; if both are present, the `rdfs:label` is preferred over the `rdfs:comment`.

KIK-Starter interprets data source names abstractly, so the file name has no semantic meaning. The one exception is that if the same name is used more than once in a mapping, the sources are considered to be the same.

The format is also interpreted abstractly, and KIK-Starter automatically uses the right reader. Even though CSV is specified in the example above, an end-user can provide a CSV file, an Excel file, or an ODS file, whichever is most convenient for them.

KIK-Starter tries to guess whether a CSV file uses a semicolon (;) or a comma (,) as separator, and attempts to interpret numbers correctly, using European (1,0) or US (1.0) notation as applicable. This is not infallible, though, so it is best to use commas as separators and US number notation.

## Data transformation

Some data fields need adjustment, e.g., to convert between date formats or to convert numbers to US notation if they are read as strings. KIK-Starter looks for queries inside the RML mapping and automatically executes them to accomplish this. Queries are written in [SPARQL Update](https://www.w3.org/TR/sparql11-update/).

Queries are validated to be safe prior to execution. Safe queries do not include any graph management operations (section 3.2 of the SPARQL Update specification: CREATE/DROP/COPY/MOVE/ADD), nor do they include LOAD or CLEAR updates (sections 3.1.4 and 3.1.5). Updates cannot use WITH/USING to access external graphs, nor can individual triples refer to external graphs. Together, this ensures that data transformations are self-contained and only alter the converted data.

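
To illustrate these rules, the sketch below contrasts one update that stays within the converted data with four updates of the kinds described above as unsafe. The `q:query` property is the one used by the mapping example later in this section. The `ex:` namespace, the subject names, and the graph and file IRIs are hypothetical, and attaching queries to standalone subjects is an assumption made for brevity. How the validator reports a rejected query is not specified here.

```turtle
@prefix q: <https://kik-v.nl/ontology/vkv/backend/profile#>.
@prefix ex: <http://example.org/ns#>.   # hypothetical namespace, for illustration only

# Stays within the converted data: a plain DELETE/INSERT ... WHERE update.
ex:safeUpdate q:query """
PREFIX ex: <http://example.org/ns#>
DELETE { ?s ex:code ?old }
INSERT { ?s ex:code ?new }
WHERE  {
    ?s ex:code ?old .
    BIND(UCASE(?old) AS ?new)
}
""" .

# Unsafe: graph management operation (DROP).
ex:unsafeDrop q:query """DROP GRAPH <http://example.org/other>""" .

# Unsafe: LOAD pulls in data from outside the conversion.
ex:unsafeLoad q:query """LOAD <http://example.org/data.ttl>""" .

# Unsafe: WITH redirects the update to an external graph.
ex:unsafeWith q:query """
WITH <http://example.org/other>
DELETE { ?s ?p ?o } WHERE { ?s ?p ?o }
""" .

# Unsafe: an individual triple refers to an external graph.
ex:unsafeGraph q:query """
INSERT { GRAPH <http://example.org/other> { ?s ?p ?o } } WHERE { ?s ?p ?o }
""" .
```
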
Queries can be included anywhere in the RML. It makes sense to put them close to where they are used, but they have global scope: a single query replacing `vph-pers:eindDatum` replaces every `vph-pers:eindDatum` value, not just those produced near the query. Here is a query inside a `rr:predicateObjectMap` that reformats all dates:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix vph-g: <http://purl.org/ozo/vph-g#>.
@prefix vph-pers: <http://purl.org/ozo/vph-pers#>.
@prefix q: <https://kik-v.nl/ontology/vkv/backend/profile#>.

rr:predicateObjectMap [
    rr:predicate "http://purl.org/ozo/vph-pers#eindDatum";
    rr:objectMap [
        rml:reference "dienstverband.einddatum";
        rr:datatype xsd:string;
        q:query """
            PREFIX vph-g: <http://purl.org/ozo/vph-g#>
            PREFIX vph-pers: <http://purl.org/ozo/vph-pers#>
            PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

            # Replace each end date with an xsd:date built from its string value.
            DELETE {
                ?s vph-pers:eindDatum ?e
            }
            INSERT {
                ?s vph-pers:eindDatum ?d
            }
            WHERE {
                ?s vph-pers:eindDatum ?e .
                # Rebuild the value as YYYY-MM-DD from characters 1-4, 6-7 and 9-10.
                BIND(STRDT(CONCAT(SUBSTR(?e, 1, 4), "-", SUBSTR(?e, 6, 2), "-", SUBSTR(?e, 9, 2)), xsd:date) AS ?d)
            }
        """
    ]
];
```

Note that the query does not inherit namespace prefix definitions from the surrounding mapping, so it has to declare its own.

Notice how the datatype is declared as `xsd:string` in the RML mapping and subsequently replaced by the query with the appropriate type `xsd:date`. If the RML mapping used the final type `xsd:date` directly, a date such as `2022/05/30` would lead to conversion errors because it does not match the lexical format of that datatype. Declared as a string, the value is accepted, and the query can correctly transform it to `"2022-05-30"^^xsd:date`.

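
The same string-first pattern can be applied to numbers in European notation. The following is a minimal sketch rather than a mapping taken from KIK-Starter: the `ex:` namespace, the `ex:urenPerWeek` predicate, the `<#DienstverbandMapping>` identifier, the source file name, and the `dienstverband.id` and `dienstverband.uren` columns are all hypothetical. It assumes `q:query` can be attached to an object map as in the date example, and it only replaces the decimal comma, so values with thousands separators would need extra handling.

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix q: <https://kik-v.nl/ontology/vkv/backend/profile#>.
@prefix ex: <http://example.org/ns#>.   # hypothetical namespace, for illustration only

<#DienstverbandMapping> a rr:TriplesMap ;
    rml:logicalSource [
        rml:source "dienstverband.csv" ;
        rml:referenceFormulation ql:CSV
    ] ;
    rr:subjectMap [ rr:template "http://example.org/dienstverband/{dienstverband.id}" ] ;
    rr:predicateObjectMap [
        rr:predicate ex:urenPerWeek ;
        rr:objectMap [
            rml:reference "dienstverband.uren" ;
            # Read as a string first, so "36,5" does not fail numeric conversion.
            rr:datatype xsd:string ;
            q:query """
                PREFIX ex: <http://example.org/ns#>
                PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

                DELETE { ?s ex:urenPerWeek ?raw }
                INSERT { ?s ex:urenPerWeek ?num }
                WHERE {
                    ?s ex:urenPerWeek ?raw .
                    # Only touch values that are still strings.
                    FILTER(DATATYPE(?raw) = xsd:string)
                    # "36,5" becomes "36.5"^^xsd:decimal
                    BIND(STRDT(REPLACE(?raw, ",", "."), xsd:decimal) AS ?num)
                }
            """
        ]
    ] .
```

The `FILTER` on the datatype keeps the update from matching values that have already been converted.
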
## Custom mappings

Some mappings require a bit of user input. KIK-Starter provides extensions that let mapping creators declare this, so that KIK-Starter can automatically generate mapping templates for end-users.

KIK-Starter presently supports these extensions:

| extension | purpose |
| ------ | ------ |
| `c:subclassOf` | map to subclasses of provided class |
| `c:instanceOf` | map to instances of provided class |

These extensions can be provided using the short or the long (complete) syntax. Using the short syntax is strongly encouraged for simplicity; KIK-Starter automatically transforms it to the long syntax. The long syntax is completely standard RML with a few extra annotations that help KIK-Starter generate the correct templates.

Custom mappings are automatically unified when using the short syntax. That is, if the same mapping is needed more than once, the user is only asked to fill it in once.

Custom mappings are special joins that omit the parent map. They are transformed to standard maps. Using an `rdfs:label` or `rdfs:comment` inside the `rr:joinCondition` is encouraged, as it allows the end-user to identify the appropriate mapping; otherwise they get a meaningless generated name.

### subclassOf

Short form:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix vph-g: <http://purl.org/ozo/vph-g#>.
@prefix vph-pers: <http://purl.org/ozo/vph-pers#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix c: <https://kik-v.nl/ontology/vkv/backend/conversie#>.

rr:predicateObjectMap [
    rr:predicate "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    rr:objectMap [
        rr:joinCondition [
            rdfs:label "Arbeidsrelatie Mapping" ;
            rdfs:comment "Arbeidsrelatie Mapping Comment" ;
            rr:child "arbeidsrelatie.name";
            c:subclassOf vph-pers:WerkOvereenkomst ;
        ];
    ]
];
```

KIK-Starter transforms this into the corresponding long form:

```turtle
rr:predicateObjectMap [
    rr:predicate "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" ;
    rr:objectMap [
        rr:joinCondition [
            rdfs:label "Arbeidsrelatie Mapping" ;
            rdfs:comment "Arbeidsrelatie Mapping Comment" ;
            rr:child "arbeidsrelatie.name" ;
            rr:parent "arbeidsrelatie.name"
        ] ;
        rr:parentTriplesMap [
            a c:SubclassMapping , rr:TriplesMap , c:CustomTriplesMap ;
            rdfs:comment "Arbeidsrelatie Mapping" ;
            rml:logicalSource [
                rdfs:label "Arbeidsrelatie Mapping" ;
                rml:referenceFormulation ql:CSV ;
                rml:source "custom_mapping_1.csv" ;
                c:column "arbeidsrelatie.name" ;
                c:source "dienstverband_export.csv" ;
                c:subclassOf vph-pers:WerkOvereenkomst
            ] ;
            rr:subjectMap [ rml:reference "value" ]
        ]
    ] ;
] ;
```

Here, the `c:subclassOf` has been replaced by a generated `rr:parent` mapping, and a `rr:parentTriplesMap` has been added. The `rr:parentTriplesMap` is basically a triples map over a normal CSV file, but it is additionally typed as `c:SubclassMapping`. The `rml:logicalSource` also includes the information needed to generate a template. The `rdfs:label` (preferred over the `rdfs:comment`) from the short form has been copied over so that it is displayed in the KIK-Starter UI. The long form can be used with RMLmapper directly, without the assistance of KIK-Starter.

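
To make the join concrete, here is a sketch of what the long form would produce once an end-user has filled in the generated template. The template rows, the `ex:` classes, and the `ex:dienstverband-1` subject are hypothetical; only the column names `arbeidsrelatie.name` and `value` and the file names come from the long form above. In practice the values would be subclasses of `vph-pers:WerkOvereenkomst`.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix ex: <http://example.org/ns#>.   # hypothetical namespace, for illustration only

# Assumed contents of the filled-in template custom_mapping_1.csv:
#
#   arbeidsrelatie.name,value
#   vast,http://example.org/ns#VastContract
#   tijdelijk,http://example.org/ns#TijdelijkContract
#
# A row in dienstverband_export.csv with arbeidsrelatie.name = "vast" joins
# with the first template row (rr:child equals rr:parent), so its subject
# receives that row's `value` as the object of rdf:type:

ex:dienstverband-1 rdf:type ex:VastContract .
```
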
### instanceOf

Short form:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix vph-g: <http://purl.org/ozo/vph-g#>.
@prefix vph-pers: <http://purl.org/ozo/vph-pers#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix c: <https://kik-v.nl/ontology/vkv/backend/conversie#>.

rr:predicateObjectMap [
    rr:predicate vph-g:hasQualityValue;
    rr:objectMap [
        rr:joinCondition [
            rdfs:label "Kwalificatie" ;
            rr:child "kwalificatie.naam";
            c:instanceOf vph-pers:ODBKwalificatie ;
        ];
    ]
];
```

KIK-Starter transforms this into the corresponding long form:

```turtle
rr:predicateObjectMap [
    rr:predicate vph-g:hasQualityValue ;
    rr:objectMap [
        rr:joinCondition [
            rdfs:label "Kwalificatie" ;
            rr:child "kwalificatie.naam" ;
            rr:parent "kwalificatie.naam"
        ] ;
        rr:parentTriplesMap [
            a rr:TriplesMap , c:InstanceMapping , c:CustomTriplesMap ;
            rdfs:comment "Kwalificatie" ;
            rml:logicalSource [
                rdfs:label "Kwalificatie" ;
                rml:referenceFormulation ql:CSV ;
                rml:source "custom_mapping_2.csv" ;
                c:column "kwalificatie.naam" ;
                c:instanceOf vph-pers:ODBKwalificatie ;
                c:source "kwalificatie.csv"
            ] ;
            rr:subjectMap [ rml:reference "value" ]
        ]
    ] ;
] ;
```

Here, the `c:instanceOf` has been replaced by a generated `rr:parent` mapping, and a `rr:parentTriplesMap` has been added. The `rr:parentTriplesMap` is basically a triples map over a normal CSV file, but it is additionally typed as `c:InstanceMapping`. The `rml:logicalSource` also includes the information needed to generate a template. The `rdfs:label` from the short form has been copied over so that it is displayed in the KIK-Starter UI. The long form can be used with RMLmapper directly, without the assistance of KIK-Starter.