Trevas - Provenance
· 4 minutes de lecture
Nouveautés
Trevas 1.6.0 introduit le module VTL Prov.
Ce module permet de produire des métadonnées de lineage à partir de Trevas, basées sur les ontologies RDF : PROV-O
et SDTH
.
Aperçu du modèle SDTH
Modèle adopté
Le module vtl-prov
, version 1.6.0, utilise le modèle partiel suivant :
Des améliorations arriveront dans les prochaines semaines.
Outils disponibles
Les outils de provenance Trevas sont documentés ici.
Exemple
Cas d'utilisation métier
Deux jeux de données sources sont transformés pour produire des jeux de données transitoires et un jeu de données final permanent.
Entrées
ds1
& ds2
métadonnées:
id | var1 | var2 |
---|---|---|
STRING | INTEGER | NUMBER |
IDENTIFIER | MEASURE | MEASURE |
Script VTL
ds_sum := ds1 + ds2;
ds_mul := ds_sum * 3;
ds_res <- ds_mul[filter mod(var1, 2) = 0][calc var_sum := var1 + var2];
Modèle RDF cible
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX sdth: <http://rdf-vocabulary.ddialliance.org/sdth#>
# --- Program and steps
<http://example.com/program1> a sdth:Program ;
a prov:Agent ; # Agent? Or an activity
rdfs:label "My program 1"@en, "Mon programme 1"@fr ;
sdth:hasProgramStep <http://example.com/program1/program-step1>,
<http://example.com/program1/program-step2>,
<http://example.com/program1/program-step3> .
<http://example.com/program1/program-step1> a sdth:ProgramStep ;
rdfs:label "Program step 1"@en, "Étape 1"@fr ;
sdth:hasSourceCode "ds_sum := ds1 + ds2;" ;
sdth:consumesDataframe <http://example.com/dataset/ds1>,
<http://example.com/dataset/ds2> ;
sdth:producesDataframe <http://example.com/dataset/ds_sum> .
<http://example.com/program1/program-step2> a sdth:ProgramStep ;
rdfs:label "Program step 2"@en, "Étape 2"@fr ;
sdth:hasSourceCode "ds_mul := ds_sum * 3;" ;
sdth:consumesDataframe <http://example.com/dataset/ds_sum> ;
sdth:producesDataframe <http://example.com/dataset/ds_mul> .
<http://example.com/program1/program-step3> a sdth:ProgramStep ;
rdfs:label "Program step 3"@en, "Étape 3"@fr ;
sdth:hasSourceCode "ds_res <- ds_mul[filter mod(var1, 2) = 0][calc var_sum := var1 + var2];" ;
sdth:consumesDataframe <http://example.com/dataset/ds_mul> ;
sdth:producesDataframe <http://example.com/dataset/ds_res> ;
sdth:usesVariable <http://example.com/variable/var1>,
<http://example.com/variable/var2> ;
sdth:assignsVariable <http://example.com/variable/var_sum> .
# --- Variables
# i think here it's not instances but names we refer to...
<http://example.com/variable/id1> a sdth:VariableInstance ;
rdfs:label "id1" .
<http://example.com/variable/var1> a sdth:VariableInstance ;
rdfs:label "var1" .
<http://example.com/variable/var2> a sdth:VariableInstance ;
rdfs:label "var2" .
<http://example.com/variable/var_sum> a sdth:VariableInstance ;
rdfs:label "var_sum" .
# --- Data frames
<http://example.com/dataset/ds1> a sdth:DataframeInstance ;
rdfs:label "ds1" ;
sdth:hasName "ds1" ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .
<http://example.com/dataset/ds2> a sdth:DataframeInstance ;
rdfs:label "ds2" ;
sdth:hasName "ds2" ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .
<http://example.com/dataset/ds_sum> a sdth:DataframeInstance ;
rdfs:label "ds_sum" ;
sdth:hasName "ds_sum" ;
sdth:wasDerivedFrom <http://example.com/dataset/ds1>,
<http://example.com/dataset/ds2> ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .
<http://example.com/dataset/ds_mul> a sdth:DataframeInstance ;
rdfs:label "ds_mul" ;
sdth:hasName "ds_mul" ;
sdth:wasDerivedFrom <http://example.com/dataset/ds_sum> ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .
<http://example.com/dataset/ds_res> a sdth:DataframeInstance ;
rdfs:label "ds_res" ;
sdth:wasDerivedFrom <http://example.com/dataset/ds_mul> ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2>,
<http://example.com/variable/var_sum> .