
Trevas - VTL 2.1

One min read
Nicolas Laval
Making Sense - Developer

Trevas 1.7.0 upgrades to version 2.1 of VTL.

This version introduces two new operators:

  • random
  • case

random produces a decimal number between 0 and 1.

case allows for clearer multi-conditional branching, for example:

ds2 := ds1[calc c := case when r < 0.2 then "Low" when r > 0.8 then "High" else "Medium"];

Both operators are already available in Trevas!
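As an illustration, here is a minimal sketch combining the two operators through the Trevas script engine. This is not taken from the release itself: ds1 is a hypothetical dataset, we assume the engine is registered under the vtl extension as in the Trevas documentation, and we use the random(seed, index) form defined in the VTL 2.1 reference manual.

ScriptEngine engine = new ScriptEngineManager().getEngineByExtension("vtl");
engine.put("ds1", ds1); // hypothetical input dataset
// random(seed, index) yields a decimal in [0, 1]; case picks the first matching branch
engine.eval("""
        ds2 := ds1[calc r := random(123, 5)]
                  [calc c := case when r < 0.2 then "Low"
                                  when r > 0.8 then "High"
                                  else "Medium"];
        """);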

The new grammar also provides time operators and includes corrections, without any breaking changes compared to version 2.0.

See the coverage section for more details.

Trevas - Provenance

4 min read
Nicolas Laval
Making Sense - Developer

News

Trevas 1.6.0 introduces the VTL Prov module.

This module makes it possible to produce lineage metadata from Trevas, based on the RDF ontologies PROV-O and SDTH.

SDTH model overview

Adopted model

The vtl-prov module, version 1.6.0, uses the following partial model:

Improvements will come in the coming weeks.

Tools available

The Trevas provenance tools are documented here.

Example

Business use case

Two source datasets are transformed to produce transient datasets and a final permanent one.

Inputs

ds1 & ds2 metadata:

id          var1     var2
STRING      INTEGER  NUMBER
IDENTIFIER  MEASURE  MEASURE

VTL script

ds_sum := ds1 + ds2;
ds_mul := ds_sum * 3;
ds_res <- ds_mul[filter mod(var1, 2) = 0][calc var_sum := var1 + var2];

RDF model target

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX sdth: <http://rdf-vocabulary.ddialliance.org/sdth#>

# --- Program and steps
<http://example.com/program1> a sdth:Program ;
    a prov:Agent ; # Agent? Or an activity
    rdfs:label "My program 1"@en, "Mon programme 1"@fr ;
    sdth:hasProgramStep <http://example.com/program1/program-step1>,
        <http://example.com/program1/program-step2>,
        <http://example.com/program1/program-step3> .

<http://example.com/program1/program-step1> a sdth:ProgramStep ;
    rdfs:label "Program step 1"@en, "Étape 1"@fr ;
    sdth:hasSourceCode "ds_sum := ds1 + ds2;" ;
    sdth:consumesDataframe <http://example.com/dataset/ds1>,
        <http://example.com/dataset/ds2> ;
    sdth:producesDataframe <http://example.com/dataset/ds_sum> .

<http://example.com/program1/program-step2> a sdth:ProgramStep ;
    rdfs:label "Program step 2"@en, "Étape 2"@fr ;
    sdth:hasSourceCode "ds_mul := ds_sum * 3;" ;
    sdth:consumesDataframe <http://example.com/dataset/ds_sum> ;
    sdth:producesDataframe <http://example.com/dataset/ds_mul> .

<http://example.com/program1/program-step3> a sdth:ProgramStep ;
    rdfs:label "Program step 3"@en, "Étape 3"@fr ;
    sdth:hasSourceCode "ds_res <- ds_mul[filter mod(var1, 2) = 0][calc var_sum := var1 + var2];" ;
    sdth:consumesDataframe <http://example.com/dataset/ds_mul> ;
    sdth:producesDataframe <http://example.com/dataset/ds_res> ;
    sdth:usesVariable <http://example.com/variable/var1>,
        <http://example.com/variable/var2> ;
    sdth:assignsVariable <http://example.com/variable/var_sum> .
# --- Variables
# I think here it's not instances but names we refer to...
<http://example.com/variable/id1> a sdth:VariableInstance ;
    rdfs:label "id1" .
<http://example.com/variable/var1> a sdth:VariableInstance ;
    rdfs:label "var1" .
<http://example.com/variable/var2> a sdth:VariableInstance ;
    rdfs:label "var2" .
<http://example.com/variable/var_sum> a sdth:VariableInstance ;
    rdfs:label "var_sum" .

# --- Data frames
<http://example.com/dataset/ds1> a sdth:DataframeInstance ;
    rdfs:label "ds1" ;
    sdth:hasName "ds1" ;
    sdth:hasVariableInstance <http://example.com/variable/id1>,
        <http://example.com/variable/var1>,
        <http://example.com/variable/var2> .

<http://example.com/dataset/ds2> a sdth:DataframeInstance ;
    rdfs:label "ds2" ;
    sdth:hasName "ds2" ;
    sdth:hasVariableInstance <http://example.com/variable/id1>,
        <http://example.com/variable/var1>,
        <http://example.com/variable/var2> .

<http://example.com/dataset/ds_sum> a sdth:DataframeInstance ;
    rdfs:label "ds_sum" ;
    sdth:hasName "ds_sum" ;
    sdth:wasDerivedFrom <http://example.com/dataset/ds1>,
        <http://example.com/dataset/ds2> ;
    sdth:hasVariableInstance <http://example.com/variable/id1>,
        <http://example.com/variable/var1>,
        <http://example.com/variable/var2> .

<http://example.com/dataset/ds_mul> a sdth:DataframeInstance ;
    rdfs:label "ds_mul" ;
    sdth:hasName "ds_mul" ;
    sdth:wasDerivedFrom <http://example.com/dataset/ds_sum> ;
    sdth:hasVariableInstance <http://example.com/variable/id1>,
        <http://example.com/variable/var1>,
        <http://example.com/variable/var2> .

<http://example.com/dataset/ds_res> a sdth:DataframeInstance ;
    rdfs:label "ds_res" ;
    sdth:wasDerivedFrom <http://example.com/dataset/ds_mul> ;
    sdth:hasVariableInstance <http://example.com/variable/id1>,
        <http://example.com/variable/var1>,
        <http://example.com/variable/var2>,
        <http://example.com/variable/var_sum> .
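Once such a graph is published, standard SPARQL is enough to exploit it. The sketch below is an assumption rather than part of the module: it uses Apache Jena and a hypothetical provenance.ttl file holding the graph above, and walks the sdth:wasDerivedFrom links to list everything ds_res derives from.

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

// Load the lineage graph above (file name is hypothetical)
Model model = ModelFactory.createDefaultModel().read("provenance.ttl");

// Which dataframes does ds_res derive from, directly or transitively?
String query = """
        PREFIX sdth: <http://rdf-vocabulary.ddialliance.org/sdth#>
        SELECT ?source WHERE {
          <http://example.com/dataset/ds_res> sdth:wasDerivedFrom+ ?source .
        }
        """;
try (QueryExecution qe = QueryExecutionFactory.create(query, model)) {
    qe.execSelect().forEachRemaining(row -> System.out.println(row.get("source")));
    // Prints ds_mul, ds_sum, ds1 and ds2
}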

Trevas - SDMX

One min read
Nicolas Laval
Making Sense - Developer

News

Trevas 1.4.1 introduces the VTL SDMX module.

This module makes it possible to consume SDMX metadata sources to instantiate Trevas DataStructures and Datasets.

It also allows executing VTL TransformationSchemes to obtain the resulting persistent datasets.

Overview

VTL SDMX Diagram

Trevas supports the above SDMX message elements. Only the VtlMappingSchemes element is optional.

The elements in box 1 are used to produce Trevas DataStructures, filling the VTL component attributes name, role, type, nullable and valuedomain.

The elements in box 2 are used to generate the VTL code (rulesets & transformations).

Tools available

The Trevas SDMX tools are documented here.

Troubleshooting

Have a look at this section.

Trevas - Temporal operators

3 min read
Hadrien Kohl
Hadrien Kohl Consulting - Developer

Temporal operators in Trevas

Version 1.4.1 of Trevas introduces preliminary support for date and time types and operators.

The specification describes temporal types such as date, time_period, time, and duration. However, the Trevas authors find these descriptions unsatisfactory. This blog post outlines our implementation choices and how they differ from the spec.

In the specification, time_period (and the date type) is described as a compound type with a start and end (or a start and a duration). This complicates the implementation and brings little value to the language, as one can simply operate on a combination of dates, or on a date and a duration, directly. For this reason, we defined an algebra between the temporal types and have not yet implemented time_period.

result (operators)  date         duration         number
date                n/a          date (+, -)      n/a
duration            date (+, -)  duration (+, -)  duration (*)
number              n/a          duration (*)     n/a

The period_indicator function relies on period-aware types that are not yet defined precisely enough to be implemented.

Java mapping

The VTL type date is represented internally by the types java.time.Instant, java.time.ZonedDateTime and java.time.OffsetDateTime.

Instant represents a specific moment in time. Note that this type does not include time zone information and is therefore not usable with all the operators. One can use the types ZonedDateTime and OffsetDateTime when time zone or daylight-saving information is required.

The VTL type duration is represented internally by the type org.threeten.extra.PeriodDuration from the ThreeTen-Extra package. It represents a duration using both calendar units (years, months, days) and a temporal amount (hours, minutes, seconds and nanoseconds).

Function flow_to_stock

The flow_to_stock function converts a data set with flow interpretation into a stock interpretation. This transformation is useful when you want to aggregate flow data (e.g., sales or production rates) into cumulative stock data (e.g., total inventory).

Syntax:

result := flow_to_stock(op)

Parameters:

  • op - The input data set with flow interpretation. The data set must have an identifier of type time, may have additional identifiers, and must have at least one measure of type number.

Result:

The function returns a data set with the same structure as the input, but with the values converted to stock interpretation.
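A minimal sketch of this conversion (dataset names and values are hypothetical; we assume the Trevas engine is registered under the vtl extension, as in the Trevas documentation):

ScriptEngine engine = new ScriptEngineManager().getEngineByExtension("vtl");
// dsFlow is a hypothetical dataset built elsewhere, with a time identifier and one measure:
// time = 2021, 2022, 2023 ; me = 10, 5, 7
engine.put("ds_flow", dsFlow);
engine.eval("ds_stock := flow_to_stock(ds_flow);");
// ds_stock keeps the same structure; me becomes cumulative: 10, 15, 22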

Function stock_to_flow

The stock_to_flow function converts a data set with stock interpretation into a flow interpretation. This transformation is useful when you want to derive flow data from cumulative stock data.

Syntax:

result := stock_to_flow(op)

Parameters:

  • op - The input data set with stock interpretation. The data set must have an identifier of type time, may have additional identifiers, and must have at least one measure of type number.

Result:

The function returns a data set with the same structure as the input, but with the values converted to flow interpretation.
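Continuing the sketch above, stock_to_flow takes the difference between consecutive periods, undoing flow_to_stock:

engine.eval("ds_flow2 := stock_to_flow(ds_stock);");
// me goes back from the cumulative 10, 15, 22 to the flows 10, 5, 7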

Function timeshift

The timeshift function shifts the time identifiers of the time series in the data set by a given number of periods. This is useful for analyzing data at different time offsets, such as comparing current values to past values.

Syntax:

result := timeshift(op, shiftNumber)

Parameters:

  • op - The operand data set containing time series.
  • shiftNumber - An integer representing the number of periods to shift. Positive values shift forward in time, while negative values shift backward.

Result:

The function returns a data set with the time identifiers shifted by the specified number of periods.
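Continuing the same hypothetical sketch, shifting the series by -1 relabels each observation with the previous period:

engine.eval("ds_shifted := timeshift(ds_stock, -1);");
// the value identified by 2022 is now identified by 2021, and so on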

Trevas - Java 17

One min read
Nicolas Laval
Making Sense - Developer

News

Trevas 1.2.0 enables Java 17 support.

Java modules handling

Spark does not support Java modules.

Java 17 client apps embedding Trevas in Spark mode have to configure UNNAMED modules for Spark.

Maven

Add to your pom.xml file, in the build > plugins section:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>3.11.0</version>
    <configuration>
        <compilerArgs>
            <arg>--add-exports</arg>
            <arg>java.base/sun.nio.ch=ALL-UNNAMED</arg>
        </compilerArgs>
    </configuration>
</plugin>

Docker

ENTRYPOINT ["java", "--add-exports", "java.base/sun.nio.ch=ALL-UNNAMED", "mainClass"]

Trevas - Persistent assignments

One min read
Nicolas Laval
Making Sense - Developer

News

Trevas 1.2.0 includes support for the persistent assignment operator: ds1 <- ds;.

In Trevas, persistent datasets are represented as PersistentDataset.

Handle PersistentDataset

Trevas datasets are represented as Dataset.

After running the Trevas engine, you can use persistent datasets with something like:

Bindings engineBindings = engine.getContext().getBindings(ScriptContext.ENGINE_SCOPE);
engineBindings.forEach((k, v) -> {
    if (v instanceof PersistentDataset) {
        fr.insee.vtl.model.Dataset ds = ((PersistentDataset) v).getDelegate();
        if (ds instanceof SparkDataset) {
            Dataset<Row> sparkDs = ((SparkDataset) ds).getSparkDataset();
            // Do what you want with sparkDs
        }
    }
});

Trevas - check_hierarchy

2 min read
Nicolas Laval
Making Sense - Developer

News

Trevas 1.1.0 includes hierarchical validation via the define hierarchical ruleset and check_hierarchy operators.

Example

Input

ds1:

id   Me
ABC  12
A    1
B    10
C    1
DEF  100
E    99
F    1
HIJ  100
H    99
I    0

VTL script

// Ensure ds1 metadata definition is good
ds1 := ds1[calc identifier id := id, Me := cast(Me, integer)];

// Define hierarchical ruleset
define hierarchical ruleset hr (variable rule Me) is
    My_Rule : ABC = A + B + C errorcode "ABC is not sum of A,B,C" errorlevel 1;
    DEF = D + E + F errorcode "DEF is not sum of D,E,F";
    HIJ : HIJ = H + I - J errorcode "HIJ is not H + I - J" errorlevel 10
end hierarchical ruleset;

// Check hierarchy
ds_all := check_hierarchy(ds1, hr rule id all);
ds_all_measures := check_hierarchy(ds1, hr rule id always_null all_measures);
ds_invalid := check_hierarchy(ds1, hr rule id always_zero invalid);

Outputs

  • ds_all

    id   ruleid   bool_var  errorcode  errorlevel  imbalance
    ABC  My_Rule  true      null       null        0

  • ds_all_measures

    id   Me   ruleid   bool_var  errorcode  errorlevel  imbalance
    ABC  12   My_Rule  true      null       null        0
    DEF  100  hr_2     null      null       null        null
    HIJ  100  HIJ      null      null       null        null

  • ds_invalid

    id   Me   ruleid  errorcode             errorlevel  imbalance
    HIJ  100  HIJ     HIJ is not H + I - J  10          1

D and J are absent from ds1, which explains the differences: with the default non_null mode only the My_Rule row is evaluated, always_null returns null for the DEF and HIJ rules, and always_zero treats the missing values as 0, which makes the HIJ rule fail with an imbalance of 1.

Trevas Batch 0.1.1

One min read
Nicolas Laval
Making Sense - Developer

Trevas Batch 0.1.1 uses version 1.0.2 of Trevas.

This Java batch provides Trevas execution metrics in Spark mode.

The configuration file to fill in is described in the README of the project. Launching the batch will produce a Markdown file as output.

Launch

Local

java -Dconfig.path="..." -Dreport.path="..." -jar trevas-batch-0.1.1.jar

The Java execution will run in local Spark mode.

Kubernetes

Default Kubernetes objects are defined in the .kubernetes folder.

Fill in the config-map.yml file, then launch the job in your cluster.

Trevas Jupyter 0.3.2

One min read
Nicolas Laval
Making Sense - Developer

Trevas Jupyter 0.3.2 uses version 1.0.2 of Trevas.

News

In addition to the greatly increased VTL coverage since the publication of Trevas 1.x.x, Trevas Jupyter offers one new connector:

  • SAS files (via the loadSas method)

Launch

Manually adding the Trevas Kernel to an existing Jupyter instance

  • Compile Trevas Jupyter
  • Copy the kernel.json file and the bin and repo folders to a new kernel folder.
  • Edit the kernel.json file
  • Launch Jupyter

Docker

docker pull inseefrlab/trevas-jupyter:0.3.2
docker run -p 8888:8888 inseefrlab/trevas-jupyter:0.3.2

Helm

The Trevas Jupyter Docker image can be instantiated via the jupyter-pyspark Helm chart from InseeFrLab.