
9 posts tagged with "Trevas"


Spark 4 in Trevas

· 3 min read
Nicolas Laval
Making Sense - Developer

We are happy to announce Trevas 2.4.0, which adds Apache Spark 4 support through the new vtl-spark4 module.

If you want to move your Spark-based client applications to Spark 4, you can depend on fr.insee.trevas:vtl-spark4 alongside the rest of the Trevas stack. The VTL API and behaviour stay the same; only the Spark integration layer changes.

Spark 3 is not going away. The existing vtl-spark module remains fully maintained in parallel. You can stay on Spark 3 for as long as you need—there is no forced migration timeline.

See the 2.4.0 release notes and the GitHub release for the full changelog.

Trevas client apps — shaded ANTLR imports

Keeping Spark 3 and Spark 4 on the same Trevas codebase also pushed us to refine how ANTLR is packaged: the runtime is now shaded and relocated so Trevas and Spark no longer fight over the same org.antlr.v4 classes on the classpath.

Starting with Trevas 2.4.0, this note applies only to client applications that explicitly use ANTLR APIs in their own code (lexer, token stream, parse tree, listeners, and so on) in addition to Trevas. If your app only calls Trevas APIs and never imports or manipulates ANTLR types directly, nothing changes for you.

If you do touch ANTLR yourself—whether you stay on Apache Spark 3 (vtl-spark) or move to Spark 4 (vtl-spark4)—you must import the runtime from the relocated package namespace:

import fr.insee.vtl.antlr.runtime.*;
import fr.insee.vtl.antlr.runtime.tree.*;
// … and other fr.insee.vtl.antlr.* subpackages as needed

Previously, code that touched the parser or ANTLR APIs directly often used the stock ANTLR packages, for example:

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;

Those imports no longer match the classes Trevas ships at runtime. Trevas shades org.antlr:antlr4-runtime into the vtl-antlr artifact and relocates org.antlr.v4 to fr.insee.vtl.antlr so Trevas and Spark can share a JVM without loading two competing ANTLR runtimes.

What you need to change

  • Update every org.antlr.v4… import in your application (and in any code generated against Trevas parser types) to the matching fr.insee.vtl.antlr… package.
  • Rely on fr.insee.trevas:vtl-antlr (transitively via vtl-parser / vtl-engine) for the runtime; do not add a separate dependency on org.antlr:antlr4-runtime for Trevas-related parsing.
  • This applies equally to Spark 3 and Spark 4 integrations: both use the same shaded parser stack.

A typical mapping:

| Before | After |
| --- | --- |
| org.antlr.v4.runtime.CharStreams | fr.insee.vtl.antlr.runtime.CharStreams |
| org.antlr.v4.runtime.CommonTokenStream | fr.insee.vtl.antlr.runtime.CommonTokenStream |
| org.antlr.v4.runtime.tree.ParseTree | fr.insee.vtl.antlr.runtime.tree.ParseTree |
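
For a codebase with many files, the import update can be scripted. The following is a minimal sketch of such a rewrite, assuming only import statements need to change; the class AntlrImportMigration and its relocate method are hypothetical names and are not part of Trevas.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Trevas: rewrites stock ANTLR imports
// to the relocated fr.insee.vtl.antlr namespace.
final class AntlrImportMigration {

    // Matches "import org.antlr.v4." and "import static org.antlr.v4." at the start of a line.
    private static final Pattern STOCK_IMPORT =
            Pattern.compile("(?m)^(\\s*import\\s+(?:static\\s+)?)org\\.antlr\\.v4\\.");

    // Returns the source text with every matching import prefix relocated.
    static String relocate(String javaSource) {
        Matcher matcher = STOCK_IMPORT.matcher(javaSource);
        return matcher.replaceAll("$1fr.insee.vtl.antlr.");
    }

    public static void main(String[] args) {
        String before = "import org.antlr.v4.runtime.CharStreams;\n"
                + "import org.antlr.v4.runtime.tree.ParseTree;\n";
        System.out.print(relocate(before));
    }
}
```

The sketch leaves fully qualified org.antlr.v4 references inside method bodies untouched, so those still need a manual check.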

For more technical details, see the documentation.

Trevas - Version 2.0.0

· One min read
Nicolas Laval
Making Sense - Developer

Trevas 2.0.0 is released!

Trevas now builds a DAG from a VTL script and reorders its instructions before execution. From this version on, evaluating a VTL script uses this feature by default.

Technical documentation describes this feature and how to disable it.

Trevas - VTL 2.1

· One min read
Nicolas Laval
Making Sense - Developer

Trevas 1.7.0 upgrades to version 2.1 of VTL.

This version introduces two new operators:

  • random
  • case

random produces a decimal number between 0 and 1.

case allows for clearer multi-conditional branching, for example:

ds2 := ds1[ calc c := case when r < 0.2 then "Low" when r > 0.8 then "High" else "Medium" ]
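
For readers more familiar with Java, the branching in this expression can be read as the following plain-Java analogue. It is only an illustration of the semantics (first matching branch wins, else as fallback), not how Trevas evaluates the operator; the class and method names are made up for this sketch.

```java
// Illustrative analogue of the VTL case expression above; not Trevas code.
final class CaseOperatorSketch {

    // Mirrors: case when r < 0.2 then "Low" when r > 0.8 then "High" else "Medium"
    static String classify(double r) {
        if (r < 0.2) {
            return "Low";
        }
        if (r > 0.8) {
            return "High";
        }
        return "Medium";
    }

    public static void main(String[] args) {
        // Math.random() returns a value in [0, 1), used here as a stand-in for VTL random.
        double r = Math.random();
        System.out.println(r + " -> " + classify(r));
    }
}
```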

Both operators are already available in Trevas!

The new grammar also provides time operators and includes corrections, without any breaking changes compared to the 2.0 version.

See the coverage section for more details.

Trevas - Provenance

· 4 min read
Nicolas Laval
Making Sense - Developer

News

Trevas 1.6.0 introduces the VTL Prov module.

This module produces lineage metadata from Trevas, based on RDF ontologies: PROV-O and SDTH.

SDTH model overview

Adopted model

The vtl-prov module, version 1.6.0, uses the following partial model:

Improvements will come in the next weeks.

Tools available

Provenance Trevas tools are documented here.

Example

Business use case

Two source datasets are transformed to produce transient datasets and a final permanent one.

Inputs

ds1 & ds2 metadata:

| id | var1 | var2 |
| --- | --- | --- |
| STRING | INTEGER | NUMBER |
| IDENTIFIER | MEASURE | MEASURE |

VTL script

ds_sum := ds1 + ds2;
ds_mul := ds_sum * 3;
ds_res <- ds_mul[filter mod(var1, 2) = 0][calc var_sum := var1 + var2];

RDF model target

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX sdth: <http://rdf-vocabulary.ddialliance.org/sdth#>

# --- Program and steps
<http://example.com/program1> a sdth:Program ;
a prov:Agent ; # Agent? Or an activity
rdfs:label "My program 1"@en, "Mon programme 1"@fr ;
sdth:hasProgramStep <http://example.com/program1/program-step1>,
<http://example.com/program1/program-step2>,
<http://example.com/program1/program-step3> .

<http://example.com/program1/program-step1> a sdth:ProgramStep ;
rdfs:label "Program step 1"@en, "Étape 1"@fr ;
sdth:hasSourceCode "ds_sum := ds1 + ds2;" ;
sdth:consumesDataframe <http://example.com/dataset/ds1>,
<http://example.com/dataset/ds2> ;
sdth:producesDataframe <http://example.com/dataset/ds_sum> .

<http://example.com/program1/program-step2> a sdth:ProgramStep ;
rdfs:label "Program step 2"@en, "Étape 2"@fr ;
sdth:hasSourceCode "ds_mul := ds_sum * 3;" ;
sdth:consumesDataframe <http://example.com/dataset/ds_sum> ;
sdth:producesDataframe <http://example.com/dataset/ds_mul> .

<http://example.com/program1/program-step3> a sdth:ProgramStep ;
rdfs:label "Program step 3"@en, "Étape 3"@fr ;
sdth:hasSourceCode "ds_res <- ds_mul[filter mod(var1, 2) = 0][calc var_sum := var1 + var2];" ;
sdth:consumesDataframe <http://example.com/dataset/ds_mul> ;
sdth:producesDataframe <http://example.com/dataset/ds_res> ;
sdth:usesVariable <http://example.com/variable/var1>,
<http://example.com/variable/var2> ;
sdth:assignsVariable <http://example.com/variable/var_sum> .

# --- Variables
# i think here it's not instances but names we refer to...
<http://example.com/variable/id1> a sdth:VariableInstance ;
rdfs:label "id1" .
<http://example.com/variable/var1> a sdth:VariableInstance ;
rdfs:label "var1" .
<http://example.com/variable/var2> a sdth:VariableInstance ;
rdfs:label "var2" .
<http://example.com/variable/var_sum> a sdth:VariableInstance ;
rdfs:label "var_sum" .

# --- Data frames
<http://example.com/dataset/ds1> a sdth:DataframeInstance ;
rdfs:label "ds1" ;
sdth:hasName "ds1" ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .

<http://example.com/dataset/ds2> a sdth:DataframeInstance ;
rdfs:label "ds2" ;
sdth:hasName "ds2" ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .

<http://example.com/dataset/ds_sum> a sdth:DataframeInstance ;
rdfs:label "ds_sum" ;
sdth:hasName "ds_sum" ;
sdth:wasDerivedFrom <http://example.com/dataset/ds1>,
<http://example.com/dataset/ds2> ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .

<http://example.com/dataset/ds_mul> a sdth:DataframeInstance ;
rdfs:label "ds_mul" ;
sdth:hasName "ds_mul" ;
sdth:wasDerivedFrom <http://example.com/dataset/ds_sum> ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .

<http://example.com/dataset/ds_res> a sdth:DataframeInstance ;
rdfs:label "ds_res" ;
sdth:wasDerivedFrom <http://example.com/dataset/ds_mul> ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2>,
<http://example.com/variable/var_sum> .
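
The wasDerivedFrom links in this target model follow directly from what each program step consumes and produces. A minimal sketch of that derivation, using a hypothetical in-memory Step class rather than the vtl-prov API:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration; these types are not the vtl-prov API.
final class LineageSketch {

    // One program step: the datasets it consumes and the dataset it produces.
    static final class Step {
        final List<String> consumes;
        final String produces;

        Step(List<String> consumes, String produces) {
            this.consumes = consumes;
            this.produces = produces;
        }
    }

    // The three steps of the VTL script in this example.
    static List<Step> exampleSteps() {
        return List.of(
                new Step(List.of("ds1", "ds2"), "ds_sum"),
                new Step(List.of("ds_sum"), "ds_mul"),
                new Step(List.of("ds_mul"), "ds_res"));
    }

    // Maps each produced dataset to the datasets it was directly derived from.
    static Map<String, List<String>> derivedFrom(List<Step> steps) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Step step : steps) {
            result.put(step.produces, step.consumes);
        }
        return result;
    }

    public static void main(String[] args) {
        derivedFrom(exampleSteps())
                .forEach((dataset, sources) -> System.out.println(dataset + " wasDerivedFrom " + sources));
    }
}
```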

Trevas - SDMX

· One min read
Nicolas Laval
Making Sense - Developer

News

Trevas 1.4.1 introduces the VTL SDMX module.

This module consumes SDMX metadata sources to instantiate Trevas DataStructures and Datasets.

It can also execute the VTL TransformationSchemes to obtain the resulting persistent datasets.

Overview

[Figure: VTL SDMX diagram]

Trevas supports the above SDMX message elements. Only the VtlMappingSchemes element is optional.

The elements in box 1 are used to produce Trevas DataStructures, filling the VTL component attributes name, role, type, nullable and valuedomain.

The elements in box 2 are used to generate the VTL code (rulesets & transformations).

Tools available

SDMX Trevas tools are documented here.

Troubleshooting

Have a look at this section.

Trevas - Temporal operators

· 3 min read
Hadrien Kohl
Hadrien Kohl Consulting - Developer

Temporal operators in Trevas

Version 1.4.1 of Trevas introduces preliminary support for date and time types and operators.

The specification describes temporal types such as date, time_period, time, and duration. However, Trevas authors find these descriptions unsatisfactory. This blog post outlines our implementation choices and how they differ from the spec.

In the specification, time_period (and the type date) is described as a compound type with a start and end (or a start and a duration). This complicates the implementation and brings little value to the language, as one can simply operate on a combination of dates, or on a date and a duration, directly. For this reason, we defined an algebra between the temporal types and did not yet implement time_period.

| result (operators) | date | duration | number |
| --- | --- | --- | --- |
| date | n/a | date (+, -) | n/a |
| duration | date (+, -) | duration (+, -) | duration (*) |
| number | n/a | duration (*) | n/a |
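
The same algebra can be tried out with the standard java.time types. The sketch below is an illustration only: it assumes java.time.Duration as a simplified stand-in for the duration type and a long for number, and the class TemporalAlgebraSketch is not part of Trevas.

```java
import java.time.Duration;
import java.time.Instant;

// Illustration of the table above with java.time; not the Trevas implementation.
final class TemporalAlgebraSketch {

    // date + duration -> date
    static Instant plus(Instant date, Duration duration) {
        return date.plus(duration);
    }

    // date - duration -> date
    static Instant minus(Instant date, Duration duration) {
        return date.minus(duration);
    }

    // duration + duration -> duration
    static Duration plus(Duration left, Duration right) {
        return left.plus(right);
    }

    // duration - duration -> duration
    static Duration minus(Duration left, Duration right) {
        return left.minus(right);
    }

    // duration * number -> duration (number restricted to a long in this sketch)
    static Duration times(Duration duration, long factor) {
        return duration.multipliedBy(factor);
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2020-01-01T00:00:00Z");
        System.out.println(plus(start, times(Duration.ofHours(12), 2)));
    }
}
```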

The period_indicator function relies on period-aware types, which are not yet defined precisely enough to be implemented.

Java mapping

The VTL type date is represented internally as the types java.time.Instant, java.time.ZonedDateTime and java.time.OffsetDateTime.

Instant represents a specific moment in time. Note that this type does not include timezone information and is therefore not usable with all the operators. One can use the types ZonedDateTime and OffsetDateTime when timezone or daylight saving time handling is required.
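
The difference is visible with plain java.time calls, independently of Trevas. In this sketch (class and method names are illustrative), adding one calendar day and adding 24 hours give different results across the Europe/Paris daylight saving change of 28 March 2021, and an Instant rejects a month-based amount.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.Period;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.UnsupportedTemporalTypeException;

// Plain java.time illustration; not Trevas code.
final class TimezoneSketch {

    // Calendar arithmetic: same local time on the next day.
    static ZonedDateTime nextCalendarDay(ZonedDateTime dateTime) {
        return dateTime.plus(Period.ofDays(1));
    }

    // Exact arithmetic: 24 elapsed hours, whatever the local clock does.
    static ZonedDateTime after24Hours(ZonedDateTime dateTime) {
        return dateTime.plus(Duration.ofHours(24));
    }

    // Instant has no calendar, so month-based amounts are rejected.
    static boolean instantSupportsMonths() {
        try {
            Instant.EPOCH.plus(Period.ofMonths(1));
            return true;
        } catch (UnsupportedTemporalTypeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ZonedDateTime paris = ZonedDateTime.of(2021, 3, 27, 12, 0, 0, 0, ZoneId.of("Europe/Paris"));
        System.out.println(nextCalendarDay(paris));
        System.out.println(after24Hours(paris));
        System.out.println(instantSupportsMonths());
    }
}
```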

The VTL type duration is represented internally as the type org.threeten.extra.PeriodDuration from the ThreeTen-Extra library. It represents a duration using both calendar units (years, months, days) and a temporal amount (hours, minutes, seconds and nanoseconds).

Function flow_to_stock

The flow_to_stock function converts a data set with flow interpretation into a stock interpretation. This transformation is useful when you want to aggregate flow data (e.g., sales or production rates) into cumulative stock data (e.g., total inventory).

Syntax:

result := flow_to_stock(op)

Parameters:

  • op - The input data set with flow interpretation. The data set must have an identifier of type time, additional identifiers, and at least one measure of type number.

Result:

The function returns a data set with the same structure as the input, but with the values converted to stock interpretation.
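
Conceptually, the conversion is a running total along the time identifier. A minimal sketch for a single series, assuming the flow values are already sorted by time and contain no nulls (the class FlowToStockSketch is illustrative, not the Trevas implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the flow-to-stock idea on one time series; not Trevas code.
final class FlowToStockSketch {

    // Returns the running total of the flows, in the same order.
    static List<Double> flowToStock(List<Double> flows) {
        List<Double> stocks = new ArrayList<>();
        double total = 0;
        for (double flow : flows) {
            total += flow;
            stocks.add(total);
        }
        return stocks;
    }

    public static void main(String[] args) {
        System.out.println(flowToStock(List.of(2.0, 3.0, 5.0)));
    }
}
```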

Function stock_to_flow

The stock_to_flow function converts a data set with stock interpretation into a flow interpretation. This transformation is useful when you want to derive flow data from cumulative stock data.

Syntax:

result := stock_to_flow(op)

Parameters:

  • op - The input data set with stock interpretation. The data set must have an identifier of type time, additional identifiers, and at least one measure of type number.

Result:

The function returns a data set with the same structure as the input, but with the values converted to flow interpretation.
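
Conceptually, this is the reverse operation: each flow is the difference between a stock and the previous one. A minimal sketch for a single series, assuming values sorted by time, no nulls, and the first flow taken equal to the first stock (an assumption of this sketch; the class name is illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the stock-to-flow idea on one time series; not Trevas code.
final class StockToFlowSketch {

    // Returns the differences between consecutive stocks, in the same order.
    static List<Double> stockToFlow(List<Double> stocks) {
        List<Double> flows = new ArrayList<>();
        double previous = 0; // assumption: the series starts from an empty stock
        for (double stock : stocks) {
            flows.add(stock - previous);
            previous = stock;
        }
        return flows;
    }

    public static void main(String[] args) {
        System.out.println(stockToFlow(List.of(2.0, 5.0, 10.0)));
    }
}
```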

Function timeshift

The timeshift function shifts the time component of the time series in a data set by a given number of periods. This is useful for analyzing data at different time offsets, such as comparing current values to past values.

Syntax:

result := timeshift(op, shiftNumber)

Parameters:

  • op - The operand data set containing time series.
  • shiftNumber - An integer representing the number of periods to shift. Positive values shift forward in time, while negative values shift backward.

Result:

The function returns a data set with the time identifiers shifted by the specified number of periods.
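
As an illustration, here is a sketch of the shift on a single yearly series, where the time identifier is reduced to a plain year number. The representation and the class TimeshiftSketch are assumptions of this example, not the Trevas implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration of timeshift on a yearly series keyed by year; not Trevas code.
final class TimeshiftSketch {

    // Moves every observation to (year + shiftNumber); the values are unchanged.
    static Map<Integer, Double> timeshift(Map<Integer, Double> yearly, int shiftNumber) {
        Map<Integer, Double> shifted = new TreeMap<>();
        yearly.forEach((year, value) -> shifted.put(year + shiftNumber, value));
        return shifted;
    }

    public static void main(String[] args) {
        System.out.println(timeshift(Map.of(2020, 1.0, 2021, 2.0), 1));
    }
}
```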

Trevas - Java 17

· One min read
Nicolas Laval
Making Sense - Developer

News

Trevas 1.2.0 enables Java 17 support.

Java modules handling

Spark does not support Java modules.

Java 17 client apps that embed Trevas in Spark mode have to configure UNNAMED modules for Spark.

Maven

Add to your pom.xml file, in the build > plugins section:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>3.11.0</version>
    <configuration>
        <compilerArgs>
            <arg>--add-exports</arg>
            <arg>java.base/sun.nio.ch=ALL-UNNAMED</arg>
        </compilerArgs>
    </configuration>
</plugin>

Docker

ENTRYPOINT ["java", "--add-exports", "java.base/sun.nio.ch=ALL-UNNAMED", "mainClass"]
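
To check that the runtime flag is actually in effect, one option is to query the module system at startup. This is a diagnostic sketch using the standard java.lang.Module API; the class ModuleExportCheck is a hypothetical name and is not provided by Trevas or Spark.

```java
// Diagnostic sketch using the standard Module API; not part of Trevas or Spark.
final class ModuleExportCheck {

    // True when java.base exports the given package to the application's unnamed module.
    static boolean isExportedToUnnamed(String packageName) {
        Module javaBase = Object.class.getModule();
        Module unnamed = ClassLoader.getSystemClassLoader().getUnnamedModule();
        return javaBase.isExported(packageName, unnamed);
    }

    public static void main(String[] args) {
        // The result depends on whether the JVM was started with the --add-exports flag.
        System.out.println("sun.nio.ch exported to ALL-UNNAMED: " + isExportedToUnnamed("sun.nio.ch"));
    }
}
```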

Trevas - Persistent assignments

· One min read
Nicolas Laval
Making Sense - Developer

News

Trevas 1.2.0 adds support for persistent assignments: ds1 <- ds;

In Trevas, persistent datasets are represented as PersistentDataset.

Handle PersistentDataset

Trevas datasets are represented as Dataset.

After running the Trevas engine, you can use persistent datasets with something like:

Bindings engineBindings = engine.getContext().getBindings(ScriptContext.ENGINE_SCOPE);
engineBindings.forEach((k, v) -> {
    // Only persistent assignments (ds1 <- ds;) are wrapped in PersistentDataset
    if (v instanceof PersistentDataset) {
        fr.insee.vtl.model.Dataset ds = ((PersistentDataset) v).getDelegate();
        if (ds instanceof SparkDataset) {
            Dataset<Row> sparkDs = ((SparkDataset) ds).getSparkDataset();
            // Do what you want with sparkDs
        }
    }
});

Trevas - check_hierarchy

· One min read
Nicolas Laval
Making Sense - Developer

News

Trevas 1.1.0 includes hierarchical validation via the operators define hierarchical ruleset and check_hierarchy.

Example

Input

ds1:

| id | Me |
| --- | --- |
| ABC | 12 |
| A | 1 |
| B | 10 |
| C | 1 |
| DEF | 100 |
| E | 99 |
| F | 1 |
| HIJ | 100 |
| H | 99 |
| I | 0 |

VTL script

// Ensure ds1 metadata definition is good
ds1 := ds1[calc identifier id := id, Me := cast(Me, integer)];

// Define hierarchical ruleset
define hierarchical ruleset hr (variable rule Me) is
    My_Rule : ABC = A + B + C errorcode "ABC is not sum of A,B,C" errorlevel 1;
    DEF = D + E + F errorcode "DEF is not sum of D,E,F";
    HIJ : HIJ = H + I - J errorcode "HIJ is not H + I - J" errorlevel 10
end hierarchical ruleset;

// Check hierarchy
ds_all := check_hierarchy(ds1, hr rule id all);
ds_all_measures := check_hierarchy(ds1, hr rule id always_null all_measures);
ds_invalid := check_hierarchy(ds1, hr rule id always_zero invalid);

Outputs

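  • ds_all

| id | ruleid | bool_var | errorcode | errorlevel | imbalance |
| --- | --- | --- | --- | --- | --- |
| ABC | My_Rule | true | null | null | 0 |

  • ds_all_measures

| id | Me | ruleid | bool_var | errorcode | errorlevel | imbalance |
| --- | --- | --- | --- | --- | --- | --- |
| ABC | 12 | My_Rule | true | null | null | 0 |
| DEF | 100 | hr_2 | null | null | null | null |
| HIJ | 100 | HIJ | null | null | null | null |

  • ds_invalid

| id | Me | ruleid | errorcode | errorlevel | imbalance |
| --- | --- | --- | --- | --- | --- |
| HIJ | 100 | HIJ | HIJ is not H + I - J | 10 | 1 |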
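
The imbalance values in these outputs can be reproduced by hand. The sketch below recomputes them from the ds1 values, assuming that the imbalance is the left-hand side minus the right-hand side and that a missing code is either treated as 0 (as with always_zero) or makes the result null (as with always_null). It is a simplified illustration, not the Trevas implementation.

```java
import java.util.Map;

// Simplified illustration of the imbalance computation; not Trevas code.
final class ImbalanceSketch {

    // Values of measure Me in ds1, keyed by id.
    static final Map<String, Integer> DS1 = Map.of(
            "ABC", 12, "A", 1, "B", 10, "C", 1,
            "DEF", 100, "E", 99, "F", 1,
            "HIJ", 100, "H", 99, "I", 0);

    // Imbalance of "left = sum(sign * term)"; null when a code is missing
    // and missingAsZero is false.
    static Integer imbalance(Map<String, Integer> values, String left,
                             Map<String, Integer> signedTerms, boolean missingAsZero) {
        Integer leftValue = values.get(left);
        if (leftValue == null) {
            if (!missingAsZero) {
                return null;
            }
            leftValue = 0;
        }
        int sum = 0;
        for (Map.Entry<String, Integer> term : signedTerms.entrySet()) {
            Integer value = values.get(term.getKey());
            if (value == null) {
                if (!missingAsZero) {
                    return null;
                }
                value = 0;
            }
            sum += term.getValue() * value;
        }
        return leftValue - sum;
    }

    public static void main(String[] args) {
        // Rule HIJ = H + I - J; J is absent from ds1.
        Map<String, Integer> hij = Map.of("H", 1, "I", 1, "J", -1);
        System.out.println(imbalance(DS1, "HIJ", hij, true));
        System.out.println(imbalance(DS1, "HIJ", hij, false));
    }
}
```

With these assumptions, rule HIJ gives 100 - (99 + 0 - 0) = 1 when the missing J counts as zero, which matches the ds_invalid row, and null otherwise, which matches the HIJ row of ds_all_measures.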