Spark mode - SDMX source
The vtl-sdmx module exposes the following utilities.
buildStructureFromSDMX3 utility
TrevasSDMXUtils.buildStructureFromSDMX3 builds a Trevas DataStructure from an SDMX file and the identifier of a structure defined in it.
Combined with the corresponding data, this structure lets you build a Trevas Dataset.
Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "STRUCT_ID");
SparkDataset ds = new SparkDataset(
        spark.read()
                .option("header", "true")
                .option("delimiter", ";")
                .option("quote", "\"")
                .csv("path"),
        structure
);
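When the structure identifier to pass as the second argument is not known in advance, it can help to list the identifiers present in the SDMX file first. The sketch below does this with the JDK's DOM parser only. The SdmxStructureIds class is a hypothetical helper, not part of vtl-sdmx, and it assumes that data structure definitions appear as DataStructure elements carrying an id attribute, whatever namespace prefix the file uses.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of vtl-sdmx: lists the ids of the
// DataStructure elements found in an SDMX structure message.
class SdmxStructureIds {

    static List<String> list(InputStream sdmxXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace awareness is needed to match elements by local name.
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder().parse(sdmxXml);

        // Assumption: definitions are <DataStructure id="..."> elements,
        // regardless of the namespace they are declared in.
        NodeList nodes = doc.getElementsByTagNameNS("*", "DataStructure");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            // Skip elements without an id (for example plain references).
            if (element.hasAttribute("id")) {
                ids.add(element.getAttribute("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        // Tiny made-up message, for illustration only.
        String xml = "<s:Structure xmlns:s=\"urn:example:structure\">"
                + "<s:DataStructures>"
                + "<s:DataStructure id=\"STRUCT_ID\"/>"
                + "<s:DataStructure id=\"OTHER_ID\"/>"
                + "</s:DataStructures></s:Structure>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(list(in));
    }
}
```

One of the returned identifiers can then be passed to buildStructureFromSDMX3 together with the file path.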
SDMXVTLWorkflow object
The SDMXVTLWorkflow constructor takes 3 arguments:
- a ScriptEngine (Trevas or another)
- a ReadableDataLocation to handle an SDMX message
- a map of names / Datasets
 
SparkSession spark = SparkSession.builder()
                .appName("test")
                .master("local")
                .getOrCreate();
ScriptEngineManager mgr = new ScriptEngineManager();
ScriptEngine engine = mgr.getEngineByExtension("vtl");
engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
ReadableDataLocation rdl = new ReadableDataLocationTmp("src/test/resources/DSD_BPE_CENSUS.xml");
SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
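The example above relies on getEngineByExtension("vtl") finding an engine. The standard ScriptEngineManager returns null when no engine is registered for the requested extension, which would only surface later as a NullPointerException. The following sketch, using only javax.script, fails early with a more useful message. EngineLookup is a hypothetical helper name, not a Trevas class.

```java
import java.util.ArrayList;
import java.util.List;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

// Hypothetical helper, not part of Trevas: fails fast when no script
// engine is registered for a given file extension.
class EngineLookup {

    static ScriptEngine require(String extension) {
        ScriptEngineManager manager = new ScriptEngineManager();
        ScriptEngine engine = manager.getEngineByExtension(extension);
        if (engine == null) {
            // Collect the discovered engines to make the error actionable.
            List<String> available = new ArrayList<>();
            for (ScriptEngineFactory factory : manager.getEngineFactories()) {
                available.add(factory.getEngineName() + " " + factory.getExtensions());
            }
            throw new IllegalStateException(
                    "No script engine for extension '" + extension
                            + "'; engines found on the classpath: " + available);
        }
        return engine;
    }

    public static void main(String[] args) {
        // With a VTL engine on the classpath this prints its name;
        // otherwise the exception lists the engines that were discovered.
        ScriptEngine engine = require("vtl");
        System.out.println(engine.getFactory().getEngineName());
    }
}
```

The returned engine can then be configured and passed to the SDMXVTLWorkflow constructor as shown above.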
The SDMXVTLWorkflow object then exposes the following three functions.
SDMXVTLWorkflow run function - Preview mode
The run function can easily be called in a preview mode, without attached data.
ScriptEngineManager mgr = new ScriptEngineManager();
ScriptEngine engine = mgr.getEngineByExtension("vtl");
engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
ReadableDataLocation rdl = new ReadableDataLocationTmp("src/test/resources/DSD_BPE_CENSUS.xml");
SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
// instead of using TrevasSDMXUtils.buildStructureFromSDMX3 and data sources
// to build Trevas Datasets, sdmxVtlWorkflow.getEmptyDatasets()
// will handle SDMX message structures to produce Trevas Datasets
// with metadata defined in this message, and adding empty data
Map<String, Dataset> emptyDatasets = sdmxVtlWorkflow.getEmptyDatasets();
engine.getBindings(ScriptContext.ENGINE_SCOPE).putAll(emptyDatasets);
Map<String, PersistentDataset> result = sdmxVtlWorkflow.run();
Preview mode lets you check the conformity of the SDMX file and the metadata of the output datasets.
SDMXVTLWorkflow run function
Once an SDMXVTLWorkflow is built, it is easy to run the VTL validations and transformations defined in the SDMX file.
Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "ds1");
SparkDataset ds1 = new SparkDataset(
        spark.read()
                .option("header", "true")
                .option("delimiter", ";")
                .option("quote", "\"")
                .csv("path/data.csv"),
        structure
);
ScriptEngineManager mgr = new ScriptEngineManager();
ScriptEngine engine = mgr.getEngineByExtension("vtl");
engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
Map<String, Dataset> inputs = Map.of("ds1", ds1);
ReadableDataLocation rdl = new ReadableDataLocationTmp("path/sdmx_file.xml");
SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, inputs);
Map<String, PersistentDataset> bindings = sdmxVtlWorkflow.run();
As a result, run returns all the datasets defined as persistent in the TransformationSchemes definition.
SDMXVTLWorkflow getTransformationsVTL function
Gets the VTL code corresponding to the SDMX TransformationSchemes definition.
SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
String vtl = sdmxVtlWorkflow.getTransformationsVTL();
SDMXVTLWorkflow getRulesetsVTL function
Gets the VTL code corresponding to the rulesets defined in the SDMX message.
SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
String dprs = sdmxVtlWorkflow.getRulesetsVTL();
Troubleshooting
Hadoop client
The integration of vtl-modules with hadoop-client can cause dependency issues.
hadoop-client imports com.fasterxml.woodstox:woodstox-core in a version that is incompatible with a vtl-sdmx sub-dependency.
One way to fix this is to exclude the woodstox-core dependency from hadoop-client and import a newer version in your pom.xml:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.3.4</version>
    <exclusions>
        <exclusion>
            <groupId>com.fasterxml.woodstox</groupId>
            <artifactId>woodstox-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>com.fasterxml.woodstox</groupId>
    <artifactId>woodstox-core</artifactId>
    <version>6.5.1</version>
</dependency>
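To confirm which woodstox-core jar actually ends up on the classpath after such a change, a small diagnostic can print the location a Woodstox class is loaded from. This is a minimal sketch using only the JDK. ClasspathProbe is a hypothetical helper name, and the jar file name is assumed to carry the version, which is the usual Maven convention but not guaranteed for repackaged or shaded jars.

```java
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical diagnostic helper: reports where a class is loaded from.
class ClasspathProbe {

    static Optional<String> locate(String className) {
        try {
            // Load without initializing, a lookup is all that is needed.
            Class<?> type = Class.forName(className, false,
                    ClasspathProbe.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            // JDK classes have no code source; report that explicitly.
            if (source == null || source.getLocation() == null) {
                return Optional.of("(no code source, probably a JDK class)");
            }
            return Optional.of(source.getLocation().toString());
        } catch (ClassNotFoundException | LinkageError e) {
            // The class is absent or could not be linked.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Woodstox's StAX input factory class, shipped in woodstox-core.
        String woodstox = "com.ctc.wstx.stax.WstxInputFactory";
        System.out.println(
                locate(woodstox).orElse("woodstox-core is not on the classpath"));
    }
}
```

Running it with the same classpath as the application shows whether the excluded version is still being picked up from another dependency.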