Spark mode - Parquet source

Metadata

The metadata of Parquet data sets is inferred from the schema.

Types

Trevas takes care of the conversion between the Parquet types and the types supported by the Trevas engine.

Roles

Trevas stores the VTL roles in the Parquet schema by adding a vtlRole metadata entry to each field descriptor.

By default, columns that have no role in the Parquet schema are given the MEASURE role in Trevas.

VTL allows roles to be modified within scripts (see here).
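The default rule above can be illustrated with a minimal, Spark-free sketch. It is not the Trevas implementation: the RoleDefaults class, the simplified Role enum, and the use of a plain Map as a stand-in for a field's metadata are all assumptions made for illustration, as is the exact string stored under the vtlRole key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RoleDefaults {
    // Simplified, hypothetical stand-in for the VTL roles.
    public enum Role { IDENTIFIER, MEASURE, ATTRIBUTE }

    // Metadata key named in the text above.
    public static final String ROLE_KEY = "vtlRole";

    // Returns the role found in the field metadata, or MEASURE when none is set.
    // Assumption: the role is stored as the enum constant's name.
    public static Role resolveRole(Map<String, String> fieldMetadata) {
        String raw = fieldMetadata.get(ROLE_KEY);
        return raw == null ? Role.MEASURE : Role.valueOf(raw);
    }

    public static void main(String[] args) {
        // Hypothetical schema: column name -> field metadata.
        Map<String, Map<String, String>> schema = new LinkedHashMap<>();
        schema.put("id", Map.of(ROLE_KEY, "IDENTIFIER"));
        schema.put("age", Map.of()); // no role recorded in the schema

        // Prints "id -> IDENTIFIER" then "age -> MEASURE".
        schema.forEach((name, meta) ->
                System.out.println(name + " -> " + resolveRole(meta)));
    }
}
```

In practice this means that identifier columns need an explicit role, either in the Parquet metadata or through a VTL script, since every column without one is treated as a measure.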

Read

// `spark` is an existing SparkSession
Dataset<Row> sparkDataset = spark.read().parquet("folder_path");
// Wrap the Spark dataset so that Trevas can use it
SparkDataset dataset = new SparkDataset(sparkDataset);

Write

// Trevas Spark Dataset
SparkDataset dataset = ...;

// Spark Dataset
Dataset<Row> sparkDataset = dataset.getSparkDataset();

sparkDataset.write()
        .mode(SaveMode.Overwrite)
        .parquet("folder_path");