Spark mode - Parquet source
Metadata
The metadata of a Parquet data set is inferred from its schema.
Types
Trevas takes care of the conversion between the Parquet types and the types supported by the Trevas engine.
Roles
Trevas adds the VTL roles to the Parquet schema by attaching a vtlRole
metadata entry to each field descriptor.
By default, columns without a role in the Parquet schema are given the MEASURE
role in Trevas.
VTL allows roles to be modified within scripts (see here).
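The defaulting rule for roles can be illustrated with a small, self-contained sketch. It does not use the Trevas or Spark APIs: the class RoleDefaults and the method resolveRole are hypothetical names introduced for illustration, field metadata is modelled as a plain map, and the stored role strings ("IDENTIFIER", "MEASURE") are assumed spellings of the VTL role names. Only the vtlRole key and the MEASURE default come from the text above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RoleDefaults {

    // Metadata key named in this section.
    public static final String ROLE_KEY = "vtlRole";

    // Hypothetical helper: returns the role recorded in a field's metadata,
    // falling back to MEASURE when no vtlRole entry is present.
    public static String resolveRole(Map<String, String> fieldMetadata) {
        return fieldMetadata.getOrDefault(ROLE_KEY, "MEASURE");
    }

    public static void main(String[] args) {
        // Column name -> field metadata, standing in for a Parquet schema.
        Map<String, Map<String, String>> schema = new LinkedHashMap<>();
        schema.put("id", Map.of(ROLE_KEY, "IDENTIFIER")); // role stored in the schema
        schema.put("population", Map.of());               // no role stored

        // Print the role each column would end up with.
        schema.forEach((name, metadata) ->
                System.out.println(name + " -> " + resolveRole(metadata)));
    }
}
```

In this sketch the id column keeps the role stored in its metadata, while population, which carries no vtlRole entry, falls back to MEASURE.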
Read
Dataset<Row> sparkDataset = spark.read().parquet("folder_path");
SparkDataset dataset = new SparkDataset(sparkDataset);
Write
// Trevas Spark Dataset
SparkDataset dataset = ...;
// Spark Dataset
Dataset<Row> sparkDataset = dataset.getSparkDataset();
sparkDataset.write()
    .mode(SaveMode.Overwrite)
    .parquet("folder_path");