Modifications of VTL grammar
Usage of the VTL grammar
Trevas on the VTL formal grammar expressed with EBNF. The reference is the version 2.0 upgrade published in July 2020 on the SDMX web site.
The grammar consists of two files ready to be processed by the Antlr parser generator:
-
VtlTokens.g4 contains the list of valid VTL terms.
-
Vtl.g4 contains the rules that produce valid VTL expressions.
Antlr uses these files to produce a lexer that creates a list of vocabulary symbols from an input character stream, and a parser that creates the grammatical structure corresponding to this list of symbols. Antlr can generate parsers usable in different target languages. Trevas uses the Java parser, which is exposed in the vtl-parser
module.
Adaptations of the grammar
In order to improve performance and functionalities, minor modifications were made to the VTL grammar used in Trevas.
Simplification of the grammatical tree
As documented here and here, the expr
and exprComp
branches of the grammatical tree are nearly identical. In order to avoid implementing the same logic twice, the exprComp
branch was commented out in the 498c1f8 commit. It was then noticed that this modification wrongly invalidated the COUNT()
expression, and the corresponding rule was therefore reactivated in the grammar with the [54f86f2] (https://github.com/InseeFr/Trevas/commit/54f86f27d2e8fdd57df1439d74ed56d225064a7d) commit.
Addition of distance operators
Distance operators like Levenshtein of Jaro-Winkler are commonly used in tests of character strings. In order to allow them in VTL expressions, the 036dc60 commit added to the grammar a distanceOperators
section containing a LEVENSHTEIN
rule, as well as the LEVENSHTEIN
symbol in the lexer file.