
Modifications of the VTL grammar

Usage of the VTL grammar

Trevas is based on the VTL formal grammar expressed in EBNF. The reference version is the VTL 2.0 update published in July 2020 on the SDMX website.

The grammar consists of two files ready to be processed by the Antlr parser generator:

  • VtlTokens.g4 contains the list of valid VTL terms.

  • Vtl.g4 contains the rules that produce valid VTL expressions.

Antlr uses these files to produce a lexer that creates a list of vocabulary symbols from an input character stream, and a parser that creates the grammatical structure corresponding to this list of symbols. Antlr can generate parsers usable in different target languages. Trevas uses the Java parser, which is exposed in the vtl-parser module.
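The two stages Antlr generates (a lexer that turns characters into tokens, then a parser that turns tokens into a tree) can be illustrated with a small hand-written toy in plain Java. This is a sketch only: it is not Antlr-generated code, it does not use the classes produced from VtlTokens.g4 and Vtl.g4, and the token kinds and the tiny grammar (identifiers and integers combined with `+`, `-` and parentheses) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyVtlPipeline {

    // Hypothetical token kinds for a tiny expression language (not taken from VtlTokens.g4).
    enum Kind { IDENTIFIER, INTEGER, PLUS, MINUS, LPAREN, RPAREN, EOF }

    record Token(Kind kind, String text) {}

    // Lexer stage: turns the input character stream into a list of vocabulary symbols.
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                tokens.add(new Token(Kind.IDENTIFIER, input.substring(start, i)));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.INTEGER, input.substring(start, i)));
            } else {
                Kind kind = switch (c) {
                    case '+' -> Kind.PLUS;
                    case '-' -> Kind.MINUS;
                    case '(' -> Kind.LPAREN;
                    case ')' -> Kind.RPAREN;
                    default -> throw new IllegalArgumentException("Unexpected character: " + c);
                };
                tokens.add(new Token(kind, String.valueOf(c)));
                i++;
            }
        }
        tokens.add(new Token(Kind.EOF, ""));
        return tokens;
    }

    // Parser stage: builds a tree (rendered as a prefix string) from the tokens.
    // Toy grammar: expr : atom ((PLUS | MINUS) atom)* ; atom : IDENTIFIER | INTEGER | LPAREN expr RPAREN ;
    static final class Parser {
        private final List<Token> tokens;
        private int pos = 0;

        Parser(List<Token> tokens) { this.tokens = tokens; }

        String parse() {
            String tree = expr();
            expect(Kind.EOF);
            return tree;
        }

        private String expr() {
            String left = atom();
            while (peek() == Kind.PLUS || peek() == Kind.MINUS) {
                String op = tokens.get(pos++).text();
                String right = atom();
                left = "(" + op + " " + left + " " + right + ")";
            }
            return left;
        }

        private String atom() {
            Token t = tokens.get(pos);
            return switch (t.kind()) {
                case IDENTIFIER, INTEGER -> { pos++; yield t.text(); }
                case LPAREN -> {
                    pos++;
                    String inner = expr();
                    expect(Kind.RPAREN);
                    yield inner;
                }
                default -> throw new IllegalStateException("Unexpected token: " + t);
            };
        }

        private Kind peek() { return tokens.get(pos).kind(); }

        private void expect(Kind kind) {
            if (peek() != kind) throw new IllegalStateException("Expected " + kind + " but got " + tokens.get(pos));
            pos++;
        }
    }

    public static void main(String[] args) {
        List<Token> tokens = lex("ds1 + (ds2 - 10)");
        System.out.println(tokens.size());               // 8 (seven symbols plus EOF)
        System.out.println(new Parser(tokens).parse());  // (+ ds1 (- ds2 10))
    }
}
```

Antlr generates far more elaborate versions of both stages (including error recovery and a full parse tree), but the division of work is the same: the parser never looks at raw characters, only at the symbols produced by the lexer.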

Adaptations of the grammar

To improve performance and extend functionality, minor modifications were made to the VTL grammar used in Trevas.

Simplification of the grammatical tree

As documented here and here, the expr and exprComp branches of the grammatical tree are nearly identical. To avoid implementing the same logic twice, the exprComp branch was commented out in commit 498c1f8. It was later found that this change wrongly invalidated the COUNT() expression, so the corresponding rule was reactivated in the grammar in commit 54f86f2 (https://github.com/InseeFr/Trevas/commit/54f86f27d2e8fdd57df1439d74ed56d225064a7d).

Addition of distance operators

Distance operators such as Levenshtein or Jaro-Winkler are commonly used to compare character strings. To allow them in VTL expressions, commit 036dc60 added a distanceOperators section containing a LEVENSHTEIN rule to the grammar, and the corresponding LEVENSHTEIN symbol to the lexer file.
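For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below computes it with the classic dynamic-programming recurrence in plain Java. It is an illustration of the metric only, not the Trevas implementation behind the LEVENSHTEIN rule; the class and method names are hypothetical, and it compares UTF-16 `char` values rather than Unicode code points.

```java
public final class Levenshtein {

    // Returns the minimum number of single-character insertions, deletions
    // and substitutions needed to turn `a` into `b`.
    static int distance(String a, String b) {
        // prev[j] holds the distance between the first (i - 1) chars of a and the first j chars of b;
        // curr[j] is the row being filled for the first i chars of a.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,   // insertion
                                 prev[j] + 1),      // deletion
                        prev[j - 1] + cost);        // substitution (or match)
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```

Keeping only two rows brings memory down to O(length of b) while the running time stays O(length of a × length of b).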