I have a parser based on ANTLR 4 and using listeners, not visitors. It already recognizes and stores the declaration of functions, variables and so on.
I'm trying to resolve some grammar ambiguities with semantic predicates, for instance to separate a function call from an array/vector access when parsing VHDL source code. This is important in order to avoid further complications in the full grammar.
In the following example:
3 + j * f(i)
Here f(i) could be either a function f called with parameter i or an array f accessed at index i. The simplified grammar below shows how predicates could help resolve that ambiguity:
expression:
    expression OPERATOR expression | simple_expression;
simple_expression:
    function_expression | array_expression | ID | NUMBER;
function_expression:
    {is_function()}? ID '(' expression_list ')';
array_expression:
    {is_array()}? ID '(' expression ')';
expression_list:
    expression ( ',' expression )*;
The listeners process the declarations and store function and array identifiers in a database, which makes it possible to tell whether an identifier ID is a function, an array, or undeclared (I'm not showing the grammar for those declarations here, to keep it simple).
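For illustration, here is a minimal sketch of what such a database could look like. My real Definitions class is not shown in this question, so everything except the isFunction and isArray method names is a hypothetical stand-in (declareFunction, declareArray, isDeclared, the Kind enum). It ignores scopes and overloading, and it lower-cases names on the assumption that only VHDL basic identifiers, which are case-insensitive, are stored.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical identifier database; a sketch, not the actual class. */
class Definitions {
    enum Kind { FUNCTION, ARRAY }

    // Maps a normalized identifier to what it was declared as.
    private final Map<String, Kind> kinds = new HashMap<>();

    // Called by the declaration listeners (hypothetical method names).
    void declareFunction(String id) { kinds.put(normalize(id), Kind.FUNCTION); }
    void declareArray(String id)    { kinds.put(normalize(id), Kind.ARRAY); }

    // Called by the predicates in the grammar.
    boolean isFunction(String id) { return kinds.get(normalize(id)) == Kind.FUNCTION; }
    boolean isArray(String id)    { return kinds.get(normalize(id)) == Kind.ARRAY; }
    boolean isDeclared(String id) { return kinds.containsKey(normalize(id)); }

    // Assumption: only basic identifiers are stored, so case is folded.
    private static String normalize(String id) {
        return id.toLowerCase(Locale.ROOT);
    }
}
```

With this sketch, an identifier that was never declared makes both isFunction and isArray return false, which is the "undeclared" case.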
An example of predicate would be, at the top of the grammar file:
@parser::members {
    Definitions defs;
    boolean is_function() {
        return defs.isFunction(getCurrentToken().getText());
    }
    boolean is_array() {
        return defs.isArray(getCurrentToken().getText());
    }
}
However, I cannot use that information in the predicates because they are called too early, before the declaration listeners have been called to build the ID database. If I put a System.out.print call in those functions, and also in the listeners, I see that:
- the expression predicates are first called on the entire file being parsed,
- and only then are all the declaration listeners called, even though the declarations are before these expressions in the file.
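To make the timing concrete, here is a toy model of that order. It involves no ANTLR at all. It assumes the listeners are attached through a tree walk that starts only after the parse has finished (for example with ParseTreeWalker), which is my guess at the cause and is not stated above. All names in it (PredicateTimingDemo, run, the "function"/"use" line format) are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model (no ANTLR involved) of "parse first, walk the tree afterwards". */
class PredicateTimingDemo {
    // Stand-in for the identifier database filled by the declaration listeners.
    static final Set<String> functions = new HashSet<>();
    // Records the order in which predicates and listeners run.
    static final List<String> log = new ArrayList<>();

    // Stand-in for the is_function() predicate, evaluated while "parsing".
    static boolean isFunction(String id) {
        boolean known = functions.contains(id);
        log.add("predicate " + id + " -> " + known);
        return known;
    }

    // Stand-in for a declaration listener callback, run while "walking".
    static void enterFunctionDeclaration(String id) {
        functions.add(id);
        log.add("listener declares " + id);
    }

    static List<String> run(String[] source) {
        functions.clear();
        log.clear();
        // Phase 1: the whole input is parsed; predicates run here.
        for (String line : source) {
            if (line.startsWith("use ")) {
                isFunction(line.substring("use ".length()));
            }
        }
        // Phase 2: the finished tree is walked; listeners run only now.
        for (String line : source) {
            if (line.startsWith("function ")) {
                enterFunctionDeclaration(line.substring("function ".length()));
            }
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        String[] source = { "function f", "use f" };
        for (String entry : run(source)) {
            System.out.println(entry);
        }
    }
}
```

Even though the declaration of f comes first in the input, the log shows the predicate for f answering false before the listener has recorded the declaration, which matches what I see with the real parser.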
I'm aware that the parser is looking ahead, but is there a way to have the declaration listeners run as early as possible, so that their information is ready for the predicates on expressions in the rest of the file?
Or is that the wrong way to use predicates? I would like to avoid source code in the grammar as much as possible, for example a workaround that stores preliminary information while the declarations are parsed, using code embedded in the grammar file. And a 2-pass parser seems a bit awkward.