Weird Semantic Predicate Behaviour in ANTLR4, could be a Bug

Question

Here is a working subset of my C parsing grammar. It can only parse the input shown below but is enough to illustrate the problem my full grammar encountered. Note it is following the traditional method to define operator precedence:

grammar CPPProcessor;
translation_unit:    expression;
primary_expression:
  '1'
  //|  {false}? '(' expression ')'
  | 'a'
  | 'b'

;
postfix_expression:
      primary_expression
    | postfix_expression '(' expression ')'

;

unary_expression:
      postfix_expression
    | '-' cast_expression
;
cast_expression:
      unary_expression
    | '(' 'a' ')' cast_expression
;
additive_expression:
      cast_expression
    | additive_expression '-' cast_expression
;
expression :  additive_expression;
WS: [ \t\f]+    -> channel(1);
CRLF: '\r'? '\n' -> channel(1);

Invocation rule is translation_unit and the input is a single line containing this:

(a)-b

Notice the semantic predicate in primary_expression has been commented out. (The way to interpret the grammar is that when the second rule of primary_expression is enabled the input is parsed as a subtraction. When the subrule is not there, it becomes a C-style type cast of -b to type a).

Problem: The real issue is that I suppose having a {false}? is equivalent to having nothing, hence removing the comment should have no difference. However, the parse failed when I removed the comment i.e.

primary_expression:
  '1'
  |  {false}? '(' expression ')'
  | 'a'
  | 'b'

;

and got this error:

line 1:0 no viable alternative at input '('

Why having a {false}? semantic predicate can cause parse failure? Could it be a bug in ANLTR4? It looks like the second subrule in postfix_expression is causing the issue which is left-recursive. When the left-recursion is removed, the issue disappears

JavaMan JavaMan · Accepted Answer · 2016-11-27T11:42:22

I figured out the problem.

Semantic predicates cannot cause the upper rules to back track and try another subrules. So when the second subrule of primary_expression is un-commented, it exposes another '(' matching rule and allows the first subrule of cast_expression to be selected for the input. But once this choice is made, it cannot be undone even if some semantic predicates in further subrule return false. The semantic predicate can only cause some other subrules of the primary_expression to be selected. But since there aren't any other subrules in primary_expression that can match a '('. The parse fails.

Weird Semantic Predicate Behaviour in ANTLR4, could be a Bug

1 Answers