Stripping actions from ANTLR grammar changes its parsing algorithm

Question

I have a grammar Foo.xtext (too complex to include here). Xtext generates InternalFoo.g from it. After some tweaking it also generates DebugInternalFoo.g, which claims to be the same grammar without actions. Then I strip the actions with ANTLR directly:

java -cp antlr-3.4.jar org.antlr.tool.Strip InternalFoo.g > Stripped.g

I'd expect the three grammars to behave the same way when I check them, but this is what I see:

InternalFoo.g - error, rule assignment has non-LL(*) decision
DebugInternalFoo.g - no problem, parses fine
Stripped.g - warnings at rule assignment, decision can match using multiple alternatives. It fails to parse properly.

Is it possible that a grammar parses a text differently with or without actions? Or is it a bug in one of the action-removing tools? (The rule in question has syntactic predicates, and without them it really would have a non-LL(*) decision.)

UPDATE:

I have partly found what caused the problem. The rule in question looked like this:

trickyRule:
  ({ some complex action})
  (expression '=')=>...

Stripping with ANTLR removed the action, but left an empty group in its place:

// Stripped.g
trickyRule:
  () (expression '=')=>...

Generating the debug grammar removes both the action and the now-empty group around it:

// DebugInternalFoo.g
trickyRule:
  (expression '=')=>...

So the lesson learned is: an empty group before a syntactic predicate is not the same as nothing at all.
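As a rough illustration of that difference, the sketch below post-processes the text of a stripped grammar and deletes empty groups, which would bring Stripped.g closer to what the debug grammar looks like. The class and method names are hypothetical and not part of ANTLR or Xtext. It is a naive text substitution: it assumes that `()` never appears inside a quoted literal or a comment, and that no empty group carries a suffix such as `?` or `*`.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR or Xtext.
class EmptyGroupRemover {
    // An empty group "()" (whitespace allowed inside), plus any spaces or tabs after it.
    private static final Pattern EMPTY_GROUP = Pattern.compile("\\(\\s*\\)[ \\t]*");

    // Removes empty groups repeatedly, so that nested ones like "(())" also disappear.
    static String removeEmptyGroups(String grammar) {
        String previous;
        String current = grammar;
        do {
            previous = current;
            current = EMPTY_GROUP.matcher(previous).replaceAll("");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        // The rule from Stripped.g, with the body elided as in the question.
        String stripped = "trickyRule:\n  () (expression '=')=>...\n";
        System.out.print(removeEmptyGroups(stripped));
    }
}
```

Whether the cleaned grammar then behaves like DebugInternalFoo.g still has to be checked by running ANTLR on it; the sketch only removes the textual difference described above.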

Bart Kiers · Accepted Answer · 2011-12-09T08:18:54

Is it possible that a grammar parses a text differently with or without actions?

Yes, that is possible. org.antlr.tool.Strip leaves syntactic predicates¹ in place, but removes validating² and gated³ semantic predicates (and any member sections that these semantic predicates might use).

For example, the following rules would only match an A_TOKEN:

parser_rule1
  :  (parser_rule2)=> parser_rule2
  ;

parser_rule2
  :  {input.LT(1).getType() == A_TOKEN}? .
  ;

but if you use the Strip tool on it, it leaves the following:

parser_rule1
  :  (parser_rule2)=> parser_rule2
  ;

parser_rule2
  :  /*{input.LT(1).getType() == A_TOKEN}?*/ .
  ;

making it match any token.

In other words, Strip could change the behavior of the generated lexer or parser.
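One way to find out in advance whether a grammar is at risk is to look for semantic predicates in the original file before stripping it. The sketch below is a hypothetical helper, not part of ANTLR: it simply counts the `}?` and `}?=>` endings in the grammar text. It assumes those character sequences do not occur inside string literals or comments, so treat the counts as a hint rather than a parse.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR.
class PredicateScan {
    // "}?" ends a validating semantic predicate; "}?=>" ends a gated one.
    private static final Pattern PREDICATE_END = Pattern.compile("\\}\\?(=>)?");

    // Returns a two-element array: {validating count, gated count}.
    static int[] countSemanticPredicates(String grammar) {
        int validating = 0;
        int gated = 0;
        Matcher m = PREDICATE_END.matcher(grammar);
        while (m.find()) {
            if (m.group(1) != null) {
                gated++;
            } else {
                validating++;
            }
        }
        return new int[] {validating, gated};
    }

    public static void main(String[] args) {
        String rule = "parser_rule2\n  :  {input.LT(1).getType() == A_TOKEN}? .\n  ;";
        int[] counts = countSemanticPredicates(rule);
        System.out.println("validating=" + counts[0] + ", gated=" + counts[1]);
    }
}
```

A non-zero count on the original grammar suggests that the stripped version may accept input the original rejects, as with parser_rule2 above. Run it on the unstripped file, since the commented-out predicates that Strip leaves behind would be counted as well.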

¹ syntactic predicate: ( ... )=>
² validating semantic predicate: { ... }?
³ gated semantic predicate: { ... }?=>
