Antlr4: Mismatched input

Question

Here's a simple grammar test I thought would be easy to parse, but I get 'mismatched input' right off the bat and I can't figure out what Antlr is looking for.

The input:

  # include "something" program TEST1 { BLAH BLAH }

My grammar:

  grammar ProgHeader;

  program: header* prog EOF ;
  header: '#' ( include | define ) ;
  include: 'include' string ;
  define: 'define' string string? ;
  string: '"' QTEXT '"' ;
  prog: 'program' QTEXT '{' BLOCK '}' ;
  QTEXT: ~[\r\n\"]+ ;
  BLOCK: ~[}]+ ; // don't care, example block
  WS: [ \t\r\n] -> skip ;

The output error message:

line 1:0 mismatched input '# include "something" program TEST1 { BLAH BLAH ' expecting {'program', '#'}

This really confuses me because it says it's looking for a '#' and there's one right at the start of the input. I dumped the parse tree too. It appears to be stuck right at the top, at the 'program' rule:

(program # include "something" program TEST1 { BLAH BLAH  } )

Halp?

Here's the full program driving this test case, in case it matters (I don't think it does, since the information above should be enough, but here it is):

  package antlrtests;

  import antlrtests.grammars.*;
  import org.antlr.v4.runtime.*;
  import org.antlr.v4.runtime.tree.*;

  /**
   *
   * @author Brenden Towey
   */
  public class ProgHeaderTest {
     private final String[] testVectors = {
        "# include \"something\" program TEST1 { BLAH BLAH } ",
     };
     public static void main( String[] args ) {
        new ProgHeaderTest().runTests();
     }
     public void runTests() {
        for( String test : testVectors )
           simpleTest( test );
     }
     private void simpleTest( String test ) {
        // Lex the test string, buffer its tokens, and parse from the start rule.
        ANTLRInputStream ains = new ANTLRInputStream( test );
        ProgHeaderLexer wpl = new ProgHeaderLexer( ains );
        CommonTokenStream tokens = new CommonTokenStream( wpl );
        ProgHeaderParser wikiParser = new ProgHeaderParser( tokens );
        ParseTree parseTree = wikiParser.program();
        // Print the tree in LISP-style notation, using the parser's rule names.
        System.out.println( "'" + test + "': "
                + parseTree.toStringTree( wikiParser ) );
     }
  }

And the full output:

run:
line 1:0 mismatched input '# include "something" program TEST1 { BLAH BLAH ' expecting {'program', '#'}
'# include "something" program TEST1 { BLAH BLAH } ': (program # include "something" program TEST1 { BLAH BLAH  } )
BUILD SUCCESSFUL (total time: 0 seconds)

Gunther · Accepted Answer · 2013-05-03T18:23:45

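The longest token that matches at the very beginning is BLOCK, which matches the text '# include "something" program TEST1 { BLAH BLAH ' (everything up to but not including the first } character); that is exactly the token text quoted in the error message. QTEXT also matches there, but only the shorter text '# include ' (up to but not including the first " character). The lexer keeps the longest match, so the parser sees a BLOCK token, while the only valid tokens at that point are 'program' and '#', as reported. So it's better to avoid token definitions that match almost anything.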
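To make the longest-match rule concrete, here is a rough Java sketch that models how an ANTLR 4 lexer picks tokens. At each position it tries every rule, keeps the longest match, and on a tie keeps the rule defined first (the implicit literal tokens such as '#' and 'program' come before the named rules). It uses java.util.regex rather than the generated ProgHeaderLexer, so it is an approximation, not ANTLR's actual implementation. The class LongestMatchDemo and the narrower STRING and ID rules are hypothetical, made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Rough model of ANTLR 4 lexer token selection (not ANTLR itself). */
public class LongestMatchDemo {

    // The question's lexer rules as Java regexes. Implicit literal tokens come
    // first, in the order they appear in the parser rules, then the named rules.
    static Map<String, String> originalRules() {
        Map<String, String> r = new LinkedHashMap<>();
        for (String lit : new String[] {"#", "include", "define", "\"", "program", "{", "}"}) {
            r.put("'" + lit + "'", Pattern.quote(lit));
        }
        r.put("QTEXT", "[^\\r\\n\"]+"); // QTEXT: ~[\r\n\"]+ ;
        r.put("BLOCK", "[^}]+");        // BLOCK: ~[}]+ ;
        r.put("WS", "[ \\t\\r\\n]");     // WS: [ \t\r\n] -> skip ;
        return r;
    }

    // Hypothetical narrower rules: a quoted string is one token, identifiers are
    // letters/digits/underscores only, and nothing matches "almost anything".
    static Map<String, String> narrowedRules() {
        Map<String, String> r = new LinkedHashMap<>();
        for (String lit : new String[] {"#", "include", "define", "program", "{", "}"}) {
            r.put("'" + lit + "'", Pattern.quote(lit));
        }
        r.put("STRING", "\"[^\"\\r\\n]*\"");
        r.put("ID", "[A-Za-z_][A-Za-z0-9_]*");
        r.put("WS", "[ \\t\\r\\n]+");
        return r;
    }

    // At each position try every rule, keep the longest match, and on a tie keep
    // the rule listed first. Rules named in `skip` are dropped (like `-> skip`).
    static List<String> tokenize(Map<String, String> rules, Set<String> skip, String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                Matcher m = Pattern.compile(rule.getValue()).matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors at pos; strict '>' means an earlier rule wins ties
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                throw new IllegalStateException("no rule matches at offset " + pos);
            }
            if (!skip.contains(bestRule)) {
                tokens.add(bestRule + "[" + input.substring(pos, bestEnd) + "]");
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "# include \"something\" program TEST1 { BLAH BLAH } ";
        System.out.println(tokenize(originalRules(), Set.of("WS"), input));
        System.out.println(tokenize(narrowedRules(), Set.of("WS"), input));
    }
}
```

Under these assumptions, the question's rules split the test vector into just two tokens: a BLOCK covering everything before the } (QTEXT alone would stop at the first ") and then a QTEXT '} ', since QTEXT does not exclude }. That is consistent with the two items printed after 'program' in the parse tree in the question. With the narrowed rules the first token is '#', and 'include' and 'program' win their ties against ID because the literals are listed first.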
