Antlr generated Java doesn't match Antlr IDE

Question

I have a grammar that accepts key / value pairs that appear one per line. The values may be multi-line.

The Eclipse plug-in ANTLR IDE works correctly and accepts a valid test string. However, the generated Java does not accept the same string.

Here is the grammar:

message: block4 ;

block4:  STARTBLOCK '4' COLON expr4+ ENDBLOCK ;

expr4:   NEWLINE (COLON key COLON expr | '-')+;

key:     FIELDVALUE* ; 

expr:    FIELDVALUE* ; 

NEWLINE    : ('\n'|'\r') ;
FIELDVALUE : (~('-'|COLON|ENDBLOCK|STARTBLOCK))+; 
COLON      : ':' ;
STARTBLOCK : '{' ;
ENDBLOCK   : '}' ;

ANTLR IDE parses this correctly:

[Image: SwiftTiny parse tree]

Don't squint: the tree divides the input into key/expression pairs, whether the values are single-line (like 23B / CRED) or multiline (like 59 / /13212312\r\nRECEIVER NAME S.A\r\n).

Here is the input string:

{4:
:20:007505327853
:23B:CRED
:32A:050902JPY3520000,
:33B:JPY3520000,
:50K:EUROXXXEI
:52A:FEBXXXM1
:53A:MHCXXXJT
:54A:FOOBICXX
:59:/13212312
RECEIVER NAME S.A
:70:FUTURES
:71A:SHA
:71F:EUR12,00
:71F:EUR2,34
-}

When Eclipse runs antlr-3.4-complete.jar on the grammar, it generates SwiftTinyLexer.java and SwiftTinyParser.java. The lexer splits the input into 35 tokens, starting with:

STARTBLOCK
4
COLON
FIELDVALUE
COLON

I would like token 4 to be an expr4 rather than a FIELDVALUE (and the IDE seems to agree with me). But since it is a FIELDVALUE, the parser chokes on that token with this error: line 1:3 required (...)+ loop did not match anything at input '\r\n'.

Why is there a difference between the way that ANTLR 3.4 and ANTLR IDE 2.1.2.201108281759 lex the same string?

Is there a way to fix the grammar so that it matches expr4 before it matches FIELDVALUE?

Have you checked to see if the ANTLR IDE uses ANTLR 3.4? My guess is it is using a different version. — user177800
I went to Preferences -> ANTLR -> Builder and it is using the ANTLR Parser Generator Version 3.4. — rajah9
OK, I have a working hypothesis. The IDE input is taking a single \n while the Java test code is generating a \r\n. Adding a "1 or more" to NEWLINE with ('\n'|'\r')+ made the parse go forward without the lexical error. — rajah9
@rajah9 If you figure it out, please post the solution as an answer, then come back and select it with the checkmark so future users can find it! — user177800

rajah9 · Accepted Answer · 2012-08-13T17:21:59

The IDE input string has a single \n while the Java test code is getting a Windows-style \r\n.
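A difference like this is easy to miss because line terminators are invisible when printed. One way to check, shown here as a minimal sketch, is to print the input with the control characters escaped. The class name ShowLineEndings and the two shortened sample strings are hypothetical and stand in for the real test input.

```java
final class ShowLineEndings {
    // Replace CR and LF with visible escape sequences so they show up in output.
    static String escape(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the two inputs (assumed for illustration).
        String fromIde = "{4:\n:20:007505327853\n-}";
        String fromJava = "{4:\r\n:20:007505327853\r\n-}";

        System.out.println(escape(fromIde));   // {4:\n:20:007505327853\n-}
        System.out.println(escape(fromJava));  // {4:\r\n:20:007505327853\r\n-}
    }
}
```

Running the same check on the string passed to the generated lexer shows which terminator it actually receives.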

I changed NEWLINE by adding a "one or more" operator (+), that is, from

NEWLINE    : ('\n'|'\r') ;

to

NEWLINE    : ('\n'|'\r')+ ;

This allowed the parse to go forward without the lexical error, and now it makes sense why the IDE behaved differently from the generated Java: they were getting slightly different input strings.
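A likely reason the single-character rule failed is that FIELDVALUE's negated set does not exclude '\r' or '\n'. The two-character sequence \r\n can therefore be matched by FIELDVALUE but not by the original NEWLINE. That is consistent with the error message reporting '\r\n' as one token.

An alternative to changing the grammar, which I did not use, is to normalize the line endings in the Java test code before the string reaches the lexer. The following is only a sketch under that assumption. The class name LineEndings and the sample string are hypothetical.

```java
final class LineEndings {
    // Convert Windows (\r\n) and old Mac (\r) terminators to a single \n.
    // The \r\n replacement must run first so it does not become \n\n.
    static String normalize(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real message (assumed for illustration).
        String raw = "{4:\r\n:20:007505327853\r\n-}";
        String clean = normalize(raw);

        // The cleaned string could then be handed to the lexer, for example
        // through org.antlr.runtime.ANTLRStringStream in the ANTLR 3 runtime.
        System.out.println(clean.contains("\r"));  // false
    }
}
```

With this approach the generated lexer sees the same \n-only input that parsed correctly in the IDE, but every caller has to remember to normalize first. Changing NEWLINE in the grammar keeps the fix in one place.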
