Resolving ANTLR ambiguity while matching specific Types

Question

I'm starting to explore ANTLR and I'm trying to match this format: (test123 A0020 )

Where:

test123 is an identifier of at most 10 characters (letters and digits)
A is the time indicator (for AM or PM), a single letter that can be either "A" or "P"
0020 is the time in a 4-digit format

I tried this grammar:

IDENTIFIER
    : ( LETTER | DIGIT )+
    ;

INT
    : DIGIT+
    ;

fragment DIGIT
    : [0-9]
    ;

fragment LETTER
    : [A-Z]
    ;

WS : [ \t\r\n(\s)+]+ -> channel(HIDDEN) ;

formatter
    : '(' information ')'
    ;

information
    : information '/' 'A' INT
    | IDENTIFIER
    ;

How can I resolve the ambiguity and get the time format matched as 'A' INT rather than as IDENTIFIER? Also, how can I add checks such as token length to the identifier? I know that this doesn't work in ANTLR: IDENTIFIER : (DIGIT | LETTER ) {2,10}

UPDATE:

I changed the rules to use semantic checks, but I still have the same ambiguity between the identifier and the time format. Here are the modified rules:

formatter
    : information
    | information '-' time
    ;

time :
    timeMode timeCode;  

timeMode:   
    { getCurrentToken().getText().matches("[A,C]")}? MOD
;

timeCode: {getCurrentToken().getText().matches("[0-9]{4}")}?  INT;

information: {getCurrentToken().getText().length() <= 10 }? IDENTIFIER;

MOD:  'A' | 'C';

The problem shows up in the parse tree: A0023 is matched as timeMode, and the parser complains that timeCode is missing.

Check this question. Although you would have to convert your lexer rules to parser rules. The naive way is to write IDENTIFIER: (LETTER | DIGIT) (LETTER | DIGIT) ... ten times. — Mephy
@BartKiers because I want to include actions in the semantic rules later on without having to treat the 'A0023' as a String.( I will have to do operations if I want to separate the timeMode and timeCode ) I actually have the same problem in another parser for distance unit recognition ( format [M]\d{3} for distance in meter or [F]\d{4} in feets ) — ps_messenger
I'm assuming the following inputs are all identifiers: P123, P12345, P. Correct? — Bart Kiers

Bart Kiers · Accepted Answer · 2016-03-10T13:35:43

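Here is a way to handle it: use a lexer predicate so that no identifier-like token is created when the characters ahead look like a time (A or P followed by exactly four digits), and use parser predicates to validate the text of each token: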

grammar Test;

@lexer::members {
  private boolean isAhead(int maxAmountOfCharacters, String pattern) {
    final Interval ahead = new Interval(this._tokenStartCharIndex, this._tokenStartCharIndex + maxAmountOfCharacters - 1);
    return this._input.getText(ahead).matches(pattern);
  }
}

parse
 : formatter EOF
 ;

formatter
 : information ( '-' time )?
 ;

time
 : timeMode timeCode
 ;

timeMode
 : TIME_MODE
 ;

timeCode
 : {getCurrentToken().getType() == IDENTIFIER_OR_INTEGER && getCurrentToken().getText().matches("\\d{4}")}?
   IDENTIFIER_OR_INTEGER
 ;

information
 : {getCurrentToken().getType() == IDENTIFIER_OR_INTEGER && getCurrentToken().getText().matches("\\w*[a-zA-Z]\\w*")}?
   IDENTIFIER_OR_INTEGER
 ;

IDENTIFIER_OR_INTEGER
 : {!isAhead(6, "[AP]\\d{4}(\\D|$)")}? [a-zA-Z0-9]+
 ;

TIME_MODE
 : [AP]
 ;

SPACES
 : [ \t\r\n] -> skip
 ;

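To see why the lexer predicate keeps A0023 from being swallowed as a single IDENTIFIER_OR_INTEGER token, the same check can be tried outside ANTLR. This is only a minimal sketch in plain Java, not part of the generated lexer: the class name LookaheadSketch and the input/start parameters are hypothetical stand-ins for the lexer's _input and _tokenStartCharIndex.

```java
import java.util.regex.Pattern;

final class LookaheadSketch {

    // Hypothetical stand-in for the lexer's isAhead(...): inspects at most
    // maxChars characters of input, beginning at index start, and reports
    // whether that whole slice matches the given pattern.
    static boolean isAhead(String input, int start, int maxChars, String pattern) {
        int end = Math.min(input.length(), start + maxChars);
        return Pattern.matches(pattern, input.substring(start, end));
    }

    public static void main(String[] args) {
        // Same pattern as in the IDENTIFIER_OR_INTEGER predicate.
        String timePattern = "[AP]\\d{4}(\\D|$)";
        String input = "1P23 - A0023";

        // Index 0 ("1P23 -"): not a time, so the identifier rule stays enabled.
        System.out.println(isAhead(input, 0, 6, timePattern)); // false
        // Index 7 ("A0023"): looks like a time, so the identifier rule is disabled.
        System.out.println(isAhead(input, 7, 6, timePattern)); // true
        // Five digits after the letter: not a time, so this stays an identifier.
        System.out.println(isAhead("A00234", 0, 6, timePattern)); // false
    }
}
```

When the check returns true, the grammar's predicate (its negation) fails, so the lexer can only match TIME_MODE for the single letter. The remaining 0023 does not start with A or P, so it is then tokenized as IDENTIFIER_OR_INTEGER and accepted by the timeCode predicate.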
A small test class:

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

public class Main {

    private static void indent(String lispTree) {

        int indentation = -1;

        for (final char c : lispTree.toCharArray()) {
            if (c == '(') {
                indentation++;
                for (int i = 0; i < indentation; i++) {
                    System.out.print(i == 0 ? "\n  " : "  ");
                }
            }
            else if (c == ')') {
                indentation--;
            }
            System.out.print(c);
        }
    }

    public static void main(String[] args) throws Exception {
        TestLexer lexer = new TestLexer(new ANTLRInputStream("1P23 - A0023"));
        TestParser parser = new TestParser(new CommonTokenStream(lexer));
        indent(parser.parse().toStringTree(parser));
    }
}

will print:

(parse 
  (formatter 
    (information 1P23) - 
    (time 
      (timeMode A) 
      (timeCode 0023))) <EOF>)

for the input "1P23 - A0023".
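The grammar above does not enforce the 10-character limit on identifiers mentioned in the question. One possible approach, sketched here under the assumption that the limit is checked on the token text just like the existing information predicate, is to combine the pattern test with a length test. IdentifierCheck, MAX_LENGTH and isInformation are hypothetical names used for illustration, not ANTLR API.

```java
final class IdentifierCheck {

    // Assumed limit, taken from the question ("max 10 characters").
    static final int MAX_LENGTH = 10;

    // True when the token text contains at least one letter (the same regex
    // as the information predicate) and is at most MAX_LENGTH characters long.
    static boolean isInformation(String tokenText) {
        return tokenText.length() <= MAX_LENGTH
                && tokenText.matches("\\w*[a-zA-Z]\\w*");
    }

    public static void main(String[] args) {
        System.out.println(isInformation("1P23"));        // true
        System.out.println(isInformation("0023"));        // false: digits only
        System.out.println(isInformation("ABCDEFGHIJK")); // false: 11 characters
    }
}
```

Inside the grammar, the equivalent would presumably be to append && getCurrentToken().getText().length() <= 10 to the predicate of the information rule.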

EDIT

ANTLR can also display the parse tree in a UI component. If you do this instead:

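import java.util.Arrays;

// TreeViewer is in org.antlr.v4.gui in ANTLR 4.5.1 and later
// (older releases used org.antlr.v4.runtime.tree.gui).
import org.antlr.v4.gui.TreeViewer;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

public class Main {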

    public static void main(String[] args) throws Exception {
        TestLexer lexer = new TestLexer(new ANTLRInputStream("1P23 - A0023"));
        TestParser parser = new TestParser(new CommonTokenStream(lexer));
        new TreeViewer(Arrays.asList(TestParser.ruleNames), parser.parse()).open();
    }
}

a dialog showing the parse tree will appear.

Tested with ANTLR version 4.5.2-1
