ANTLR mismatched input error

Question

I'm writing a parser for my own language. I'm trying to parse the phrase

Number a is 10;

which is basically equivalent to int a = 10;.

It should match the variable_def rule. When I run it, I get the error

line 1:0 extraneous input 'Number' expecting {<EOF>, 'while', ';', 'if', 'function', TYPE, 'global', 'room', ID}
line 1:9 mismatched input 'is' expecting '('

This is my grammar:

grammar Script;

@header {
package script;
}

// PARSER

program
:
    block EOF
;

block
:
    (
        statement
        | functionDecl
    )*
;

statement
:
    (variable_def
    | functionCall
    | ifStatement
    | forStatement
    | whileStatement) ';'
;

whileStatement
:
    'while' '(' expression ')' '{' (statement)* '}'
;

forStatement
:
;

ifStatement
:
    'if' '(' expression ')' '{' statement* '}'
    (
        (
            'else' '{' statement* '}'
        )
        |
        (
            'else' ifStatement
        )
    )?
;

functionDecl
:
    'function' ID
    (
        '('
        (
            TYPE ID
        )?
        (
            ',' TYPE ID
        )* ')'
    )?
    (
        'returns' RETURN_TYPE
    )? '{' statement* '}'
;

functionCall
:
    ID '(' exprList? ')'
;

exprList
:
    expression
    (
        ',' expression
    )*
;

variable_def
:

    TYPE assignment
    | GLOBAL variable_def
    | ROOM variable_def
;

expression
:
    '-' expression # unaryMinusExpression
    | '!' expression # notExpression
    | expression '^' expression # powerExpression
    | expression '*' expression # multiplyExpression
    | expression '/' expression # divideExpression
    | expression '%' expression # modulusExpression
    | expression '+' expression # addExpression
    | expression '-' expression # subtractExpression
    | expression '>=' expression # gtEqExpression
    | expression '<=' expression # ltEqExpression
    | expression '>' expression # gtExpression
    | expression '<' expression # ltExpression
    | expression '==' expression # eqExpression
    | expression '!=' expression # notEqExpression
    | expression '&&' expression # andExpression
    | expression '||' expression # orExpression
    | expression IN expression # inExpression
    | NUMBER # numberExpression
    | BOOLEAN # boolExpression
    | functionCall # functionCallExpression
    | '(' expression ')' # expressionExpression
;

assignment
:
    ID ASSIGN expression
;

// LEXER

RETURN_TYPE
:
    TYPE
    | 'Nothing'
;

TYPE
:
    'Number'
    | 'String'
    | 'Anything'
    | 'Boolean'
    | 'Growable'? 'List' 'of' TYPE
;

GLOBAL
:
    'global'
;

ROOM
:
    'room'
;

ASSIGN
:
    'is'
    (
        'a'
        | 'an'
        | 'the'
    )?
;

EQUAL
:
    'is'?
    (
        'equal'
        (
            's'
            | 'to'
        )?
        | 'equivalent' 'to'?
        | 'the'? 'same' 'as'?
    )
;

IN
:
    'in'
;

BOOLEAN
:
    'true'
    | 'false'
;

NUMBER
:
    '-'? INT '.' INT EXP? // 1.35, 1.35E-9, 0.3, -4.5

    | '-'? '.' INT EXP? // -.35, .35e5

    | '-'? INT EXP // 1e10 -3e4

    | '-'? INT // -3, 45

;

fragment
EXP
:
    [Ee] [+\-]? INT
;

fragment
INT
:
    '0'
    | [1-9] [0-9]*
;

STRING
:
    '"'
    (
        ' ' .. '~'
    )* '"'
;

ID
:
    (
        'a' .. 'z'
        | 'A' .. 'Z'
        | '_'
    )
    (
        'a' .. 'z'
        | 'A' .. 'Z'
        | '0' .. '9'
        | '_'
    )*
;

fragment
JAVADOC_COMMENT
:
    '/*' .*? '*/'
;

fragment
LINE_COMMENT
:
    (
        '//'
        | '#'
    ) ~( '\r' | '\n' )*
;

COMMENT
:
    (
        LINE_COMMENT
        | JAVADOC_COMMENT
    ) -> skip
;

WS
:
    [ \t\n\r]+ -> skip
;

How can I fix this error?

user3159253 · Accepted Answer · 2014-07-22T01:38:34

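This particular error occurs because, in the lexer part of the grammar, the TYPE rule clashes with the RETURN_TYPE rule. Both match the input 'Number', and when several lexer rules match the same text, ANTLR picks the one defined first, so 'Number' is emitted as RETURN_TYPE rather than the TYPE token that variable_def expects. There are other mistakes as well, but the problem can be stripped down to just the following: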

grammar Script;

program
:
    block EOF
;

block
:
    (
       statement
     | functionDecl
    )*
;

statement
:
    (
      variable_def
    ) ';'
;

functionDecl
:
    'function' ID
    (
      'returns' RETURN_TYPE
    )?
    '{' statement* '}'
;

variable_def
:
    TYPE assignment
;

expression
:
    NUMBER # numberExpression
;

assignment
:
    ID ASSIGN expression
;

RETURN_TYPE
:
    TYPE
    | 'Nothing'
;

TYPE
:
    'Number'
;

ASSIGN
:
    'is'
    (
        'a'
      | 'an'
      | 'the'
    )?
;

NUMBER
:
    '-'? INT // -3, 45
;

fragment
INT
:
    '0'
  | [1-9] [0-9]*
;

ID
:
     (
        'a' .. 'z'
      | 'A' .. 'Z'
      | '_'
     )
     (
        'a' .. 'z'
      | 'A' .. 'Z'
      | '0' .. '9'
      | '_'
    )*
;

WS
:
    [ \t\n\r]+ -> skip
;

If RETURN_TYPE is converted into a parser rule, e.g. returnType, then everything works (for this particular test; as I said, your grammar contains other mistakes like this one). This demonstrates a basic principle of how ANTLR behaves (along with all other parser generators that separate the lexer from the parser): the lexer always works in its own context, so it can't tell whether a particular sequence of characters is one token or another if both tokens match that same sequence. So you have two options: introduce lexer contexts (called modes), or keep only basic, unambiguous entities at the lexer level and move everything else to the parser.
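
As a minimal sketch of the second option, the stripped-down grammar above might look like this once RETURN_TYPE becomes a parser rule. The rule name returnType is only the suggestion from the previous paragraph, and the comments are added for illustration. Everything else is the same reduced grammar, so the other issues in the full grammar are not addressed here.

```antlr
grammar Script;

program      : block EOF ;
block        : ( statement | functionDecl )* ;
statement    : variable_def ';' ;

functionDecl
    : 'function' ID ( 'returns' returnType )? '{' statement* '}'
    ;

// Formerly the lexer rule RETURN_TYPE. As a parser rule it no longer
// competes with TYPE for the text 'Number'.
returnType
    : TYPE
    | 'Nothing'
    ;

variable_def : TYPE assignment ;
expression   : NUMBER # numberExpression ;
assignment   : ID ASSIGN expression ;

// LEXER: TYPE and ASSIGN stay ahead of ID so they win ties on equal-length matches.
TYPE   : 'Number' ;
ASSIGN : 'is' ( 'a' | 'an' | 'the' )? ;
NUMBER : '-'? INT ;

fragment INT : '0' | [1-9] [0-9]* ;

ID : [a-zA-Z_] [a-zA-Z0-9_]* ;
WS : [ \t\n\r]+ -> skip ;
```

With this change 'Number' can only be lexed as TYPE, so Number a is 10; should reach variable_def. 'Nothing' becomes an implicit literal token that is used only by returnType.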
