1
votes

I'm using ANTLR and the following grammar:

grammar QuickBasic;

options 
{
    language = 'CSharp2';
    output = AST;
}

parse
    :    block EOF
    ;

block
    :    (labelStatement | labeledStatement | statement)*
    ;

labelStatement
    :    label ':' -> ^(label)
    ;

labeledStatement
    :    label statement -> ^(label statement)
    ;

statement
    :    assignment
    ;

assignment
    :    IDENTIFIER '=' value -> ^('=' IDENTIFIER value)
    ;

value
    :    (IDENTIFIER | constant)
    ;

constant 
    :    (STRING | INTEGER | REAL)
    ;

label
    :    (ALPHANUMERIC)+
    ;

IDENTIFIER
    :    LETTER (ALPHANUMERIC)*
    ;

REAL
    :    (INTEGER '.' NATURAL)
    ;

INTEGER
    :    ('-')? NATURAL
    ; 

SPACE
    :    (' ' | '\t' | '\r' | '\n' | '\u000C') {Skip();} 
    ;

STRING
    :    '"' ('""' | ~'"')* '"'
    ;

fragment NATURAL
    :    (DIGIT)+
    ;   

fragment ALPHANUMERIC
    :    (DIGIT | LETTER)
    ;

fragment DIGIT
    :    '0'..'9'
    ;

fragment LETTER
    :    ('a'..'z' | 'A'..'Z')
    ;

With this, I'm trying to parse the following file:

PI = 3.141592
CALC:
100 A = 1

The line 'CALC:' should be parsed as a label, but the parser tries to parse it as a statement and reports an error: mismatched input ':' expecting '='.

2 Answers

2
votes

Your label rule is wrong:

label
    :    (ALPHANUMERIC)+
    ;

This fails because ALPHANUMERIC is a fragment lexer rule: it can only be referenced from other lexer rules, never from parser rules. Your lexer only produces the tokens IDENTIFIER, INTEGER, REAL and STRING (plus the literal tokens used in your parser rules, ':' and '='), so those are the only tokens your parser rules can match. That is why CALC reaches the parser as an IDENTIFIER, the only alternative that can start with an IDENTIFIER is statement, and the parser then expects '='.
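To make the token stream concrete, here is a minimal Python sketch that emulates the non-fragment lexer rules with regular expressions. It is not ANTLR output: the `LITERAL` token name, the `tokenize` helper and the rule ordering are assumptions made for illustration (ANTLR gives literals such as ':' and '=' its own generated token types).

```python
import re

# Regex stand-ins for the question's non-fragment lexer rules.
# Order matters: REAL is tried before INTEGER so "3.141592" stays one token.
TOKEN_SPEC = [
    ("REAL",       r"-?[0-9]+\.[0-9]+"),
    ("INTEGER",    r"-?[0-9]+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("STRING",     r'"(?:""|[^"])*"'),
    ("LITERAL",    r"[:=]"),         # hypothetical name for ':' and '='
    ("SPACE",      r"[ \t\r\n\f]"),  # dropped, like the {Skip();} action
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (token_type, text) pairs; whitespace tokens are discarded."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token in tokenize("PI = 3.141592\nCALC:\n100 A = 1\n"):
        print(token)
```

For the sample input, CALC comes out as an IDENTIFIER followed by ':', and 100 as an INTEGER. No ALPHANUMERIC token ever appears, since fragment rules only serve as building blocks for other lexer rules.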

Also, every AST you create should have a single, unique root. The rewrite rules in labelStatement and labeledStatement use the label itself as the root, which makes those trees indistinguishable from the output of other parser rules, so a tree walker (either ANTLR's or your own) will have trouble when it encounters the root of such an AST. It is much better to create imaginary LABEL and LABELED_STAT tokens and make them the roots of these ASTs:

...

tokens 
{
    LABEL;
    LABELED_STAT;
}

parse
    :    block EOF
    ;

...

labelStatement
    :    label ':' -> ^(LABEL label)
    ;

labeledStatement
    :    label statement -> ^(LABELED_STAT label statement)
    ;

...

label
    :    IDENTIFIER
    |    INTEGER
    ;

This will create the following AST:

(Image: diagram of the resulting AST.)
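As a rough textual stand-in for the tree, the rewrite rules above can be emulated in plain Python. This is only a sketch under stated assumptions: `tokenize`, `parse_block`, `parse_assignment` and `to_lisp` are hypothetical helpers, the lookahead logic is hand-written rather than generated by ANTLR, and only the per-statement subtrees are shown.

```python
import re

TOKEN_RE = re.compile(
    r'(?P<REAL>-?[0-9]+\.[0-9]+)|(?P<INTEGER>-?[0-9]+)'
    r'|(?P<IDENTIFIER>[A-Za-z][A-Za-z0-9]*)|(?P<STRING>"(?:""|[^"])*")'
    r'|(?P<LITERAL>[:=])|(?P<SPACE>\s+)'
)
LABEL_TYPES = {"IDENTIFIER", "INTEGER"}                 # label : IDENTIFIER | INTEGER
VALUE_TYPES = {"IDENTIFIER", "STRING", "INTEGER", "REAL"}


def tokenize(text):
    """Split text into (token_type, text) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_assignment(tokens, i):
    """assignment : IDENTIFIER '=' value -> ^('=' IDENTIFIER value)"""
    if i + 2 >= len(tokens):
        raise SyntaxError("incomplete assignment")
    (kind, name), equals, (value_kind, value) = tokens[i:i + 3]
    if kind != "IDENTIFIER" or equals != ("LITERAL", "=") or value_kind not in VALUE_TYPES:
        raise SyntaxError(f"bad assignment near {name!r}")
    return ("=", name, value), i + 3


def parse_block(tokens):
    """block : (labelStatement | labeledStatement | statement)*"""
    trees, i = [], 0
    while i < len(tokens):
        kind, text = tokens[i]
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if kind in LABEL_TYPES and following == ("LITERAL", ":"):
            trees.append(("LABEL", text))                # ^(LABEL label)
            i += 2
        elif kind == "IDENTIFIER" and following == ("LITERAL", "="):
            tree, i = parse_assignment(tokens, i)        # plain statement
            trees.append(tree)
        elif kind in LABEL_TYPES:
            tree, i = parse_assignment(tokens, i + 1)
            trees.append(("LABELED_STAT", text, tree))   # ^(LABELED_STAT label statement)
        else:
            raise SyntaxError(f"unexpected token {text!r}")
    return trees


def to_lisp(tree):
    """Render a nested tuple in the parenthesised style ANTLR uses for trees."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(to_lisp(child) for child in tree) + ")"


if __name__ == "__main__":
    for tree in parse_block(tokenize("PI = 3.141592\nCALC:\n100 A = 1\n")):
        print(to_lisp(tree))
    # (= PI 3.141592)
    # (LABEL CALC)
    # (LABELED_STAT 100 (= A 1))
```

Each subtree now has a distinct root ('=', LABEL or LABELED_STAT), so a tree walker can tell the three kinds of statement apart by looking at the root alone.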

0
votes

Try using lower-case skip() instead of Skip() (skip() is the spelling used by ANTLR's Java target), and use something like this to match a run of whitespace characters as a single token:

SPACE
    :    (' ' | '\t' | '\u000C' | '\n' | '\r' )+ {skip();} 
    ;