ANTLR4 Grammar Issue with Decimal Numbers

Question

I'm new to ANTLR and am using ANTLR4 (the 4.7.2 JAR file). I'm currently working on an Oracle parser and am having issues with decimal numbers. I have kept only the relevant parts; my grammar file is shown below.

When I parse the statement below, it works fine (".1" is a valid number in my case):

BEGIN a NUMBER:=.1; END;

I haven't shown the full grammar, but the following are all valid cases for me in Oracle:

a NUMBER:= .1; // with Space after operator
a NUMBER:=1.1; // without Space after operator
a NUMBER:=1; // without Space after operator
a NUMBER:= 3; // with Space after operator

Now I need to create a tablespace as below:

CREATE TABLESPACE tbs_01 DATAFILE +DATA/BR/CONTROLFILE/Current.260.750;

Here the digits 260 and 750 are each tokenized together with the preceding dot, as .260 and .750 (per the second alternative of NUMERIC_LITERAL). I want them to be two separate integers separated by dots, assigned to filenumber and incarnation_number respectively, as shown in the grammar.

How do I do this? I have tried a semantic predicate such as {_input.LA(-1)!='.'}? but it did not work correctly for me. I also tried many other suggested approaches (most were written for ANTLR3 and do not work in ANTLR4). Is there a simple way to do this in the lexer? I do not want to write a parser rule to split the decimal digits.

grammar Oracle;

parse
 : ( sql_statements | error )* EOF
 ;

error
 : UNEXPECTED_CHAR 
 { 
    throw new RuntimeException("UNEXPECTED_CHAR=" + $UNEXPECTED_CHAR.text);
 }
 ;

sql_statements 
: 'CREATE' 'TABLESPACE' tablespace_name 'DATAFILE' fully_qualified_file_name ';'
| 'BEGIN' var1 'NUMBER' ':=' num1 ';' 'END' ';'
;

tablespace_name : IDENTIFIER;
fully_qualified_file_name : K_PLUS_SIGN diskgroup_name K_SOLIDUS db_name K_SOLIDUS file_type K_SOLIDUS file_type_tag '.' filenumber '.' incarnation_number;
diskgroup_name : IDENTIFIER;
db_name : IDENTIFIER;
file_type : IDENTIFIER;
file_type_tag : IDENTIFIER;
filenumber : NUMERIC_LITERAL;
incarnation_number : NUMERIC_LITERAL;

var1 : IDENTIFIER;
num1 : NUMERIC_LITERAL;

IDENTIFIER : [a-zA-Z_] ([a-zA-Z] | '$' | '_' | '#' | DIGIT)* ;
K_PLUS_SIGN : '+';
K_SOLIDUS : '/';
NUMERIC_LITERAL
 : DIGIT+ ( '.' DIGIT+ )? ( E ('+'|'-')? DIGIT+ )? ('D' | 'F')?
 | '.' DIGIT+ ( E ('+'|'-')? DIGIT+ )? ('D' | 'F')?
 ;

SPACES : [ \u000B\t\r\n] -> skip;
WS : [ \t\r\n]+ -> skip;
UNEXPECTED_CHAR : . ;

fragment DIGIT : [0-9];
fragment A : [aA];
fragment B : [bB];
fragment C : [cC];
fragment D : [dD];
fragment E : [eE];
fragment F : [fF];
fragment G : [gG];
fragment H : [hH];
fragment I : [iI];
fragment J : [jJ];
fragment K : [kK];
fragment L : [lL];
fragment M : [mM];
fragment N : [nN];
fragment O : [oO];
fragment P : [pP];
fragment Q : [qQ];
fragment R : [rR];
fragment S : [sS];
fragment T : [tT];
fragment U : [uU];
fragment V : [vV];
fragment W : [wW];
fragment X : [xX];
fragment Y : [yY];
fragment Z : [zZ];

GRosenberg · Accepted Answer · 2020-09-18T19:36:39

Your DSL has a natural ambiguity: in some contexts a number must be lexed as an integer, and in others as a decimal.

If the DSL provides sufficient guard conditions, ANTLR lexer modes can be used to isolate those contexts. For example, in the given DSL, decimal numbers appear to always occur between the := and ; guards.

...
K_ASSIGN : ':=' -> pushMode(Decimals);
K_SEMI : ';' ;
NUMERIC_LITERAL : DIGIT+ ;
...
mode Decimals;
    D_SEMI : ';' -> type(K_SEMI), popMode ;
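    // The outer parentheses make the type command apply to both alternatives;
    // a lexer command otherwise binds only to the alternative it follows.
    NUMERIC
        : ( DIGIT+ ( '.' DIGIT+ )? ( E ('+'|'-')? DIGIT+ )? ('D' | 'F')?
          | '.' DIGIT+ ( E ('+'|'-')? DIGIT+ )? ('D' | 'F')?
          ) -> type(NUMERIC_LITERAL)
        ;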
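
Since ANTLR4 only allows modes in a lexer grammar, not in a combined one, applying this idea means splitting the grammar in two. The following is a minimal sketch of what the lexer half might look like, not a tested drop-in replacement. The grammar name OracleLexer, the keyword tokens (K_CREATE, K_NUMBER and so on), K_DOT, and the D_NUMERIC and D_WS rules are names assumed here for illustration; they do not come from the question or the answer. D_WS is added on the assumption that whitespace between := and the number (as in "a NUMBER:= .1;") should still be skipped inside the mode.

```antlr
lexer grammar OracleLexer;   // hypothetical name

// Keywords come before IDENTIFIER so they win when both match the same text.
K_CREATE     : 'CREATE' ;
K_TABLESPACE : 'TABLESPACE' ;
K_DATAFILE   : 'DATAFILE' ;
K_BEGIN      : 'BEGIN' ;
K_END        : 'END' ;
K_NUMBER     : 'NUMBER' ;

// ':=' opens the context in which decimals are allowed.
K_ASSIGN     : ':=' -> pushMode(Decimals) ;
K_SEMI       : ';' ;
K_DOT        : '.' ;
K_PLUS_SIGN  : '+' ;
K_SOLIDUS    : '/' ;

IDENTIFIER      : [a-zA-Z_] [a-zA-Z0-9_$#]* ;
// In the default mode a number is digits only, so Current.260.750 lexes as
// IDENTIFIER K_DOT NUMERIC_LITERAL K_DOT NUMERIC_LITERAL.
NUMERIC_LITERAL : DIGIT+ ;
WS              : [ \t\r\n\u000B]+ -> skip ;
UNEXPECTED_CHAR : . ;

fragment DIGIT : [0-9] ;
fragment E     : [eE] ;

mode Decimals;

// ';' closes the decimal context and is reported as an ordinary K_SEMI.
D_SEMI : ';' -> type(K_SEMI), popMode ;
// Full decimal syntax, reported to the parser as NUMERIC_LITERAL.
D_NUMERIC
    : ( DIGIT+ ( '.' DIGIT+ )? ( E ('+'|'-')? DIGIT+ )? ('D' | 'F')?
      | '.' DIGIT+ ( E ('+'|'-')? DIGIT+ )? ('D' | 'F')?
      ) -> type(NUMERIC_LITERAL)
    ;
D_WS   : [ \t\r\n\u000B]+ -> skip ;
```

The parser rules would then move into a separate parser grammar that declares options { tokenVocab = OracleLexer; } and refers to these tokens by name, for example K_DOT in fully_qualified_file_name in place of the '.' literal. The parser rules filenumber, incarnation_number and num1 can keep referring to NUMERIC_LITERAL, because the mode decides which lexical form that token takes.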
