I am new to antlr and am writing an antlr grammar for a DSL. I have skipped white space to handle it . But there is a case where I optionally want my grammar to pick up a particular token with might have white space.The things which I want to accomplish here in a way is
Token SECATTR to have spaces with trimmed trailing and leading spaces.That is something like
aa aa_aa aa.aa aa_aa aa to be read as single token without leading space when being used in the parsing rule singlerule of sortCOUNT(aa aa_aa aa.aa aa_aa aa )>10
. As of now the tree being formed is likeSolve the issue of having spaces in my parsing rules such as for singlerule
COUNT (aa aa_aa aa.aa aa_aa aa)>10
throws an error due to space after COUNT which is likeAvoid clumsy parsing (if achievable) of my token SECATTR,since it currently being parsed in the following way for aa aa_aa aa.aa aa_aa aa(And I think it is the root cause of all )
I need to handle all of these whitespaces carefully because my other parsing rule expr is complex and based on singlerule and my end goal is to have clean way of allowing tokenization of SECATTR to have whitespace and all other places whitespaces to be ignored . Please suggest me where am I going wrong and what needs to be improved.
grammar Test;
options {
language = Java;
}
fragment DIVIDE : '/';
fragment PLUS : '+';
fragment MINUS : '-';
fragment STAR : '*';
fragment MOD : '%';
LPAREN : '(';
RPAREN : ')';
fragment COMMA : ',';
fragment COLON : ':';
fragment LANGLEBRACKET : '<';
fragment RANGLEBRACKET : '>';
fragment EQ : '=';
fragment NOT : '!';
fragment UNDERSCORE : '_';
fragment DOT : '.';
fragment GRTRTHANEQTO : RANGLEBRACKET EQ;
fragment LESSTHANEQTO : LANGLEBRACKET EQ;
fragment NOTEQ : NOT EQ;
WS : ('\t'|'\f'|'\n'|'\r'|' ')+{ $channel=HIDDEN; };
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
OP1 : ((C O U N T | A V G | C O U N T D I S T I N C T)
| C A S T) ;
OP2 : DIVIDE|PLUS|MINUS|STAR|MOD
|LANGLEBRACKET|RANGLEBRACKET|EQ|GRTRTHANEQTO|LESSTHANEQTO|NOTEQ
|E Q U A L S | L I K E | N O T E Q U A L S | N O T L I K E | N O T N U L L;
OP3 : ((C O R R E S P O N D I N G | A N Y)|I);
OP4 : (A N D | O R);
DIGIT : ('0'..'9')+;
fragment Letter : ('a'..'z' | 'A'..'Z')+;
fragment Space : ' '+;
SECATTR :Letter (Letter|UNDERSCORE|DOT|Space)+
;
singlerule : SECATTR OP2 (DIGIT|Letter)
| OP1 LPAREN SECATTR RPAREN OP2 (DIGIT|Letter)
| SECATTR OP2 SECATTR
| OP1 LPAREN SECATTR RPAREN OP2 OP1 LPAREN SECATTR RPAREN
;
expr :((LPAREN? singlerule RPAREN?) OP4?)+
|((LPAREN (LPAREN singlerule RPAREN) OP4 (LPAREN singlerule RPAREN) RPAREN)+ (OP4 (LPAREN? singlerule RPAREN?))+ OP4?)+
| (LPAREN (LPAREN singlerule RPAREN) OP4 (LPAREN singlerule RPAREN) RPAREN OP3)+;