2
votes

I am new to antlr and am writing an antlr grammar for a DSL. I have skipped white space to handle it . But there is a case where I optionally want my grammar to pick up a particular token with might have white space.The things which I want to accomplish here in a way is

  1. Token SECATTR to have spaces with trimmed trailing and leading spaces.That is something like
    aa aa_aa aa.aa aa_aa aa to be read as single token without leading space when being used in the parsing rule singlerule of sort COUNT(aa aa_aa aa.aa aa_aa aa )>10 . As of now the tree being formed is like The token SECATTR having spaces in lead

  2. Solve the issue of having spaces in my parsing rules such as for singlerule COUNT (aa aa_aa aa.aa aa_aa aa)>10 throws an error due to space after COUNT which is like single rule with spaces

  3. Avoid clumsy parsing (if achievable) of my token SECATTR,since it currently being parsed in the following way for aa aa_aa aa.aa aa_aa aa(And I think it is the root cause of all )Multiple breakdown of SECATTR

I need to handle all of these whitespaces carefully because my other parsing rule expr is complex and based on singlerule and my end goal is to have clean way of allowing tokenization of SECATTR to have whitespace and all other places whitespaces to be ignored . Please suggest me where am I going wrong and what needs to be improved.

grammar Test;

options {
  language = Java;
}


fragment DIVIDE : '/';
fragment PLUS : '+';
fragment MINUS : '-';
fragment STAR : '*';
fragment MOD : '%';
LPAREN : '(';
RPAREN : ')';
fragment COMMA : ',';
fragment COLON : ':';
fragment LANGLEBRACKET : '<';
fragment RANGLEBRACKET : '>';
fragment EQ : '=';
fragment NOT : '!';
fragment UNDERSCORE : '_';
fragment DOT : '.';
fragment GRTRTHANEQTO : RANGLEBRACKET EQ;
fragment LESSTHANEQTO : LANGLEBRACKET EQ;
fragment NOTEQ       : NOT EQ;

WS       :           ('\t'|'\f'|'\n'|'\r'|' ')+{ $channel=HIDDEN; };

fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');


OP1   : ((C O U N T | A V G | C O U N T D I S T I N C T) 
      | C A S T) ;

OP2   : DIVIDE|PLUS|MINUS|STAR|MOD
      |LANGLEBRACKET|RANGLEBRACKET|EQ|GRTRTHANEQTO|LESSTHANEQTO|NOTEQ
      |E Q U A L S | L I K E | N O T E Q U A L S | N O T L I K E | N O T N U L L;

OP3   : ((C O R R E S P O N D I N G | A N Y)|I);
OP4   : (A N D | O R);

DIGIT    :    ('0'..'9')+;
fragment Letter   : ('a'..'z' | 'A'..'Z')+;
fragment Space    : ' '+;
SECATTR  :Letter (Letter|UNDERSCORE|DOT|Space)+
         ;


singlerule    : SECATTR  OP2 (DIGIT|Letter)
              | OP1 LPAREN  SECATTR RPAREN OP2  (DIGIT|Letter)
              | SECATTR OP2 SECATTR
              | OP1 LPAREN SECATTR RPAREN  OP2 OP1 LPAREN SECATTR RPAREN
;
expr    :((LPAREN? singlerule RPAREN?) OP4?)+
        |((LPAREN (LPAREN singlerule RPAREN) OP4 (LPAREN singlerule RPAREN) RPAREN)+ (OP4 (LPAREN? singlerule RPAREN?))+ OP4?)+
        | (LPAREN (LPAREN singlerule RPAREN) OP4 (LPAREN singlerule RPAREN) RPAREN OP3)+;
1

1 Answers

0
votes

ANTLR 3 and 4 have a feature called token channels. You can send your whitespace / comments etc... to the parser on a different channel. They are in the token stream for use but not parsed by the parser.