Follow-up question to: Getting plain text in antlr instead of tokens
1. I used the rule
COMMENT : START_1_TAG START_COMMENT END_1_TAG .*? START_2_TAG END_COMMENT END_2_TAG -> skip;
to skip comments in my lexer. However, I get a mismatched input error whenever there is a space inside the tags.
The relevant part of my lexer is:
lexer grammar DemoLexer;
START_1_TAG : '<%' -> pushMode(IN_TAG);
START_2_TAG : '<<' -> pushMode(IN_TAG);
COMMENT : START_1_TAG START_COMMENT END_1_TAG .*? START_2_TAG END_COMMENT END_2_TAG -> skip;
TEXT : ( ~[<] | '<' ~[<%] )+;
mode IN_TAG;
START_COMMENT : 'startcomment' ;
END_COMMENT : 'endcomment' ;
ID : [A-Za-z_][A-Za-z0-9_]*;
INT_NUMBER : [0-9]+;
END_1_TAG : '%>' -> popMode;
END_2_TAG : '>>' -> popMode;
SPACE : [ \t\r\n] -> channel(HIDDEN);
My issue is that <%comment%>hi<%endcomment%> gets parsed correctly, but when I give input such as <% comment %> or <% endcomment %>, with spaces inside the tags, it is not recognized by the COMMENT rule.
It is recognized by the COMMENT rule when I define the rule as:
COMMENT : START_1_TAG SPACE*? 'commentstart' SPACE*? END_1_TAG .*? START_1_TAG SPACE*? 'commentend' SPACE*? END_1_TAG -> skip;
with explicit spaces.
Is this the proper way to handle it?
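For reference, the explicit-whitespace variant can be written a little more compactly with a helper fragment. This is only a sketch: the fragment name WS is hypothetical, and I am assuming the tag keywords are startcomment and endcomment inside <% ... %>, as in the IN_TAG rules above. As far as I understand, sending SPACE to the HIDDEN channel only hides those tokens from the parser, so whitespace cannot be skipped implicitly in the middle of another lexer rule and COMMENT has to match it itself.

```antlr
// Sketch only: intended for the default mode of DemoLexer.

// Hypothetical helper fragment; not part of the original grammar.
fragment WS : [ \t\r\n] ;

// The whole comment is consumed by a single lexer rule, so optional
// whitespace inside the tags is spelled out here explicitly.
COMMENT
    : '<%' WS* 'startcomment' WS* '%>'   // opening tag, spaces allowed
      .*?                                // comment body, non-greedy
      '<%' WS* 'endcomment' WS* '%>'     // closing tag, spaces allowed
      -> skip
    ;
```

Because the lexer prefers the longest match, COMMENT should win over START_1_TAG whenever a complete comment is present.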
2. I have a rule where I need the raw content inside a tag pair. For example, the tokens need to be <%startraw%>, <%Hi%> and <%endraw%>.
I tried using the TEXT rule, but it doesn't work because it doesn't match '<%' and '<<'.
I tried the following. In my parser:
rawText : RAW_TAG_START RAW_TEXT RAW_TAG_END ;
In my lexer:
RAW_TAG_START : '<%' 'startraw' '%>' -> pushMode(RAW_MODE);
RAW_TAG_END : '<%' 'endraw' '%>' -> popMode;
mode RAW_MODE;
RAW_TEXT : .*? ;
For some reason, when I try to parse this with the IntelliJ ANTLR plugin, the plugin seems to freeze and crash whenever I try to match the rawText rule.
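Two things stand out in the lexer rules above, offered as a hedged sketch rather than a confirmed fix. First, RAW_TEXT : .*? can match the empty string, and ANTLR 4 warns about non-fragment lexer rules that can do so. Second, RAW_TAG_END is declared before mode RAW_MODE; so it belongs to the default mode, not to RAW_MODE. The variant below moves the end tag into RAW_MODE and matches one character at a time. RAW_CHAR is a hypothetical name, and I have not verified that this cures the plugin freeze.

```antlr
// Sketch only: the raw-text rules, rearranged.

// Default mode: the opening tag switches the lexer into RAW_MODE.
RAW_TAG_START : '<%' 'startraw' '%>' -> pushMode(RAW_MODE) ;

mode RAW_MODE;

// Declared inside RAW_MODE so it can be matched while that mode is active.
RAW_TAG_END : '<%' 'endraw' '%>' -> popMode ;

// Hypothetical rule: exactly one character per token, so it can never
// match the empty string. The longer RAW_TAG_END match takes priority
// when the input at the current position is the end tag.
RAW_CHAR : . ;
```

With this layout the parser rule would become rawText : RAW_TAG_START RAW_CHAR* RAW_TAG_END ; and the raw content is the concatenated text of the RAW_CHAR tokens, so <%Hi%> comes through as six single-character tokens, not one.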