1
votes

I'm new to ANTLR, so I'd appreciate a detailed explanation.

I have a lexer rule for /* comment */ block comments (BC) in ANTLR, and I want it to tokenize like this:

/* sample */ => BC
/* s
a
m
p
l
e */ => BC
"" => STRING
" " => STRING
"a" => STRING
"hello world \1" => STRING

but I got this:

/* sample */
/* s
a
m
p
l
e */ => BC
""
" "
"a"
"hello world \1" => STRING

It matches everything from the first /* to the last */ as a single token, and the same happens with my STRING token. Here's the rule for comments:

BC: '/*'.*'*/';

And the String:

STRING: '"'(~('"')|(' '|'\b'|'\f'|'r'|'\n'|'\t'|'\"'|'\\'))*'"';

2 Answers

3
votes

Lexer subrules such as .* are greedy by default, meaning they consume the longest sequence that still lets the rule match. So your BC rule only stops at the last closing delimiter in the input.

To make a subrule non-greedy, add the ? suffix to its operator (.*? instead of .*):

BC: '/*' .*? '*/';

This stops at the first closing */, which is exactly what you need.

The same applies to your STRING rule. Non-greedy subrules are covered in The Definitive ANTLR 4 Reference, page 285.
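For the string case, here is a minimal sketch of two ways the rule could be written. It assumes that a backslash escapes whatever single character follows it (so \1 and \" both stay inside the literal); that escape convention and the WS rule are assumptions added for illustration, not part of the original grammar.

```antlr
// Option 1: plain non-greedy match, stops at the first closing quote.
// Note that it would also stop at an escaped quote (\").
// STRING: '"' .*? '"';

// Option 2: escape-aware match (assumed convention: backslash + any one char).
// The body cannot contain a bare quote, so the rule ends at the first
// unescaped closing quote without needing a non-greedy operator.
STRING: '"' ( '\\' . | ~["\\] )* '"';

// Assumed helper: drop whitespace between tokens.
WS: [ \t\r\n]+ -> skip;
```

With Option 2, each of "", " ", "a" and "hello world \1" becomes its own STRING token, because the body excludes unescaped quotes rather than relying on greediness.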

1
votes

You can also use the following code fragment, which avoids the non-greedy syntax (a more general solution):

MultilineCommentStart:        '/*' -> more, mode(COMMENTS);

mode COMMENTS;

MultilineComment:             '*/' -> mode(DEFAULT_MODE);
MultilineCommentNotAsterisk: ~'*'+ -> more;
MultilineCommentAsterisk:     '*'  -> more;
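To try the mode-based rules on their own, they need to live in a lexer grammar, since modes are not allowed in combined grammars. The sketch below wraps them that way; the grammar name CommentLexer and the WS rule are hypothetical additions for illustration.

```antlr
// File: CommentLexer.g4 (hypothetical name; must match the grammar name)
lexer grammar CommentLexer;

// Default mode: on '/*', keep the text and switch to COMMENTS mode.
MultilineCommentStart:        '/*' -> more, mode(COMMENTS);

// Assumed helper: drop whitespace outside comments.
WS:                           [ \t\r\n]+ -> skip;

mode COMMENTS;

// '*/' ends the comment; the accumulated text is emitted as one
// MultilineComment token and the lexer returns to the default mode.
MultilineComment:             '*/' -> mode(DEFAULT_MODE);
// Anything that is not an asterisk is appended to the pending token.
MultilineCommentNotAsterisk: ~'*'+ -> more;
// A lone asterisk (not followed by '/') is appended as well.
MultilineCommentAsterisk:     '*'  -> more;
```

Because every rule except MultilineComment uses more, the whole comment from /* to the first */ ends up in a single MultilineComment token, so two adjacent comments produce two tokens.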