On my assignment, I have this description for the String Lexer:

"String literals consist of zero or more characters enclosed by double quotes ("). Use escape sequences (listed below) to represent special characters within a string. It is a compile-time error for a new line or EOF character to appear inside a string literal.

All the supported escape sequences are as follows:

\b backspace

\f formfeed

\r carriage return

\n newline

\t horizontal tab

\" double quote

\\ backslash

The following are valid examples of string literals:

"This is a string containing tab \t"

"He asked me: \"Where is John?\""

A string literal has a type of string."

And this is my String lexer:

STRINGLIT: '"'(('\\'('b'|'t'|'n'|'f'|'r'|'\"'|'\\'))|~('\n'))*'"';

Can anybody check whether my lexer rule meets the requirement? If it doesn't, please tell me how to correct it. I don't really understand the requirement or ANTLR4.

I don't know ANTLR, so I'm not going to post an actual answer. But remember that regular expressions are generally "greedy" and will match as much as they can. This means that if your STRINGLIT is fed the input "this is a test" + "foo bar baz", it will match the entire input, not just the first quoted string. You need to exclude " from the characters which can appear within the string (except when escaped). – J Earls
@JEarls Thanks, your comment helps me a lot. – Alex

1 Answer

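With ANTLR4, instead of writing '\\' ('b' | 't' | 'n'), you can write '\\' [btn]. Also, as J Earls mentioned in a comment, you'll want to add the double quote to your negated set, along with \r and the backslash itself. That way an unescaped quote ends the string, and a backslash can only appear as the start of one of the listed escape sequences.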

This ought to do the trick:

STRINGLIT
 : '"' ( '\\' [btnfr"\\] | ~[\r\n\\"] )* '"'
 ;
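
If you want to sanity-check the idea without running ANTLR, here is a rough sketch in Python that approximates both rules with regular expressions. This is only an approximation: Python's `re` engine backtracks while an ANTLR lexer picks the longest match, and the names `ORIGINAL`, `CORRECTED`, `first_token` and `is_valid_literal` are made up for illustration. The escape set is assumed to be exactly the seven sequences listed in the assignment.

```python
import re

# Escape sequences from the assignment: \b \f \r \n \t \" \\
ESCAPE = r'\\[btnfr"\\]'

# Approximation of the rule from the question: anything except \n is allowed,
# so an unescaped quote or a stray backslash is accepted inside the string.
ORIGINAL = re.compile(r'"(?:' + ESCAPE + r'|[^\n])*"')

# Approximation of a rule whose negated set also excludes \r, \ and ".
CORRECTED = re.compile(r'"(?:' + ESCAPE + r'|[^\r\n\\"])*"')


def first_token(pattern, text):
    """Return the string literal matched at the start of text, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


def is_valid_literal(text):
    """True if the whole text is one string literal under CORRECTED."""
    return CORRECTED.fullmatch(text) is not None


two_strings = '"this is a test" + "foo bar baz"'
print(first_token(ORIGINAL, two_strings))   # swallows both strings
print(first_token(CORRECTED, two_strings))  # stops at the first closing quote

print(is_valid_literal(r'"This is a string containing tab \t"'))
print(is_valid_literal(r'"He asked me: \"Where is John?\""'))
print(is_valid_literal(r'"bad \q escape"'))         # \q is not a listed escape
print(is_valid_literal('"line one\nline two"'))     # real newline inside
```

The first two prints show the over-matching problem from the comment: the looser pattern runs through to the last quote on the line, while the stricter one ends at the first unescaped quote. The remaining checks cover the two valid examples from the assignment plus two inputs that should be rejected.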