
I have a lexer rule that defines a single-quoted string literal as:

L_S_STRING  : '\'' (('\'' '\'') | ('\\' '\'') | ~('\''))* '\'' ;

It fails on one particular input:

'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\''

The problem seems to be the last two single quotes. If I add a space between them, it works. Ending the string with two single quotes instead of the backslash escape also works, e.g.

'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z'''

I am not sure whether this is caused by a non-greedy operator that makes ('\'' '\'') match first. If so, I don't see why the last version works.

In any event, could someone help please?

UPDATE: I am not able to reproduce this outside of the full grammar, so it may be a red herring.

UPDATE: I missed some important context, so I posted a follow-up question: "Antlr4: single quote rule fails when there are escape chars plus carriage return, new line".

Can you please tell us more about your syntax? How are characters escaped, what do two single quotes mean, and which strings are valid and which are not? – trollingchar
Please add an MCVE that demonstrates what you describe: stackoverflow.com/help/mcve – Bart Kiers

1 Answer


I can't reproduce that. Given the following grammar:

lexer grammar Test;

L_S_STRING  : '\'' (('\'' '\'') | ('\\' '\'') | ~('\''))* '\'';
OTHER       : . ;

which can be tested as follows:

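// Assumes the Test lexer has been generated from the grammar above
// and the ANTLR 4 runtime is on the classpath.
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;

// Each "\\\\" in the Java literal is two backslashes in the actual input.
String source = "A'yyyy-MM-dd\\\\'T\\\\'HH:mm:ss\\\\'Z\\\\''B";

Test lexer = new Test(CharStreams.fromString(source));
CommonTokenStream tokens = new CommonTokenStream(lexer);
tokens.fill();

// Print the symbolic token name next to the matched text.
for (Token t : tokens.getTokens()) {
    System.out.printf("%-15s %s\n", Test.VOCABULARY.getSymbolicName(t.getType()), t.getText());
}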

will print:

OTHER           A
L_S_STRING      'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\''
OTHER           B
EOF             <EOF>
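
As a rough cross-check that does not need a generated lexer, the same rule can be approximated with java.util.regex. This is only a sketch: the class name QuotedStringCheck and the method firstLiteral are made up for illustration, and a backtracking regex is not guaranteed to behave like ANTLR's longest-match lexer on every input, so treat it as a sanity check for this one string rather than a substitute for the real grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedStringCheck {
    // Regex approximation of L_S_STRING (an assumption, not produced by ANTLR):
    // opening quote, then any number of '' | \' | non-quote characters, then a closing quote.
    private static final Pattern L_S_STRING =
            Pattern.compile("'(?:''|\\\\'|[^'])*'");

    // Returns the first substring matched by the approximated rule, or null if none.
    public static String firstLiteral(String input) {
        Matcher m = L_S_STRING.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Same input as in the lexer test: "\\\\" is two backslashes at runtime.
        String source = "A'yyyy-MM-dd\\\\'T\\\\'HH:mm:ss\\\\'Z\\\\''B";
        System.out.println(firstLiteral(source));
    }
}
```

For the source string above, the regex matches the same text as the L_S_STRING token in the lexer output, from the first quote after A up to the last quote before B.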