In my ANTLR4 grammar I need to match, regex-style, Latin, Cyrillic, Polish, and Greek letters plus special characters. This is what I have:

STRING: ['][\p{L} 0-9\[\]\^\$\.\|\?\*\+\(\)\\~`\!@#%&\-_+={}""<>:;,\/°]*['];

So I am saying that a string starts and ends with '. Inside it I can have any letter (\p{L}), digit, or special character except '. I have tested this on regex101.com and it does exactly what I want, but in ANTLR4 it does not work. The closest I can get is:

['][a-zA-Z0-9 \[\]\^\$\.\|\?\*\+\(\)\\~`\!@#%&\-_+={}""<>:;,\/°]*[']

The problem is that something like 'Ąłćórżnęł' is not accepted by my language, but it should be.

Am I doing something wrong in ANTLR4, or is this a limitation? How can I get it to work in ANTLR4? STRING is a lexer rule.

1 Answer

\p{L} is not supported by ANTLR versions before 4.7 (4.7 added Unicode property escapes such as \p{L} to lexer character sets). On older versions you will have to write the ranges out by hand like this: [\u1234-\u5678] (replace \u.... with your hexadecimal Unicode code points), where \u1234 is the start of the range and \u5678 the end. Note that you can put more than one range in a character set: [\u1234-\u1238\u3456-\u5679].
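
As a rough sketch of the hand-written approach for the scripts mentioned in the question, the fragment below covers basic Latin, the accented Latin letters (which include the Polish ones), Greek, and Cyrillic. The rule name LETTER and the choice of whole Unicode blocks are assumptions made for illustration. Block ranges are coarser than \p{L}: they include a few non-letters (for example × at U+00D7) and some unassigned code points, so trim them as needed.

```antlr
// Sketch only: approximates "any letter" with whole Unicode blocks.
fragment LETTER
    : [a-zA-Z]            // Basic Latin letters
    | [\u00C0-\u017F]     // Latin-1 Supplement letters and Latin Extended-A (Ą, ł, ć, ó, ż, ę, ...)
    | [\u0370-\u03FF]     // Greek and Coptic block
    | [\u0400-\u04FF]     // Cyrillic block
    ;

// Quoted string of letters, digits, and spaces; add further characters to the set as required.
STRING
    : '\'' ( LETTER | [0-9 ] )* '\''
    ;
```

Keeping the ranges in a fragment means the list of scripts lives in one place and can be reused by other lexer rules.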

Thanks, but how about a rule in ANTLR4 that allows everything inside a string except a character like ', while still requiring that the string starts and ends with '?

That would look like this:

STRING : '\'' ~[']* '\'';

With escaped quotes, and without allowing line breaks, do this (the backslash is excluded from the negated set so that it can only start an escape):

STRING : '\'' ( ~['\\\r\n] | '\\' ['\\] )* '\'';
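
To try the negated-set approach in isolation, a minimal lexer grammar along these lines may help. The grammar name StringSketch and the WS rule are assumptions added for illustration, and this variant also keeps the backslash out of the plain-character alternative so that a backslash can only begin an escape. Assuming the input is read as UTF-8, an input such as 'Ąłćórżnęł' should come out as a single STRING token.

```antlr
// Sketch only: a stand-alone lexer grammar for experimenting with the STRING rule.
lexer grammar StringSketch;

// Opening quote, then any mix of plain characters and escapes, then closing quote.
STRING
    : '\''
      ( ~['\\\r\n]        // any character except quote, backslash, CR, LF
      | '\\' ['\\]        // escaped quote (\') or escaped backslash (\\)
      )*
      '\''
    ;

// Skip whitespace between tokens.
WS : [ \t\r\n]+ -> skip ;
```

Because the rule lists what is forbidden instead of what is allowed, it accepts letters from any script without enumerating Unicode ranges.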