Dear ANTLR4 community,
I recently started using ANTLR4 to translate regular expressions from XSD/XML to CVC4. I use the grammar as specified by the W3C, see http://www.w3.org/TR/xmlschema11-2/#regexs . For this question I have simplified that grammar (by removing charClass) to:
grammar XSDRegExp;
regExp : branch ( '|' branch )* ;
branch : piece* ;
piece : atom quantifier? ;
quantifier : Quantifiers | '{'quantity'}' ;
quantity : quantRange | quantMin | QuantExact ;
quantRange : QuantExact ',' QuantExact ;
quantMin : QuantExact ',' ;
atom : NormalChar | '(' regExp ')' ; // excluded | charClass ;
QuantExact : [0-9]+ ;
NormalChar : ~[.\\?*+{}()|\[\]] ;
Quantifiers : [?*+] ;
Parsing seems to go fine for:
input a(bd){6,7}c{14,15}
However, I get an error message for:
input 12{3,4}
The error is:
line 1:0 mismatched input '12' expecting {<EOF>, '(', '|', NormalChar}
I understand that the lexer could also see a QuantExact as the first symbol, but since the parser is only looking for a NormalChar at that point, I did not expect this error.
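To make my understanding of the lexer concrete, here is a rough Python sketch (not ANTLR itself) of how I believe the tokens are chosen. It assumes that the longest match wins, that ties go to the rule listed first, and that the literals used in the parser rules ('{', ',', and so on) act as tokens defined before the named lexer rules. The names RULES and tokenize are made up for this illustration.

```python
import re

# Hypothetical stand-in for the lexer rules, in grammar order.
# Assumption: literals from parser rules come before the named rules.
RULES = [
    ("LITERAL", r"[|{}(),]"),
    ("QuantExact", r"[0-9]+"),
    ("NormalChar", r"[^.\\?*+{}()|\[\]]"),
    ("Quantifiers", r"[?*+]"),
]


def tokenize(text, rules=RULES):
    """Split text into (rule name, lexeme) pairs without any parser context."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            match = re.compile(pattern).match(text, pos)
            # A strictly longer match wins; on a tie the earlier rule is kept.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


if __name__ == "__main__":
    for name, lexeme in tokenize("12{3,4}"):
        print(name, repr(lexeme))
```

In this sketch the leading 12 always comes out as a single QuantExact, because the choice is made purely on the characters and never on what the parser expects at that position.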
I tried a number of changes:
[1] Swapping the definitions of QuantExact and NormalChar. But swapping introduces an error in the first input:
line 1:6 no viable alternative at input '6'
since in that case '6' is only seen as a NormalChar and NOT as a QuantExact.
[2] Trying to create a context for QuantExact (the curly brackets of quantity), so that the lexer only produces QuantExact tokens inside this limited context. But I failed to find ANTLR4 primitives for this.
Nothing I tried seems to work, so my question is: can I parse this grammar with ANTLR4? And if so, how?
[...] '.' in the definition of NormalChar doesn't need to be escaped (I'm not an ANTLR user, and the documentation is a little vague)? Does the string '12' parse against the grammar as shown? (From your error message, I conjecture 'no'.) Does the string 'abc' parse? – C. M. Sperberg-McQueen
[...] '.' instead of matching any char. This applies to Java, any .NET language, Perl, JS, Python, etc. Could you tell me why you expect it to need escaping? In which regex implementation does a DOT need to be escaped to only match the literal '.'? – Bart Kiers