3
votes

I am trying to understand the grammar file: https://github.com/antlr/grammars-v4/blob/master/url/url.g4

STRING
   : ([a-zA-Z~] |HEX) ([a-zA-Z0-9.-] | HEX)*
   ;
HEX
    : ('%' [a-fA-F0-9] [a-fA-F0-9])+
    ;

I do not understand the ~ at the end of the character set [a-zA-Z~]. I know that ~ is the "not in set" operator, as per https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md, i.e. ~x means "match any single character not in the set described by x". But how should it be interpreted when it appears at the end of the set, as in the STRING rule above?

1 Answer

3
votes

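I'm not an ANTLR specialist, but I would assume that it's just a literal tilde character (~), since that can appear in URLs. Inside the brackets of a character set, ~ is an ordinary character; the negation operator applies only when ~ is written before a set, as in ~[a-zA-Z]. A leading tilde was traditionally used in URLs that point to a user's home directory, though this is much less common nowadays, at least on the Internet.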

If you look at the production rules, a tilde as the hostname, for example, would specify a URL relative to the user's home directory.
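
To make the literal-versus-negation distinction concrete, here is a rough sketch that approximates the two lexer rules with Python's re module. This is not ANTLR itself: the names HEX, STRING, NOT_LETTER and is_string are my own, and I'm assuming Python's [^...] is a fair stand-in for ANTLR's ~[...] negation.

```python
import re

# Approximation of the HEX rule: one or more percent-encoded bytes.
HEX = r"(?:%[a-fA-F0-9]{2})+"

# Approximation of the STRING rule. Inside [...] the tilde is an
# ordinary character, so "~" is allowed as the first character only.
STRING = re.compile(rf"(?:[a-zA-Z~]|{HEX})(?:[a-zA-Z0-9.-]|{HEX})*")

# What a negated set would look like: ANTLR writes ~[a-zA-Z],
# Python writes [^a-zA-Z]. This matches any single non-letter.
NOT_LETTER = re.compile(r"[^a-zA-Z]")


def is_string(text: str) -> bool:
    """Return True if the whole text matches the approximated STRING rule."""
    return STRING.fullmatch(text) is not None
```

With this sketch, is_string("~user") is accepted because the tilde is in the first set, while is_string("a~b") is rejected because the tilde is not in the second set. A digit such as "9" is matched by NOT_LETTER but cannot start a STRING, which shows that the two notations mean different things.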