I am trying to define lexer rules for PostgreSQL SQL.
The problem is that the operator definition and line comments conflict with each other.
For example, @--- is the operator token @- followed by the -- comment, and not the operator token @---.
In grako it would be possible to define a negative lookahead for the - fragment like:
OP_MINUS: '-' ! ( '-' ) .
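To make the lookahead idea concrete outside of any parser generator, here is a minimal sketch using Python's re module. It is neither grako nor ANTLR4 syntax, and the names OP_MINUS_RE and match_minus are made up for this illustration. The point is only that the lookahead inspects the next character without consuming it, so a following -- is left for the comment rule.

```python
import re

# A minus sign that is NOT followed by another minus sign.
# (?!-) is a zero-width negative lookahead: it peeks at the next
# character but does not consume it.
OP_MINUS_RE = re.compile(r"-(?!-)")


def match_minus(text, pos=0):
    """Return the end offset of a lone '-' at text[pos], or None.

    None means there is no '-' at pos, or the '-' is the first
    character of a '--' comment opener.
    """
    m = OP_MINUS_RE.match(text, pos)
    return m.end() if m else None
```

Calling match_minus("a-b", 1) matches the single minus, while match_minus("--x") returns None because the lookahead sees the second minus.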
In ANTLR4 I could not find any way to roll back an already consumed fragment.
Any ideas?
Here is the original definition of what a PostgreSQL operator can be:
The operator name is a sequence of up to NAMEDATALEN-1
(63 by default) characters from the following list:
+ - * / < > = ~ ! @ # % ^ & | ` ?
There are a few restrictions on your choice of name:
-- and /* cannot appear anywhere in an operator name,
since they will be taken as the start of a comment.
A multicharacter operator name cannot end in + or -,
unless the name also contains at least one of these
characters:
~ ! @ # % ^ & | ` ?
For example, @- is an allowed operator name, but *- is not.
This restriction allows PostgreSQL to parse SQL-compliant
commands without requiring spaces between tokens.
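The rules quoted above can be sketched as a small hand-written scanner. This is only an illustration in Python, not an ANTLR4 solution, and scan_operator and the constant names are hypothetical. It assumes the name is cut at the first -- or /* found in the run of operator characters, and that names longer than NAMEDATALEN-1 are rejected rather than truncated.

```python
OP_CHARS = set("+-*/<>=~!@#%^&|`?")   # characters allowed in an operator name
UNLOCK_CHARS = set("~!@#%^&|`?")      # any of these permits a trailing + or -
NAMEDATALEN = 64                      # PostgreSQL default, so names are at most 63 chars


def scan_operator(text, pos=0):
    """Return the operator name starting at text[pos], or "" if there is none."""
    # Greedily collect the longest run of operator characters.
    end = pos
    while end < len(text) and text[end] in OP_CHARS:
        end += 1
    name = text[pos:end]

    # "--" and "/*" cannot appear in a name: cut at the first comment opener.
    # (Assumption: the leftmost opener wins.)
    cuts = [i for i in (name.find("--"), name.find("/*")) if i != -1]
    if cuts:
        name = name[:min(cuts)]

    # A multicharacter name may end in + or - only if it contains an unlock char.
    if not (set(name) & UNLOCK_CHARS):
        while len(name) > 1 and name[-1] in "+-":
            name = name[:-1]

    if len(name) > NAMEDATALEN - 1:
        # Assumption: reject over-long names instead of truncating them.
        raise ValueError("operator name too long")
    return name
```

With this sketch, scan_operator("@-") keeps the whole name and scan_operator("*-") gives back only the *, matching the quoted example. For +--this_is_plus it returns +, leaving --this_is_plus for the comment rule. Because the cut happens at the leftmost --, the input @--- comes out as @ with the comment starting at ---; splitting it as @- plus -- would need a different rule for where to cut.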
+, -, * and / in any combination. But -- and /* start the comment, and the lexer should be able to return +--this_is_plus as two tokens: Op(+) and LineComment(--this_is_plus), and not as Op(+--) and Ident(this_is_plus). – valgog
@ followed by exactly 1 character? – Onur