I have a SQL grammar that does not support host variables. I want to add that support, but I have run into a tricky situation.
The grammar has a SQL identifier lexer rule (ID), which accepts alphanumeric characters along with the special characters '@', '#', '$' and '_'.
Host variables depend on the language in which the SQL is embedded, e.g. COBOL. COBOL identifiers additionally allow a hyphen (-) in variable names (there are some other differences as well).
So I added a LANG_ID lexer rule to the grammar, which matches COBOL identifiers.
The parser rule for a host variable was:
hostvariable : COLON WS* (ID | LANG_ID);
There was another rule that matched expressions, something like this:
expression : identifier ('+' | '-') identifier;
Here identifier is a parser rule that may match ID and some other tokens.
Valid inputs for expression are ABC+ABC, ABC-ABC and ABC-12.
Now when the input is ABC-12, the whole text is tokenized as a single COBOL identifier (LANG_ID) instead of an identifier, a minus sign and a number.
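To illustrate why this happens, here is a minimal Python sketch of longest-match tokenization, which is how the lexer chooses between competing rules. The regular expressions are stand-ins I am assuming for ID, LANG_ID and the other tokens; they are not copied from the real grammar.

```python
import re

# Hypothetical token patterns (assumptions, not the real grammar's rules).
RULES = [
    ("ID", r"[A-Za-z@#$_][A-Za-z0-9@#$_]*"),
    ("LANG_ID", r"[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*"),
    ("NUMBER", r"[0-9]+"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
]

def tokenize(text):
    """Pick the longest match at each position; on a tie the earlier rule wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = re.compile(pattern).match(text, pos)
            # Strictly longer matches replace the current best, so ties keep the earlier rule.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens

print(tokenize("ABC+ABC"))  # ID, PLUS, ID
print(tokenize("ABC-12"))   # one LANG_ID token covering the whole input
```

Because the LANG_ID pattern can consume the hyphen, it produces a longer match than ID for ABC-12, so the parser never sees a separate MINUS token.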
The solution I implemented was to treat the text as a COBOL identifier only when it follows a COLON (:). So I made LANG_ID a fragment and created this lexer rule:
HOSTVAR : COLON WS* LANG_ID;
Now a problem occurs with the JSON_OBJECT SQL function, which has syntax like:
JSON_OBJECT('KEY' : VALUE)
There is a rule for JSON_OBJECT that expects something like (character_literal COLON value_expression).
value_expression can also be an identifier.
In this case ': VALUE' is tokenized as a single HOSTVAR token, so the parser never sees the COLON and the JSON_OBJECT rule fails.
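The failure can be reproduced outside the grammar with a small Python sketch of longest-match tokenization. The patterns below are assumptions standing in for the real lexer rules; only the shape of HOSTVAR (a colon, optional whitespace, then a COBOL-style identifier) is taken from the rule above.

```python
import re

# Hypothetical stand-in for the LANG_ID fragment (assumption).
LANG_ID = r"[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*"

# Hypothetical token patterns; HOSTVAR mirrors COLON WS* LANG_ID.
RULES = [
    ("HOSTVAR", r":\s*" + LANG_ID),
    ("COLON", r":"),
    ("STRING", r"'[^']*'"),
    ("ID", r"[A-Za-z@#$_][A-Za-z0-9@#$_]*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("WS", r"\s+"),
]

def token_names(text):
    """Return the token types the parser would see (whitespace is skipped)."""
    names = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = re.compile(pattern).match(text, pos)
            # Longest match wins; ties keep the earlier rule.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":
            names.append(best_name)
        pos = best_end
    return names

print(token_names("JSON_OBJECT('KEY' : VALUE)"))
```

In this sketch HOSTVAR always beats COLON whenever an identifier follows the colon, because it is the longer match. The lexer has no knowledge that it is inside JSON_OBJECT, so it cannot pick the shorter token there.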
I have not been able to find a solution to this problem. Every approach I try breaks something else.
Is :ABC-12 a COBOL identifier, or HOSTVAR MINUS NUMBER? This is an issue with how you define your language, and you need to resolve it before trying to find a way to write a grammar for it. As for conflict with JSON_OBJECT, this is something that probably can't be resolved in lexer and needs to be deferred to parser, same as in stackoverflow.com/q/51378998/5375403. – Jiri Tousek