
Problem

I am using ANTLR 4 to parse a Java-like language in which a proprietary query language can be used to write expressions within square brackets. Imagine being inside a Java method: the following line should be allowed:

List<MyObject> items = [SELECT Field1,Field2 FROM MyObject];

The query language should also be allowed in any expression, so things like

if ([SELECT Field1,Field2 FROM MyObject]!=null) {  }

should also be allowed. My parser needs to define rules that combine elements from both modes (the Java-like mode and the query-language mode).

My approach with an island grammar

I am trying to approach this as an island grammar, as described in The Definitive ANTLR 4 Reference, but I cannot get it to work.

I am structuring my lexer grammar as follows:

OPEN_QUERY : '['    -> pushMode(INSIDE_QUERY);

JavaIdentifier: JavaLetter JavaLetterOrDigit* ;
// omitting fragments and loads of other lexer tokens for brevity

mode INSIDE_QUERY;

CLOSE_QUERY : ']' -> popMode ;
SELECT : 'select';
FROM : 'from';
QueryIdentifier: QueryLetter QueryLetterOrDigit*;
// omitting fragment definitions for brevity

In my parser grammar I am trying to do something like this:

expression: normalExpression | queryExpression;

queryExpression
    : '[' SELECT QueryIdentifier FROM QueryIdentifier ']'
    ;

But this yields a token recognition error on the first bracket character.

Is there anything wrong with this approach? Can anyone point me to the mistake that I am making?


Answer


Such problems are easier to diagnose if you dump the token stream to see what the lexer is actually doing. Here, the lexer recognizes the open bracket as an OPEN_QUERY token and the close bracket as a CLOSE_QUERY token. So the literal brackets, as implicit tokens (i.e., specified in the parser as '[' and ']'), are not present in the token stream.
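To make "dump the token stream" concrete, here is a small, self-contained Java sketch. It is not ANTLR-generated code: ModeLexerDemo is a hypothetical hand-written imitation of the lexer in the question, using a mode stack the way pushMode/popMode do. It assumes whitespace is skipped and approximates the omitted JavaLetter/QueryLetter fragments with Character.isLetter and Character.isLetterOrDigit. It prints one token per line, which shows that [ and ] only ever reach the parser as OPEN_QUERY and CLOSE_QUERY.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy imitation of the two-mode lexer; not ANTLR runtime code.
public class ModeLexerDemo {
    enum Mode { DEFAULT, INSIDE_QUERY }

    // A token is just a symbolic type name plus the matched text.
    record Tok(String type, String text) {
        @Override
        public String toString() { return type + "('" + text + "')"; }
    }

    static List<Tok> lex(String input) {
        List<Tok> tokens = new ArrayList<>();
        Deque<Mode> modeStack = new ArrayDeque<>(); // saved modes, like pushMode/popMode
        Mode mode = Mode.DEFAULT;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; } // assumption: whitespace is skipped
            if (mode == Mode.DEFAULT && c == '[') {
                tokens.add(new Tok("OPEN_QUERY", "["));
                modeStack.push(mode);            // pushMode(INSIDE_QUERY)
                mode = Mode.INSIDE_QUERY;
                i++;
            } else if (mode == Mode.INSIDE_QUERY && c == ']') {
                tokens.add(new Tok("CLOSE_QUERY", "]"));
                mode = modeStack.pop();          // popMode
                i++;
            } else if (Character.isLetter(c)) {
                // Greedy scan approximates the longest-match identifier rules.
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                String word = input.substring(start, i);
                if (mode == Mode.DEFAULT) {
                    tokens.add(new Tok("JavaIdentifier", word));
                } else if (word.equals("select")) {   // case-sensitive, like 'select'
                    tokens.add(new Tok("SELECT", word));
                } else if (word.equals("from")) {
                    tokens.add(new Tok("FROM", word));
                } else {
                    tokens.add(new Tok("QueryIdentifier", word));
                }
            } else {
                // ANTLR reports and recovers; this toy simply stops.
                throw new IllegalStateException("token recognition error at: '" + c + "'");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Tok t : lex("[select Field1 from MyObject]")) {
            System.out.println(t);
        }
    }
}
```

Running main should print OPEN_QUERY('['), SELECT('select'), QueryIdentifier('Field1'), FROM('from'), QueryIdentifier('MyObject') and CLOSE_QUERY(']'), one per line. With a real generated lexer, ANTLR's TestRig (grun) with the -tokens option prints a similar listing.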

A simple fix is to change the rule to:

queryExpression
    : OPEN_QUERY SELECT QueryIdentifier FROM QueryIdentifier CLOSE_QUERY
    ;