I'm trying to implement syntax highlighting for Java code using this ANTLR grammar. My strategy has been to parse the code into a tree with that grammar, and then use a visitor to go through each terminal in the tree and assign its text a color. This color is usually just the color associated with the terminal's token, but it can be overridden depending on context. For example, consider this screenshot from VSCode:
By default, identifiers are colored white. However, if they are known to refer to classes/methods, then they are colored green. I would like to make a similar distinction in my visitor by labelling identifiers white by default, but overriding that with green for classes/methods.
So far, I have succeeded in implementing this for class and method declarations. The production rule for `classDeclaration` looks like this:
```
classDeclaration
    :   'class' Identifier typeParameters?
        ('extends' typeType)?
        ('implements' typeList)?
        classBody
    ;
```
Here, `Identifier` is a terminal, while all of the other non-literal symbols are nonterminals. My strategy was to color green every child terminal whose token is overridable (1). "Overridable" is a term I invented in my codebase to deal with this problem: keywords should always have the same color regardless of context, so their tokens are not overridable, whereas an identifier's color depends on context, so it has a default (white) that can be overridden with green. The only terminals in the above production are `'class'`, `Identifier`, `'extends'`, and `'implements'`. The first and the last two are keywords and not overridable, so following procedure (1) colors only the class name green.
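To make procedure (1) concrete, here is a minimal sketch in Java rather than the C# used in the project. It assumes a hypothetical, hand-built tree model (`Node`, `rule`, `keyword`, `symbol` and `identifier` are illustrative names, not ANTLR's generated API), placeholder colors (blue keywords, white identifiers and punctuation), and that the declaration rules are named `classDeclaration` and `methodDeclaration`. Only *direct* terminal children of those rules are eligible for green:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Sketch of procedure (1) on a hypothetical parse-tree model (not ANTLR's API).
class OverridableColoring {
    static final class Node {
        final String rule;          // rule name, or null for a terminal
        final String text;          // token text, or null for a rule node
        final String defaultColor;  // token's default color (terminals only)
        final boolean overridable;  // identifiers: true; keywords and symbols: false
        final List<Node> children;

        private Node(String rule, String text, String defaultColor,
                     boolean overridable, List<Node> children) {
            this.rule = rule;
            this.text = text;
            this.defaultColor = defaultColor;
            this.overridable = overridable;
            this.children = children;
        }

        static Node rule(String name, Node... children) {
            return new Node(name, null, null, false, Arrays.asList(children));
        }
        // Placeholder colors; "blue" for keywords is an assumption for illustration.
        static Node keyword(String text) { return new Node(null, text, "blue", false, List.of()); }
        static Node symbol(String text) { return new Node(null, text, "white", false, List.of()); }
        static Node identifier(String text) { return new Node(null, text, "white", true, List.of()); }

        boolean isTerminal() { return rule == null; }
    }

    // Rules whose direct overridable terminal children are recolored green (assumed names).
    static final Set<String> OVERRIDING_RULES = Set.of("classDeclaration", "methodDeclaration");

    // Returns "text=color" for every terminal, in source order.
    static List<String> color(Node root) {
        List<String> out = new ArrayList<>();
        visit(root, false, out);
        return out;
    }

    private static void visit(Node node, boolean parentOverrides, List<String> out) {
        if (node.isTerminal()) {
            String color = (parentOverrides && node.overridable) ? "green" : node.defaultColor;
            out.add(node.text + "=" + color);
            return;
        }
        // Procedure (1) only affects direct children, so the flag is recomputed at every rule node.
        boolean overrides = OVERRIDING_RULES.contains(node.rule);
        for (Node child : node.children) {
            visit(child, overrides, out);
        }
    }

    // Tree for: class Foo extends Bar implements Baz {}
    static Node exampleClassDeclaration() {
        return Node.rule("classDeclaration",
            Node.keyword("class"),
            Node.identifier("Foo"),
            Node.keyword("extends"),
            Node.rule("typeType", Node.rule("classOrInterfaceType", Node.identifier("Bar"))),
            Node.keyword("implements"),
            Node.rule("typeList", Node.rule("typeType",
                Node.rule("classOrInterfaceType", Node.identifier("Baz")))),
            Node.rule("classBody", Node.symbol("{"), Node.symbol("}")));
    }

    public static void main(String[] args) {
        // [class=blue, Foo=green, extends=blue, Bar=white, implements=blue, Baz=white, {=white, }=white]
        System.out.println(color(exampleClassDeclaration()));
    }
}
```

`Bar` and `Baz` stay white because they sit several levels below `classDeclaration`, so only `Foo` is recolored.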
Here is the C# code I used to implement the above strategy.
Unfortunately, this strategy appears to be problematic when attempting to highlight method invocations, such as `blah.blah()` above. Here is the production rule for an `expression`:
```
expression
    :   primary
    |   expression '.' Identifier
    |   expression '.' 'this'
    |   expression '.' 'new' nonWildcardTypeArguments? innerCreator
    |   expression '.' 'super' superSuffix
    |   expression '.' explicitGenericInvocation
    |   expression '[' expression ']'
    |   expression '(' expressionList? ')'
    |   // Lots of other stuff
    ;
```
This means that `foo.bar()` parses as `(('foo') '.' 'bar') '(' ')'`. If, for all `expression`s, I color all `Identifier` children green, then `foo.bar()` will have `foo` white and `bar` green, as intended. (Note that `foo` is a `primary`, and its terminal is not the direct child of an `expression`.) However, `foo.bar` also has `foo` white and `bar` green, because `bar` is a direct `Identifier` child of the `expression '.' Identifier` alternative whether or not a call follows. That does not match the behavior of VSCode above.
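The problem can be reproduced with a small Java sketch. It uses a hypothetical hand-built tree (the `Node` class and its factory methods are illustrative, not ANTLR's API) shaped like the parses of `foo.bar()` and `foo.bar` under the `expression` rule quoted above, and applies procedure (1) to every `expression` node, assuming white as the default for all terminals:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: procedure (1) applied to every `expression` node, on a hypothetical tree model.
class ExpressionColoring {
    static final class Node {
        final String rule;          // rule name, or null for a terminal
        final String text;          // token text for terminals
        final boolean overridable;  // true for Identifier tokens
        final List<Node> children;

        Node(String rule, String text, boolean overridable, List<Node> children) {
            this.rule = rule;
            this.text = text;
            this.overridable = overridable;
            this.children = children;
        }

        static Node rule(String name, Node... children) {
            return new Node(name, null, false, Arrays.asList(children));
        }
        static Node symbol(String text) { return new Node(null, text, false, List.of()); }
        static Node identifier(String text) { return new Node(null, text, true, List.of()); }
    }

    static List<String> color(Node root) {
        List<String> out = new ArrayList<>();
        visit(root, false, out);
        return out;
    }

    private static void visit(Node node, boolean parentIsExpression, List<String> out) {
        if (node.rule == null) {
            // Default is white; an overridable direct child of an `expression` becomes green.
            String color = (parentIsExpression && node.overridable) ? "green" : "white";
            out.add(node.text + "=" + color);
            return;
        }
        boolean isExpression = node.rule.equals("expression");
        for (Node child : node.children) {
            visit(child, isExpression, out);
        }
    }

    // foo.bar  ->  expression(expression(primary(foo)) '.' bar)
    static Node fieldAccess() {
        return Node.rule("expression",
            Node.rule("expression", Node.rule("primary", Node.identifier("foo"))),
            Node.symbol("."),
            Node.identifier("bar"));
    }

    // foo.bar()  ->  expression(expression(expression(primary(foo)) '.' bar) '(' ')')
    static Node methodCall() {
        return Node.rule("expression", fieldAccess(), Node.symbol("("), Node.symbol(")"));
    }

    public static void main(String[] args) {
        System.out.println(color(methodCall()));  // [foo=white, .=white, bar=green, (=white, )=white]
        System.out.println(color(fieldAccess())); // [foo=white, .=white, bar=green]
    }
}
```

Both trees color `bar` green: the rule only looks at the immediate parent, and in both cases that parent is the `expression '.' Identifier` node. Whether a `'(' ')'` follows is decided one level further up, which procedure (1) never inspects.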
I attempted to work around this by creating a new production for expressions that look like `expression '.' Identifier '(' expressionList? ')'` and referencing it from `expression`:
```
expression
    :   // ...
    |   expression '[' expression ']'
    |   invocationExpression
    |   // ...
    ;

invocationExpression
    :   expression '.' Identifier '(' expressionList? ')'
    |   expression '(' expressionList? ')'
    ;
```
Then I would be able to run procedure (1) against `invocationExpression`s in my visitor, coloring all child `Identifier`s green, which would make `foo.bar()` white-green and `foo.bar` white-white, as intended. However, ANTLR complains because `expression` and `invocationExpression` are mutually left-recursive. How do I overcome this, or is there a different approach to solve this problem?
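For comparison, the coloring that `invocationExpression` is meant to produce can also be described as a check on the parent node during the tree walk, with the grammar left unchanged: a member name is green only when its member access is the callee of a call. The Java sketch below assumes a hypothetical tree in which the `expression` alternatives appear as separate node kinds; `call`, `memberAccess`, `primary` and the `Node` class are illustrative stand-ins, not ANTLR's generated API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: a member name is green only when its member access is the callee of a call.
// Hypothetical node kinds: "call" ~ expression '(' expressionList? ')',
// "memberAccess" ~ expression '.' Identifier, "primary" ~ primary.
class CallContextColoring {
    static final class Node {
        final String kind;          // node kind, or null for a terminal
        final String text;          // token text for terminals
        final boolean overridable;  // true for Identifier tokens
        final List<Node> children;

        Node(String kind, String text, boolean overridable, List<Node> children) {
            this.kind = kind;
            this.text = text;
            this.overridable = overridable;
            this.children = children;
        }

        static Node of(String kind, Node... children) {
            return new Node(kind, null, false, Arrays.asList(children));
        }
        static Node symbol(String text) { return new Node(null, text, false, List.of()); }
        static Node identifier(String text) { return new Node(null, text, true, List.of()); }
    }

    static List<String> color(Node root) {
        List<String> out = new ArrayList<>();
        visit(root, false, out);
        return out;
    }

    // `inCallee` is true while visiting the part of the tree that names the called method.
    private static void visit(Node node, boolean inCallee, List<String> out) {
        if (node.kind == null) {
            String color = (inCallee && node.overridable) ? "green" : "white";
            out.add(node.text + "=" + color);
            return;
        }
        List<Node> kids = node.children;
        switch (node.kind) {
            case "call":
                // Only the first child (the callee) is in call position; arguments are not.
                visit(kids.get(0), true, out);
                for (Node kid : kids.subList(1, kids.size())) visit(kid, false, out);
                break;
            case "memberAccess":
                // The receiver before '.' never names the method; the member after it inherits inCallee.
                visit(kids.get(0), false, out);
                for (Node kid : kids.subList(1, kids.size())) visit(kid, inCallee, out);
                break;
            case "primary":
                // A bare identifier such as `bar` in `bar()`.
                for (Node kid : kids) visit(kid, inCallee, out);
                break;
            default:
                for (Node kid : kids) visit(kid, false, out);
        }
    }

    static Node fooDotBar() {
        return Node.of("memberAccess",
            Node.of("primary", Node.identifier("foo")), Node.symbol("."), Node.identifier("bar"));
    }

    static Node fooDotBarCall() {
        return Node.of("call", fooDotBar(), Node.symbol("("), Node.symbol(")"));
    }

    public static void main(String[] args) {
        System.out.println(color(fooDotBarCall())); // [foo=white, .=white, bar=green, (=white, )=white]
        System.out.println(color(fooDotBar()));     // [foo=white, .=white, bar=white]
    }
}
```

Here the call/member distinction is recovered from the node above the `Identifier`, by passing an "is callee" flag down the walk, so `foo.bar()` comes out white-green and `foo.bar` white-white without adding a second rule that refers back to `expression`.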