
Using ANTLR 3, my lexer has this rule:

SELECT_ASSIGN
: 'SELECT' WS+ IDENTIFIER WS+ 'ASSIGN' WS+ (('TO'|'USING') WS+)?
;

With this rule, the following inputs match correctly:

SELECT VAR1 ASSIGN TO
SELECT VAR1 ASSIGN USING

and this also matches:

SELECT VAR1 ASSIGN FOO

However, this does not match:

SELECT VAR1 ASSIGN TWO

This happens even though I have marked TO|USING as optional in the rule.

From the generated Java code I can see that when the lexer encounters the T of TWO, it goes to match("TO"), but since it does not find O after T, it raises a mismatch failure that propagates all the way out of the rule, so the rule does not match.
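The failure described above can be reproduced with a small hand-written Java sketch. It is not ANTLR 3's actual generated lexer, only a hypothetical simplification of its control flow: the decision for the optional ('TO'|'USING') WS+ block commits as soon as it sees a single T or U, which is the behavior observed in the generated code. The class GreedyOptionalDemo, its MismatchException (a stand-in for ANTLR's MismatchedTokenException), and the definitions of WS (blank, tab, CR, LF) and IDENTIFIER (letters, digits, underscore) are assumptions made for illustration.

```java
// Hypothetical, simplified imitation of the control flow ANTLR 3 generates for
//   'SELECT' WS+ IDENTIFIER WS+ 'ASSIGN' WS+ (('TO'|'USING') WS+)?
// This is NOT the real generated lexer; it only mirrors the decision logic.
class GreedyOptionalDemo {

    /** Stands in for ANTLR's MismatchedTokenException (assumption for illustration). */
    static class MismatchException extends RuntimeException {
        MismatchException(String msg) { super(msg); }
    }

    private final String input;
    private int pos = 0;

    GreedyOptionalDemo(String input) { this.input = input; }

    /** Lookahead: the i-th character from the current position, or -1 past the end. */
    private int la(int i) {
        int p = pos + i - 1;
        return p < input.length() ? input.charAt(p) : -1;
    }

    /** Matches a literal character by character, like the generated match("...") call. */
    private void match(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (la(1) != s.charAt(i)) {
                throw new MismatchException("expected '" + s.charAt(i) + "' at offset " + pos);
            }
            pos++;
        }
    }

    private static boolean isWs(int c) { return c == ' ' || c == '\t' || c == '\r' || c == '\n'; }

    private static boolean isIdChar(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
    }

    /** WS+ : one or more whitespace characters. */
    private void matchWsPlus() {
        if (!isWs(la(1))) throw new MismatchException("expected whitespace at offset " + pos);
        while (isWs(la(1))) pos++;
    }

    /** IDENTIFIER : one or more identifier characters (simplified). */
    private void matchIdentifier() {
        if (!isIdChar(la(1))) throw new MismatchException("expected identifier at offset " + pos);
        while (isIdChar(la(1))) pos++;
    }

    /** Returns the length of the SELECT_ASSIGN token at the start of the input. */
    int selectAssign() {
        match("SELECT");
        matchWsPlus();
        matchIdentifier();
        matchWsPlus();
        match("ASSIGN");
        matchWsPlus();
        // Optional block, predicted from a single character: seeing 'T' or 'U'
        // is enough to enter it, so "TWO" commits to match("TO") and then fails.
        int c = la(1);
        if (c == 'T' || c == 'U') {
            if (c == 'T') match("TO"); else match("USING");
            matchWsPlus();
        }
        return pos;
    }

    public static void main(String[] args) {
        String[] inputs = {"SELECT VAR1 ASSIGN TO\n", "SELECT VAR1 ASSIGN FOO", "SELECT VAR1 ASSIGN TWO"};
        for (String s : inputs) {
            try {
                int len = new GreedyOptionalDemo(s).selectAssign();
                System.out.println("matched " + len + " chars of " + s.trim());
            } catch (MismatchException e) {
                System.out.println("no match for " + s.trim() + ": " + e.getMessage());
            }
        }
    }
}
```

Running main shows the first two inputs matching (22 and 19 characters) and the third one failing with "expected 'O' at offset 20": once the single-character decision has entered the optional block, there is no way back to the exit branch.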

How do I get my lexer rule to match when the input contains a word that begins with the characters of the rule's optional suffix (for example TWO, which starts with TO)?

Basically, I want my rule to also match this (besides what it already matches, as listed at the start):

SELECT VAR1 ASSIGN TWO

Please suggest how I should approach or resolve this situation.

NOTE:

Rules like this are normally recommended in the parser, but I have this one in the lexer because I do not want the parser to parse the entire input; I only want to parse the content of interest. Using such lexer rules, I locate the sections that I actually want the parser to handle.


UPDATE 1: I could work around this problem by splitting it into 2 rules, like so:

SELECT_ASSIGN_USING_TO
: tok='SELECT' WS+ name=IDENTIFIER WS+ 'ASSIGN' WS+ ('USING'|'TO')
;

SELECT_ASSIGN
: tok='SELECT' WS+ name=IDENTIFIER WS+ 'ASSIGN'
;

But is it possible to do this in a single lexer rule?

Comment by Jiri Tousek: Please provide a minimal but complete lexer grammar that does this (a minimal reproducible example); the problem might be in other lexer rules.

2 Answers

0 votes

One way to get this into a single rule, suggested by my senior, is to use a syntactic predicate:

SELECT_ASSIGN
: tok='SELECT' WS+ name=IDENTIFIER WS+ 'ASSIGN'
    (
      (WS+ ('TO'|'USING') WS+)=> (WS+ ('TO'|'USING') WS+)
      | (WS+)
    )
;
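The idea behind this predicate can be mimicked in the same hand-written style. In the hypothetical Java sketch below (PredicateDemo and its helpers are illustrative names, not ANTLR API), the predicate speculatively matches WS+ ('TO'|'USING') WS+, always rewinds to where it started, and only then chooses an alternative, so TWO no longer traps the lexer halfway through 'TO'. Code generated by ANTLR 3 for syntactic predicates follows a similar mark/rewind pattern but reports failure through a state.failed flag rather than exceptions, so treat this only as a sketch of the idea.

```java
// Hypothetical, simplified imitation of a syntactic predicate in an ANTLR 3 lexer rule:
//   'SELECT' WS+ IDENTIFIER WS+ 'ASSIGN'
//     ( (WS+ ('TO'|'USING') WS+)=> (WS+ ('TO'|'USING') WS+) | (WS+) )
// NOT the real generated code: ANTLR 3 signals predicate failure via state.failed
// instead of exceptions, but the mark / try / rewind idea is the same.
class PredicateDemo {

    /** Stand-in for a mismatch error (assumption for illustration). */
    static class MismatchException extends RuntimeException {
        MismatchException(String msg) { super(msg); }
    }

    private final String input;
    private int pos = 0;

    PredicateDemo(String input) { this.input = input; }

    /** Lookahead: the i-th character from the current position, or -1 past the end. */
    private int la(int i) {
        int p = pos + i - 1;
        return p < input.length() ? input.charAt(p) : -1;
    }

    private void match(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (la(1) != s.charAt(i)) throw new MismatchException("expected '" + s.charAt(i) + "' at offset " + pos);
            pos++;
        }
    }

    private static boolean isWs(int c) { return c == ' ' || c == '\t' || c == '\r' || c == '\n'; }

    private static boolean isIdChar(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
    }

    private void matchWsPlus() {
        if (!isWs(la(1))) throw new MismatchException("expected whitespace at offset " + pos);
        while (isWs(la(1))) pos++;
    }

    private void matchIdentifier() {
        if (!isIdChar(la(1))) throw new MismatchException("expected identifier at offset " + pos);
        while (isIdChar(la(1))) pos++;
    }

    /** The guarded alternative: WS+ ('TO'|'USING') WS+ */
    private void toUsingSuffix() {
        matchWsPlus();
        if (la(1) == 'T') match("TO"); else match("USING");
        matchWsPlus();
    }

    /** Syntactic predicate: does the whole suffix match here? Never consumes input. */
    private boolean synpredToUsing() {
        int start = pos;                 // like input.mark()
        try {
            toUsingSuffix();
            return true;
        } catch (MismatchException e) {
            return false;
        } finally {
            pos = start;                 // like input.rewind(start)
        }
    }

    /** Returns the length of the SELECT_ASSIGN token at the start of the input. */
    int selectAssign() {
        match("SELECT");
        matchWsPlus();
        matchIdentifier();
        matchWsPlus();
        match("ASSIGN");
        if (synpredToUsing()) {
            toUsingSuffix();             // predicate succeeded: take the TO/USING alternative
        } else {
            matchWsPlus();               // otherwise fall back to plain WS+
        }
        return pos;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"SELECT VAR1 ASSIGN TO\n", "SELECT VAR1 ASSIGN TWO"}) {
            int len = new PredicateDemo(s).selectAssign();
            System.out.println("matched " + len + " chars: " + s.substring(0, len).trim());
        }
    }
}
```

With this version, SELECT VAR1 ASSIGN TWO yields a 19-character token ("SELECT VAR1 ASSIGN " including the trailing blank), while the TO and USING forms still consume the keyword and the whitespace after it.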
0 votes

A token matches a complete character sequence or nothing. It cannot match partially, and the grammar rule determines exactly which sequence it matches. You cannot expect a rule for TO to match TWO. If you want TWO to match too, you have to add it to your lexer rule.

A few notes here:

  1. The solution your "senior" gave you makes no sense at all. A syntactic predicate is a kind of lookahead that guides the parser in case of ambiguities. There are no ambiguities involved here.
  2. Writing the entire SELECT_ASSIGN rule as a lexer rule is very uncommon and not flexible. A lexer rule should not be used for entire sentences, only for short character sequences that form a token and receive a token type (usually elementary structures of a language such as strings, numbers, comments, etc.).
  3. ANTLR 3 is totally outdated, and I wonder why it is still used in your class. ANTLR 4 has been out for 5 years and should be the choice for any new project.