
Can the Xtext lexer emit anything it doesn't recognize as a special token? Something like:

terminal USE: 'use';
terminal SELECT: 'select';
terminal OTHER_KEYWORDS: /* not 'use' nor 'select' */;

I wrote a grammar like this:

terminal fragment A: 'a' | 'A';
    ...
terminal fragment Z: 'z' | 'Z';

terminal fragment LETTER: 'a'..'z' | 'A'..'Z';

terminal fragment A_: 'b'..'z' | 'B'..'Z';
      ...
terminal fragment Z_: 'a'..'y' | 'A'..'Y';

terminal fragment SU_: 'a'..'r' | 't' | 'v'..'z' | 'A'..'R' | 'T' | 'V'..'Z';

terminal OTHER_KEYWORDS:
  SU_ LETTER* |

  U S_ LETTER* |
  U S E_ LETTER* |

  S E_ LETTER* |
  S E L_ LETTER* |
  S E L E_ LETTER* |
  S E L E C_ LETTER* |
  S E L E C T_ LETTER*
;

The reason I want to do this is that ANTLR fails on this kind of typo, and then fails on all of the parsing after it. If there is another way to keep the parser from failing, I won't need this error-prone, ugly-looking workaround.


Answer


I found that simply using ID to consume the remaining garbage in the input stream works:

terminal USE: 'use';
terminal SELECT: 'select';
       ...
terminal TYPO: ID;

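So for the input "us e", "us" is lexed as an ID (a TYPO token), while "use" is lexed as a USE token. The order of the terminal rules matters: the keyword terminals must be declared before TYPO, so that when both match the same text, the keyword wins.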
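To see why this works, here is a simplified Python model of how a lexer resolves overlapping terminals: it takes the longest match and, on a tie, the rule declared first. This is only an illustration under that assumption, not Xtext's or ANTLR's actual code. The RULES table is hypothetical; the TYPO pattern approximates the default ID terminal from org.eclipse.xtext.common.Terminals, with the optional leading '^' left out.

```python
import re

# Hypothetical rule table mirroring the answer's terminals, in declaration order.
# TYPO approximates Xtext's default ID terminal (the optional leading '^' is omitted).
RULES = [
    ("USE", re.compile(r"use")),
    ("SELECT", re.compile(r"select")),
    ("TYPO", re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")),
]
WS = re.compile(r"\s+")


def tokenize(text):
    """Simplified lexer model: longest match wins; on a tie, the earliest rule wins."""
    pos, tokens = 0, []
    while pos < len(text):
        # Skip whitespace between tokens (like a hidden WS terminal).
        ws = WS.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


for source in ["use", "us e", "used", "select x"]:
    print(source, "->", tokenize(source))
```

In this model "use" becomes USE because USE and TYPO both match three characters and USE is declared first, "us e" becomes two TYPO tokens, and "used" becomes TYPO because the identifier match is longer. If TYPO were moved to the top of RULES, every keyword would come out as TYPO, which is why the declaration order matters.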