Writing a custom Xtext/ANTLR lexer without a grammar file

Question

I'm writing an Eclipse/Xtext plugin for CoffeeScript, and I realized I'll probably need to write a lexer for it by hand. The CoffeeScript parser itself also uses a hand-written lexer to handle indentation and other tricks in the grammar.

Xtext generates a class that extends org.eclipse.xtext.parser.antlr.Lexer, which in turn extends org.antlr.runtime.Lexer. So I suppose I'll have to extend it. I can see two ways to do that:

1. Override mTokens(). This is what the generated code does, and it changes the lexer's internal state.
2. Override nextToken(), which seems the natural approach, but then I'll have to keep track of the internal state myself.

I couldn't find any example of how to write even a simple lexer for ANTLR without a grammar file. So the easiest answer would be a pointer to one.

An answer to "Xtext: grammar for language with significant/semantic whitespace" refers to todotext, which handles indentation by changing the tokens in the underlying input stream. I don't want to go that way, because it would make it difficult to handle the other tricks of the CoffeeScript grammar.

UPDATE:

I realized in the meantime that my question was partly Xtext specific.

You just need to implement TokenSource and do whatever you need to do in the nextToken method. Have you checked out stackoverflow.com/questions/4414166/…? There are examples of handling indentation (in Python, for instance) in The Definitive ANTLR Reference. - Jimmy
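
A minimal sketch of the idea in the comment above, assuming a toy language where each non-blank line is a single token. It shows how a hand-written nextToken() can keep its own state (an indentation stack plus a queue of pending tokens) so that one input line can yield several tokens. SketchToken and IndentingTokenSource are hypothetical names, and SketchToken is only a stand-in for org.antlr.runtime.Token so that the example runs without ANTLR on the classpath. A real version would presumably implement ANTLR 3's org.antlr.runtime.TokenSource and return CommonToken instances.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for org.antlr.runtime.Token (not the ANTLR class).
final class SketchToken {
  final String type;  // "INDENT", "DEDENT", "LINE" or "EOF"
  final String text;

  SketchToken(String type, String text) {
    this.type = type;
    this.text = text;
  }

  @Override
  public String toString() {
    return text.isEmpty() ? type : type + "(" + text + ")";
  }
}

// Hand-written, indentation-aware token source; all lexer state lives in this class.
final class IndentingTokenSource {
  private final String[] lines;
  private int lineNo = 0;
  private final Deque<Integer> indents = new ArrayDeque<>();      // stack of open indentation widths
  private final Deque<SketchToken> pending = new ArrayDeque<>();  // queue of tokens not yet returned

  IndentingTokenSource(String input) {
    this.lines = input.split("\n");
    indents.push(0);
  }

  // Same shape as TokenSource.nextToken(): one token per call, EOF repeated at the end.
  SketchToken nextToken() {
    while (pending.isEmpty()) {
      if (lineNo >= lines.length) {
        // End of input: close every open block, then emit EOF.
        while (indents.peek() > 0) {
          indents.pop();
          pending.add(new SketchToken("DEDENT", ""));
        }
        pending.add(new SketchToken("EOF", ""));
      } else {
        lexLine(lines[lineNo++]);
      }
    }
    return pending.poll();
  }

  // Queue the tokens for one line: an optional INDENT or DEDENTs, then the line itself.
  private void lexLine(String line) {
    String body = line.trim();
    if (body.isEmpty()) {
      return;  // blank lines do not affect indentation
    }
    int width = 0;
    while (line.charAt(width) == ' ') {
      width++;
    }
    if (width > indents.peek()) {
      indents.push(width);
      pending.add(new SketchToken("INDENT", ""));
    }
    // Simplification: a dedent to a width that was never opened is not reported as an error.
    while (width < indents.peek()) {
      indents.pop();
      pending.add(new SketchToken("DEDENT", ""));
    }
    pending.add(new SketchToken("LINE", body));
  }

  // Convenience helper for trying the sketch: drain the source into printable strings.
  static List<String> tokenize(String input) {
    IndentingTokenSource source = new IndentingTokenSource(input);
    List<String> out = new ArrayList<>();
    SketchToken t;
    do {
      t = source.nextToken();
      out.add(t.toString());
    } while (!t.type.equals("EOF"));
    return out;
  }
}
```

For example, IndentingTokenSource.tokenize("if x\n  y\n  z\nw") returns [LINE(if x), INDENT, LINE(y), LINE(z), DEDENT, LINE(w), EOF]. The pending queue is the part that matters: it lets nextToken() hand out one token at a time even when a single line produces several.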

Adam Schmideg · Accepted Answer · 2011-12-08T21:02:21

Here is what I did -- and it works.

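import org.antlr.runtime.CharStream;
import org.antlr.runtime.Token;

public class MyLexer extends myprj.parser.antlr.internal.InternalMylangLexer {
  // Hand-written lexer that does the actual tokenizing.
  private SomeExternalLexer externalLexer;

  public MyLexer(CharStream in) {
    super(in);
    externalLexer = new SomeExternalLexer(in);
  }

  // Bypass the generated mTokens() machinery and delegate to the external lexer.
  @Override
  public Token nextToken() {
    Token token = null;
    ExternalToken extToken = null;
    try {
      extToken = externalLexer.nextToken();
      if (extToken == null) {
        token = Token.INVALID_TOKEN;
      }
      else {
        token = mapExternalToken(extToken);
      }
    }
    catch (Exception e) {
      // Any lexing failure is reported to the parser as an invalid token.
      token = Token.INVALID_TOKEN;
    }
    return token;
  }

  // Convert an external token into an ANTLR token the generated parser understands.
  protected Token mapExternalToken(ExternalToken extToken) {
    // ...
  }
}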
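
The body of mapExternalToken is elided above and depends on the external lexer. As a rough illustration only, here is one shape such a mapping could take. Every name in it (ExternalTokenSketch, MappedTokenSketch, TokenMapperSketch, the tag strings) is hypothetical, and the numeric token types are placeholders. In a real plugin the types would presumably be the constants declared in the generated InternalMylangLexer, and the result would be an org.antlr.runtime.CommonToken rather than the stand-in class used here to keep the sketch runnable without ANTLR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape of a token produced by the external lexer.
final class ExternalTokenSketch {
  final String tag;   // e.g. "IDENTIFIER"
  final String text;  // matched source text
  final int offset;   // 0-based offset of the first character

  ExternalTokenSketch(String tag, String text, int offset) {
    this.tag = tag;
    this.text = text;
    this.offset = offset;
  }
}

// Stand-in for org.antlr.runtime.CommonToken, holding the data such a token would carry.
final class MappedTokenSketch {
  final int type;
  final String text;
  final int startIndex;
  final int stopIndex;

  MappedTokenSketch(int type, String text, int startIndex, int stopIndex) {
    this.type = type;
    this.text = text;
    this.startIndex = startIndex;
    this.stopIndex = stopIndex;
  }
}

final class TokenMapperSketch {
  // 0 is the value of Token.INVALID_TOKEN_TYPE in ANTLR 3.
  static final int INVALID_TYPE = 0;

  // Placeholder values; real ones would come from the generated lexer's constants.
  private static final Map<String, Integer> TYPES = new HashMap<>();
  static {
    TYPES.put("IDENTIFIER", 4);
    TYPES.put("NUMBER", 5);
    TYPES.put("STRING", 6);
  }

  static MappedTokenSketch map(ExternalTokenSketch ext) {
    // Unknown tags fall back to the invalid type instead of throwing.
    int type = TYPES.getOrDefault(ext.tag, INVALID_TYPE);
    int start = ext.offset;
    // Inclusive stop index, following the CommonToken convention.
    int stop = ext.offset + ext.text.length() - 1;
    return new MappedTokenSketch(type, ext.text, start, stop);
  }
}
```

The two details most likely to matter in practice are that the token type numbers agree with what the generated parser expects, and that the start and stop offsets point back into the original character stream.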

Then I have a slightly customized parser containing:

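import org.antlr.runtime.CharStream;
import org.antlr.runtime.TokenSource;

public class BetterParser extends MylangParser {
  // Hand the parser the custom lexer instead of the generated one.
  @Override
  protected TokenSource createLexer(CharStream stream) {
    return new MyLexer(stream);
  }
}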

I also had to change my MylangRuntimeModule.java to contain this method:

@Override
public Class<? extends org.eclipse.xtext.parser.IParser> bindIParser() {
  return myprj.parser.BetterParser.class;
}

And that's it.
