
I'm writing an Eclipse/Xtext plugin for CoffeeScript, and I realized I'll probably need to write a lexer for it by hand. The CoffeeScript parser also uses a hand-written lexer to handle indentation and other tricks in the grammar.

Xtext generates a class that extends org.eclipse.xtext.parser.antlr.Lexer, which in turn extends org.antlr.runtime.Lexer. So I suppose I'll have to extend it. I can see two ways to do that:

  • Override mTokens(). This is done by the generated code, changing the internal state.
  • Override nextToken(), which seems a natural approach, but then I'll have to keep track of the internal state myself.

I couldn't find any example of how to write even a simple lexer for ANTLR without a grammar file. So the easiest answer would be a pointer to one.

An answer to "Xtext: grammar for language with significant/semantic whitespace" refers to todotext, which handles the problem of indentation by changing the tokens in the underlying input stream. I don't want to go that way, because it would make it difficult to handle the other tricks of the CoffeeScript grammar.


I realized in the meantime that my question was partly Xtext specific.

You just need to implement ITokenSource and do whatever you need to do in the nextToken method. Have you checked out stackoverflow.com/questions/4414166/…? There are examples on handling indentation (in Python, for instance) in the Definitive ANTLR Reference. - Jimmy
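
To illustrate the nextToken approach mentioned in the comment above, here is a minimal, self-contained sketch of a hand-written token source that turns leading whitespace into INDENT and DEDENT tokens. It does not use ANTLR at all: SketchToken, IndentTokenSource and IndentSketchDemo are hypothetical names invented for this example, and a real implementation would return ANTLR Token objects from a class implementing ANTLR's token source interface. It also assumes spaces-only indentation and does not report inconsistent dedents.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for an ANTLR token: just a kind and its text. */
final class SketchToken {
    final String kind; // "INDENT", "DEDENT", "LINE" or "EOF"
    final String text;
    SketchToken(String kind, String text) { this.kind = kind; this.text = text; }
}

/** Hand-written token source: all lexer state lives here, nextToken() drives it. */
final class IndentTokenSource {
    private final String[] lines;
    private int lineNo = 0;
    private final Deque<Integer> indents = new ArrayDeque<>();     // open indentation widths
    private final Deque<SketchToken> pending = new ArrayDeque<>(); // queued, not yet returned

    IndentTokenSource(String input) {
        this.lines = input.split("\n");
        indents.push(0);
    }

    SketchToken nextToken() {
        while (pending.isEmpty()) {
            if (lineNo >= lines.length) {
                // End of input: close every block that is still open, then emit EOF.
                while (indents.peek() > 0) {
                    indents.pop();
                    pending.add(new SketchToken("DEDENT", ""));
                }
                pending.add(new SketchToken("EOF", ""));
                break;
            }
            String line = lines[lineNo++];
            if (line.trim().isEmpty()) continue; // blank lines do not change indentation
            int width = 0;
            while (line.charAt(width) == ' ') width++; // assumption: spaces only
            if (width > indents.peek()) {
                indents.push(width);
                pending.add(new SketchToken("INDENT", ""));
            }
            while (width < indents.peek()) {
                indents.pop();
                pending.add(new SketchToken("DEDENT", ""));
            }
            // A real lexer would split the rest of the line into proper tokens.
            pending.add(new SketchToken("LINE", line.trim()));
        }
        return pending.poll();
    }
}

final class IndentSketchDemo {
    /** Collects the token kinds for an input, up to and including EOF. */
    static List<String> kinds(String input) {
        IndentTokenSource source = new IndentTokenSource(input);
        List<String> result = new ArrayList<>();
        SketchToken t;
        do {
            t = source.nextToken();
            result.add(t.kind);
        } while (!t.kind.equals("EOF"));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(kinds("if x\n  a\n  b\nc"));
    }
}
```

The pending queue is the point of the sketch: a single input line can produce several tokens (for example two DEDENTs followed by the line itself), so nextToken hands them out one at a time and only reads more input when the queue is empty.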

2 Answers


Here is what I did -- and it works.

public class MyLexer extends myprj.parser.antlr.internal.InternalMylangLexer {
  private SomeExternalLexer externalLexer;

  // The constructor must carry the class name (MyLexer, not Lexer).
  public MyLexer(CharStream in) {
    externalLexer = new SomeExternalLexer(in);
  }

  // Delegate tokenizing to the external lexer and translate its tokens.
  @Override
  public Token nextToken() {
    Token token = null;
    ExternalToken extToken = null;
    try {
      extToken = externalLexer.nextToken();
      if (extToken == null) {
        token = CommonToken.INVALID_TOKEN;
      } else {
        token = mapExternalToken(extToken);
      }
    } catch (Exception e) {
      token = CommonToken.INVALID_TOKEN;
    }
    return token;
  }

  protected Token mapExternalToken(ExternalToken extToken) {
    // ...
  }
}

Then I have a slightly customized parser containing:

public class BetterParser extends MylangParser {
  protected TokenSource createLexer(CharStream stream) {
    MyLexer lexer = new MyLexer(stream);
    return lexer;
  }
}

I also had to change my MylangRuntimeModule.java to contain this method

public Class<? extends org.eclipse.xtext.parser.IParser> bindIParser() {
  return myprj.parser.BetterParser.class;
}

And that's it.
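
The body of mapExternalToken is elided above. As a rough idea of what such a mapping could rest on, here is a minimal sketch of a lookup from the external lexer's token kinds to the integer token types the parser expects. This is not the code from my project: ExternalKindMapper, the kind names and the numeric ids are all made up for illustration, and in a real project the ids would have to match the token type constants of the generated parser.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the lookup a mapExternalToken-style method could delegate to. */
final class ExternalKindMapper {
    // ANTLR 3 reserves 0 for Token.INVALID_TOKEN_TYPE.
    static final int INVALID_TYPE = 0;

    private final Map<String, Integer> typeByKind = new HashMap<>();

    ExternalKindMapper() {
        // Hypothetical ids; real ones come from the generated lexer/parser.
        typeByKind.put("IDENTIFIER", 4);
        typeByKind.put("NUMBER", 5);
        typeByKind.put("INDENT", 6);
        typeByKind.put("DEDENT", 7);
    }

    /** Returns the parser token type for an external kind, or INVALID_TYPE if unknown. */
    int toParserType(String externalKind) {
        return typeByKind.getOrDefault(externalKind, INVALID_TYPE);
    }
}
```

With a table like this, the mapping method would only need to build a token carrying the looked-up type plus the text and position of the external token.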


Another way (without the need to create a custom parser) is to create a custom lexer by extending Xtext's lexer (org.eclipse.xtext.parser.antlr.Lexer) as follows:

public class CustomSTLexer extends Lexer {

  @Override
  public void mTokens() {
    // implement lexer here
  }
}

Then you bind it in your module:

public void configureRuntimeLexer(Binder binder) {
  // ... bind CustomSTLexer as the runtime lexer
}

If you want to have a look at a complete example, I have implemented a custom lexer for an Xtext-based editor for StringTemplate called hastee.