9
votes

I'm writing an Eclipse/Xtext plugin for CoffeeScript, and I realized I'll probably need to write a lexer for it by hand. The CoffeeScript parser also uses a hand-written lexer to handle indentation and other tricks in the grammar.

Xtext generates a class that extends org.eclipse.xtext.parser.antlr.Lexer, which in turn extends org.antlr.runtime.Lexer. So I suppose I'll have to extend it. I can see two ways to do that:

  • Override mTokens(). This is what the generated code does, and it works by changing the lexer's internal state.
  • Override nextToken(), which seems the more natural approach, but then I'll have to keep track of the internal state myself.
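
To make the second option more concrete, here is a minimal, self-contained sketch of a hand-written lexer whose nextToken() keeps its own state (an indentation stack and a queue of pending tokens). It deliberately does not use the ANTLR or Xtext classes: IndentLexerSketch, Tok and the token kinds (WORD, NEWLINE, INDENT, DEDENT, EOF) are names made up for illustration, and the rules are far simpler than CoffeeScript's. In a real plugin the same logic would presumably sit behind the nextToken() of the ANTLR lexer or token source.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch (not ANTLR API): a hand-written, indentation-aware lexer. */
public class IndentLexerSketch {

    /** Made-up token type: a kind name plus the matched text. */
    public static final class Tok {
        public final String kind;
        public final String text;
        Tok(String kind, String text) { this.kind = kind; this.text = text; }
    }

    private final String input;
    private int pos = 0;
    private boolean atLineStart = true;
    private final Deque<Integer> indents = new ArrayDeque<>(); // used as a stack
    private final Deque<Tok> pending = new ArrayDeque<>();     // used as a queue

    public IndentLexerSketch(String input) {
        this.input = input;
        indents.push(0); // the base indentation level is never popped
    }

    /** Returns the next token; keeps returning EOF once the input is exhausted. */
    public Tok nextToken() {
        if (!pending.isEmpty()) return pending.poll();
        if (atLineStart) {
            atLineStart = false;
            int width = 0;
            while (pos < input.length() && input.charAt(pos) == ' ') { pos++; width++; }
            boolean blank = pos >= input.length() || input.charAt(pos) == '\n';
            if (!blank) { // blank lines do not affect indentation
                if (width > indents.peek()) {
                    indents.push(width);
                    pending.add(new Tok("INDENT", ""));
                }
                // Simplification: a width between two known levels is not reported as an error.
                while (width < indents.peek()) {
                    indents.pop();
                    pending.add(new Tok("DEDENT", ""));
                }
                if (!pending.isEmpty()) return pending.poll();
            }
        }
        while (pos < input.length() && input.charAt(pos) == ' ') pos++; // spaces inside a line
        if (pos >= input.length()) {
            // Close all open blocks before signalling end of input.
            while (indents.peek() > 0) {
                indents.pop();
                pending.add(new Tok("DEDENT", ""));
            }
            pending.add(new Tok("EOF", ""));
            return pending.poll();
        }
        if (input.charAt(pos) == '\n') {
            pos++;
            atLineStart = true;
            return new Tok("NEWLINE", "\n");
        }
        int start = pos;
        while (pos < input.length() && input.charAt(pos) != ' ' && input.charAt(pos) != '\n') pos++;
        return new Tok("WORD", input.substring(start, pos));
    }

    /** Convenience helper: lexes the whole input and returns the token kinds in order. */
    public static List<String> kinds(String input) {
        IndentLexerSketch lexer = new IndentLexerSketch(input);
        List<String> result = new ArrayList<>();
        Tok t;
        do {
            t = lexer.nextToken();
            result.add(t.kind);
        } while (!t.kind.equals("EOF"));
        return result;
    }
}
```

For the input "a\n  b\nc", kinds(...) yields WORD, NEWLINE, INDENT, WORD, NEWLINE, DEDENT, WORD, EOF. The pending queue is what lets a single line start produce several tokens, which is the kind of state the generated lexer would otherwise manage for me.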

I couldn't find any example of how to write even a simple lexer for ANTLR without a grammar file, so the easiest answer would be a pointer to one.

An answer to "Xtext: grammar for language with significant/semantic whitespace" refers to todotext, which handles the problem of indentation by changing the tokens in the underlying input stream. I don't want to go that way, because it would make it difficult to handle the other tricks of the CoffeeScript grammar.

UPDATE:

I realized in the meantime that my question was partly Xtext specific.

2
You just need to implement ITokenSource (TokenSource in the Java runtime) and do whatever you need to do in the nextToken method. Have you checked out stackoverflow.com/questions/4414166/… ? There are examples of handling indentation (in Python, for instance) in The Definitive ANTLR Reference. - Jimmy

2 Answers

8
votes

Here is what I did -- and it works.

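import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonToken;
import org.antlr.runtime.Token;

public class MyLexer extends myprj.parser.antlr.internal.InternalMylangLexer {
  // The hand-written lexer that does the actual tokenizing
  private SomeExternalLexer externalLexer;

  public MyLexer(CharStream in) {
    super(in);
    externalLexer = new SomeExternalLexer(in);
  }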

  @Override
  public Token nextToken() {
    Token token = null;
    ExternalToken extToken = null;
    try {
      extToken = externalLexer.nextToken();
      if (extToken == null) {
        token = CommonToken.INVALID_TOKEN;
      }
      else {
        token = mapExternalToken(extToken);
      }
    }
    catch (Exception e) {
      token = CommonToken.INVALID_TOKEN;
    }
    return token;
  }

  protected Token mapExternalToken(ExternalToken extToken) {
    // ...
  }
}

Then I have a slightly customized parser containing:

public class BetterParser extends MylangParser {
  @Override
  protected TokenSource createLexer(CharStream stream) {
    MyLexer lexer = new MyLexer(stream);
    return lexer;
  }
}

I also had to change my MylangRuntimeModule.java to contain this method:

@Override
public Class<? extends org.eclipse.xtext.parser.IParser> bindIParser() {
  return myprj.parser.BetterParser.class;
}

And that's it.

6
votes

Another way (without the need to create a custom parser) is to create a custom lexer by extending Xtext's lexer (org.eclipse.xtext.parser.antlr.Lexer) as follows:

public class CustomSTLexer extends Lexer {

    @Override
    public void mTokens() {
      // implement lexer here
    }
}

Then you bind it in your module:

@Override
public void configureRuntimeLexer(Binder binder) {
    binder.bind(Lexer.class)
          .annotatedWith(Names.named(LexerBindings.RUNTIME))
          .to(CustomSTLexer.class);
}

If you want to have a look at a complete example, I have implemented a custom lexer for an Xtext-based editor for StringTemplate called hastee.