
I'm migrating a C#-based programming language compiler from a manual lexer/parser to Antlr.

Antlr has been giving me severe headaches because it mostly works, but then there are the small parts that do not and are incredibly painful to solve.

I discovered that most of my headaches are caused by the lexer parts of Antlr, rather than the parser. Then I noticed the "parser grammar X;" declaration and realized that perhaps I could keep my manually written lexer and pair it with an Antlr-generated parser.

So I'm looking for more documentation on this topic. I guess a custom ITokenStream could work, but there appears to be virtually no online documentation about it...

My recommendation is to learn from existing projects. However, the only one I can find that uses Antlr is NHibernate, and its usage is rather limited. :( - Lex Li
"but then there are the small parts that do not and are incredibly painful to solve" - Odd. The lexing part of a language is usually easier to implement. Perhaps you could explain what "small parts" are giving you problems? - Bart Kiers
@Bart Kiers I'm having trouble with implementing ranges and a few other features (such as 3.toString() and 3.0.toString()) in a different way than that listed in their FAQ. These kinds of problems are incredibly simple to solve in a manually created lexer. - luiscubal
Sorry, but I have no idea what "trouble with implementing ranges" and "other features (such as 3.toString() and 3.0.toString())" mean. I also don't know what "their FAQ" is. There is no need to explain yourself if you're not looking for a way to do this in ANTLR, but if you are interested, please edit your original question and explain what it is you're not able to do in an ANTLR lexer. - Bart Kiers
@Bart Kiers Right now, I found my answer so I'm pretty satisfied. If I do want to explore the possibilities of using an Antlr-generated lexer, then I'll be sure to post a separate question. But, btw, "their FAQ" means "Antlr's FAQ", ranges means stuff like "1..2", and 3.toString()/3.0.toString() means exactly that: obtaining a field (or, in this case, a method) of a number without having Antlr die horribly on the multiple possible meanings of '.'. - luiscubal

1 Answer


I found out how. It might not be the best approach but it certainly seems to be working.

  1. Antlr parsers receive an ITokenStream parameter
  2. Antlr lexers are themselves ITokenSources
  3. ITokenSource is a significantly simpler interface than ITokenStream
  4. The simplest way to convert an ITokenSource to an ITokenStream is to use a CommonTokenStream, which receives an ITokenSource parameter

So now we only need to do 2 things:

  1. Adjust the grammar to be parser-only
  2. Implement ITokenSource

Adjusting the grammar is very simple: remove all lexer declarations and make sure you declare the grammar as a parser grammar. A simple example is posted here for convenience:

parser grammar mygrammar;

options
{
    language=CSharp2;
}

@parser::namespace { MyNamespace }

document:   (WORD {Console.WriteLine($WORD.text);} |
        NUMBER {Console.WriteLine($NUMBER.text);})*;

Note that the grammar above will generate class mygrammar instead of class mygrammarParser.

So now we want to implement a "fake" lexer. I personally used the following pseudo-code:

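// TokenQueue is my own class; it implements ITokenSource
TokenQueue q = new TokenQueue();
// Run the hand-written lexer here and push every token it produces into q
CommonTokenStream cts = new CommonTokenStream(q); // wraps the ITokenSource as an ITokenStream
mygrammar g = new mygrammar(cts);
g.document(); // invoke the start rule of the grammar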

Finally, we need to define TokenQueue. TokenQueue is not strictly necessary, but I used it for convenience. It should have methods to receive the tokens from the hand-written lexer and methods to output Antlr tokens. If the lexer does not produce native Antlr tokens, TokenQueue also needs a method that converts them to Antlr tokens. Finally, TokenQueue must implement ITokenSource.
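
The answer does not show TokenQueue itself, so here is a minimal sketch of what such a class might look like. It assumes the ANTLR 3 C# runtime (namespace Antlr.Runtime), where ITokenSource consists of NextToken() and SourceName; some later runtime builds may require additional members. MyToken, its fields, and the Enqueue method are hypothetical names made up for illustration, and the token type values must match the constants in the generated parser.

```csharp
using System.Collections.Generic;
using Antlr.Runtime; // assumption: ANTLR 3 C# runtime

// Hypothetical token type produced by the hand-written lexer.
public class MyToken
{
    public int Type;    // must match the token type constants of the generated parser
    public string Text;
    public int Line;    // 1-based line number
    public int Column;  // 0-based offset within the line
}

public class TokenQueue : ITokenSource
{
    private readonly Queue<IToken> tokens = new Queue<IToken>();

    // Called by the hand-written lexer for every token it recognizes;
    // converts the lexer's own token into an Antlr CommonToken.
    public void Enqueue(MyToken t)
    {
        CommonToken token = new CommonToken(t.Type, t.Text);
        token.Line = t.Line;
        token.CharPositionInLine = t.Column;
        token.Channel = 0; // normal (non-hidden) channel
        tokens.Enqueue(token);
    }

    // Called by CommonTokenStream. Once the queue is drained,
    // keep answering with an end-of-file token.
    public IToken NextToken()
    {
        if (tokens.Count > 0)
        {
            return tokens.Dequeue();
        }
        return new CommonToken(-1, "<EOF>"); // -1 is the EOF token type in ANTLR 3
    }

    public string SourceName
    {
        get { return "TokenQueue"; }
    }
}
```

A plain queue is enough here because the whole input is lexed before parsing starts; a lexer that produces tokens on demand could do its work inside NextToken() instead.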

Be aware that it is very important to set the token fields correctly. Initially, I had some problems because I was miscalculating CharPositionInLine. If these fields are set incorrectly, the parser may fail. Also, the normal (non-hidden) channel is 0.
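
As an illustration of the position bookkeeping, the following sketch computes the values for a token from its start offset in the source text. It follows the ANTLR 3 convention that lines are counted from 1 and CharPositionInLine from 0. SourcePosition and Locate are hypothetical names, and the sketch assumes '\n' line endings (a '\r' is counted as an ordinary character).

```csharp
using System;

public static class SourcePosition
{
    // Returns the 1-based line and the 0-based column (CharPositionInLine)
    // of the character at 'offset' in 'source'.
    public static void Locate(string source, int offset, out int line, out int column)
    {
        line = 1;
        column = 0;
        for (int i = 0; i < offset && i < source.Length; i++)
        {
            if (source[i] == '\n')
            {
                // A newline starts the next line and resets the column.
                line++;
                column = 0;
            }
            else
            {
                column++;
            }
        }
    }
}
```

For example, calling Locate("foo 12\nbar", 7, out line, out column) for the token "bar" yields line 2 and column 0. Scanning from the start for every token is quadratic on large inputs; a real lexer would more likely update both counters as it consumes characters.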

This seems to be working for me so far. I hope others find it useful as well. I'm open to feedback. In particular, if you find a better way to solve this problem, feel free to post a separate reply.