I'm currently developing a parser for an old proprietary markup-like language that has to be converted to a newer standard. I'm using ANTLR 4.8 to generate C# parsers, which I use with the official Antlr4.Runtime.Standard runtime.

The parser grammar starts with the entry rule like this:

parser grammar ParstParser;

options {
  tokenVocab=ParstLexer;
}

report
  :
    input
    lines
    fields
    mod
    head?
    body
    foot?
    EOF
  ;

[...]

When I test the grammar with grun or with the official ANTLR plugin in Rider, it parses my dummy file just fine (sorry for hiding the markup code, but it is the property of the company I work for):

[Image: ANTLR plugin result]

In C#, I wrote a builder for a higher-level model that accepts contexts from the ANTLR parse tree, but the parse fails with an InputMismatchException, which is logged to the console like this:

line 20:0 mismatched input '<EOF>' expecting 'HEAD'

My dummy C# entrypoint is something like the following:

public static class Program
{
    public static void Main()
    {
        var lexer = new ParstLexer(new AntlrInputStream(Examples.ExampleResources.DUMMY));
        var tokens = new CommonTokenStream(lexer);
        var parser = new ParstParser(tokens);
        var parseTree = parser.report();
        var modelBuilder = new ModelBuilder();
        modelBuilder.AddInput(new InputBlock(
            parseTree.input().vars().squote_string().GetText().Trim('\''),
            Examples.ExampleResources.PSTARKIV522));
        modelBuilder.AddLines(parseTree.lines());
        modelBuilder.AddFields(parseTree.fields());
        modelBuilder.AddMod(parseTree.mod());
        modelBuilder.AddHead(parser.head());
        modelBuilder.AddBody(parser.body());
        modelBuilder.AddFoot(parser.foot());
        var model = modelBuilder.GetModel();
        Console.WriteLine(JsonConvert.SerializeObject(model));
    }
}

I can't figure out why I'm seeing this behavior. I searched for this error and found many people hitting it for many different reasons. I tried playing around with the EOF token (e.g. not declaring it explicitly, or moving it into a wrapper rule around report), but with no result.

The fact that Java-based tools like grun or the Rider plugin don't complain, while my C# code does, makes me think the problem lies in the C# target or in my own Main(), but I can't figure out where.

(1) Eliminate any build problems by removing the obj/ and bin/ directories. Do a scan to make sure you are placing the generated .cs files exactly where you expect them to be. (2) After the parse, print out the token stream and parse tree. – kaby76
What you should do is to look at the token stream and parse tree. Do they correspond to what you see in the Java version? When you see that two programs differ, you need to see at what point they diverge. You say it is looking for HEAD, but "head?" is optional. That does not sound right. When you have two equivalent programs not acting the same, you have all you need to solve the problem. – kaby76
If your supervisor won't let you post enough information for others to properly help you, then yes, I'd say close the question. In your case, recommend that your supervisor hire an external consultant who knows ANTLR and can sign an NDA. Or (if possible and you're allowed) post just enough of the grammar so that others can run it on their machine and see what you see. meta.stackexchange.com/questions/22754/… – Bart Kiers
HEAD should be the token it is looking for. But, you are using the default error reporter, which I know does not give the complete set of symbols it expects. Write a custom error listener (see ErrorListener.cs in the C# template) and set a breakpoint when it enters SyntaxError(). Look at what the call stack is. That will give you a clue of what it really is looking for. If you have two equivalent programs (C# and I assume Java) that don't operate the same, you just need to find where they diverge in behavior. – kaby76
Thank you both for your help. I will try to debug the way kaby76 suggests. If I succeed, I will answer the question myself; if not, I will do as Bart Kiers suggests, close this question, and propose to my supervisor that we hire someone who can help. – NiccoMlt

1 Answer

The problem was a duplicated parser call: because I referenced the wrong object in my C# code, I was invoking rule methods on the parser again instead of reading from the tree produced by the first parser run.
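
For reference, here is a sketch of what the corrected entry point might look like. It is the Main() from the question with only the three AddHead/AddBody/AddFoot lines changed to read from parseTree instead of calling parser again. ParstLexer, ParstParser, ModelBuilder, InputBlock and Examples.ExampleResources are the types from the question; the using directives are an assumption (in particular, that JsonConvert comes from Newtonsoft.Json).

```csharp
using System;
using Antlr4.Runtime;
using Newtonsoft.Json; // assumption: JsonConvert is Newtonsoft.Json's

public static class Program
{
    public static void Main()
    {
        var lexer = new ParstLexer(new AntlrInputStream(Examples.ExampleResources.DUMMY));
        var tokens = new CommonTokenStream(lexer);
        var parser = new ParstParser(tokens);

        // Run the parser exactly once, on the entry rule.
        var parseTree = parser.report();

        var modelBuilder = new ModelBuilder();
        modelBuilder.AddInput(new InputBlock(
            parseTree.input().vars().squote_string().GetText().Trim('\''),
            Examples.ExampleResources.PSTARKIV522));
        modelBuilder.AddLines(parseTree.lines());
        modelBuilder.AddFields(parseTree.fields());
        modelBuilder.AddMod(parseTree.mod());

        // Read the child contexts from the tree; the question's code called
        // parser.head(), parser.body() and parser.foot() here, which
        // re-invoked the parser on an already consumed token stream.
        modelBuilder.AddHead(parseTree.head());
        modelBuilder.AddBody(parseTree.body());
        modelBuilder.AddFoot(parseTree.foot());

        var model = modelBuilder.GetModel();
        Console.WriteLine(JsonConvert.SerializeObject(model));
    }
}
```

Since head and foot are optional in the report rule, parseTree.head() and parseTree.foot() return null when the block is absent from the input, so AddHead and AddFoot presumably need to tolerate a null argument.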

The parser started again from the head parser rule (because that was the rule I requested with the head() method), but the input had already been consumed by the first invocation of the parser. So the parser was looking for the HEAD token that opens head, and instead found the end of the input, EOF.
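
To illustrate the mechanism without the ANTLR runtime, here is a minimal self-contained sketch. ToyTokenStream and ToyParser are hypothetical stand-ins, not ANTLR classes, and the toy throws an exception where the real parser reports the error through its error listener. It only shows that a rule method invoked after the stream has been fully consumed sees end of input.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a token stream: token names plus a cursor.
public sealed class ToyTokenStream
{
    private readonly IReadOnlyList<string> _tokens;
    private int _position;

    public ToyTokenStream(params string[] tokens) => _tokens = tokens;

    // Current token name, or "<EOF>" once the input is exhausted.
    public string Peek() => _position < _tokens.Count ? _tokens[_position] : "<EOF>";

    public void Consume()
    {
        if (_position < _tokens.Count) _position++;
    }
}

// Hypothetical stand-in for a generated parser with one method per rule.
public sealed class ToyParser
{
    private readonly ToyTokenStream _stream;

    public ToyParser(ToyTokenStream stream) => _stream = stream;

    private void Match(string expected)
    {
        var actual = _stream.Peek();
        if (actual != expected)
            throw new InvalidOperationException(
                $"mismatched input '{actual}' expecting '{expected}'");
        _stream.Consume();
    }

    public void Head() => Match("HEAD");
    public void Body() => Match("BODY");

    // Entry rule: consumes the whole input.
    public void Report()
    {
        Head();
        Body();
    }
}

public static class Demo
{
    public static void Main()
    {
        var parser = new ToyParser(new ToyTokenStream("HEAD", "BODY"));
        parser.Report(); // first run consumes every token

        try
        {
            parser.Head(); // second call starts at the end of the input
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
            // prints: mismatched input '<EOF>' expecting 'HEAD'
        }
    }
}
```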