The issue we're having with ANTLR is that we have a grammar that's parsing something like this:

Hello, my name is bob.
bob offset: 5

Keep in mind that the "bob" in the first line is dynamic and could be anything, including the literal name "bob". The "bob offset" line is not dynamic; it appears in every file of the type we are parsing.

So, to parse this, we have a couple of rules:

greeting: 'Hello, my name is' id1=IDENT '.' NEWLINE
    { System.out.println("Name: " + $id1.text); }
    ;

bob_offset: 'bob offset:' id1=INT NEWLINE  // assumes an INT lexer rule for the number
    { System.out.println("bob offset: " + $id1.text); }
    ;

So, the issue is that 'bob offset:' is a token that the lexer recognizes. When the greeting rule runs, an error is thrown because the lexer sees 'bob', tries to match it as the start of 'bob offset:', and can't.
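
To make the conflict concrete: in a combined grammar, a quoted literal inside a parser rule behaves like a lexer rule of its own, and the lexer picks tokens from the characters alone, without knowing which parser rule is active. The fragment below is only an illustration. The rule name BOB_OFFSET and the definition of IDENT are assumptions, since the question does not show the real lexer rules.

```antlr
// The literal used in bob_offset acts roughly like this explicit lexer rule
// (BOB_OFFSET is a hypothetical name for it).
BOB_OFFSET : 'bob offset:' ;

// Assumed definition of IDENT; the real one is not shown in the question.
IDENT : ('a'..'z' | 'A'..'Z')+ ;

// Both rules can begin with the characters "bob", so the lexer has to choose
// between them before the parser ever asks for an IDENT in the greeting rule.
```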

The solution that would be ideal is if ANTLR had some way to specify context- or parser rule-specific lexer rules. This way, the 'bob offset:' token wouldn't be mistaken anywhere else in the grammar.

Any thoughts on this issue would be appreciated.

No, that is not possible. Note that creating tokens like 'Hello, my name is' and '??? offset:' (i.e. multiple words containing spaces) is not the way to go. If I were you, I'd rethink my approach. - Bart Kiers
Any suggestions on how to change our approach? Make lexer rules for each word? - boztalay
Hard to say without knowing more. I doubt that you've explained your entire problem in enough detail with this question. Think about what would happen if you'd parse "Hello , my name is ..." (the space before the comma would break it). - Bart Kiers
Oh, I see what you're saying. You're correct, I just boiled the main problem down to ask this question. The only time we would be doing something like "bob offset:" as a lexer rule is when "bob offset:" is a structural part of the file we're looking at. It just so happens that the structural parts can overlap with the dynamic parts. - boztalay
Sorry, I don't understand what your actual question is. I think it's better to explain your real question, post your real grammar, and provide real input you're trying to parse. Not some boiled down version of it. - Bart Kiers

1 Answer

We ended up working around this by adding more parser rules that spell out the structure more specifically for ANTLR.
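
For readers wondering what such a workaround might look like, here is a minimal sketch. It is not our actual grammar, only one common way to do it. It assumes single-word tokens as suggested in the comments, plus a parser rule (called name here) that accepts the structural words wherever a dynamic name is allowed. The grammar name Greeting, the rules file and name, and the INT, IDENT, NEWLINE and WS lexer rules are all assumptions made for illustration.

```antlr
grammar Greeting;

// Hypothetical entry rule: one greeting line followed by the offset line.
file
    : greeting bob_offset EOF
    ;

// Each word is its own token, so spacing between words no longer matters.
greeting
    : 'Hello' ',' 'my' 'name' 'is' n=name '.' NEWLINE
      { System.out.println("Name: " + $n.text); }
    ;

bob_offset
    : 'bob' 'offset' ':' v=INT NEWLINE
      { System.out.println("bob offset: " + $v.text); }
    ;

// A name is an ordinary identifier or a word that is also structural
// elsewhere. Other keywords ('Hello', 'my', ...) would need to be listed
// here as well if they can appear as names.
name
    : IDENT
    | 'bob'
    | 'offset'
    ;

INT     : ('0'..'9')+ ;
IDENT   : ('a'..'z' | 'A'..'Z')+ ;
NEWLINE : '\r'? '\n' ;
WS      : (' ' | '\t')+ { skip(); } ;  // discard spaces and tabs
```

The design choice is to let the lexer stay context-free and move the context into the parser: 'bob' is always lexed as the same token, and the name rule decides that it is acceptable as a name in the greeting.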