I cannot get JavaCC to properly disambiguate tokens by their place in a grammar. I have the following JJTree file (I'll call it bug.jjt):
options
{
    LOOKAHEAD = 3;
    CHOICE_AMBIGUITY_CHECK = 2;
    OTHER_AMBIGUITY_CHECK = 1;
    SANITY_CHECK = true;
    FORCE_LA_CHECK = true;
}

PARSER_BEGIN(MyParser)
import java.util.*;

public class MyParser {
    public static void main(String[] args) throws ParseException {
        MyParser parser = new MyParser(new java.io.StringReader(args[0]));
        SimpleNode root = parser.production();
        root.dump("");
    }
}
PARSER_END(MyParser)

SKIP:
{
    " "
}

TOKEN:
{
    <STATE: ("state")>
|   <PROD_NAME: (["a"-"z"])+ >
}

SimpleNode production():
{}
{
    (
        <PROD_NAME>
        <STATE>
        <EOF>
    )
    {return jjtThis;}
}
Generate the parser code with the following:
java -cp C:\path\to\javacc.jar jjtree bug.jjt
java -cp C:\path\to\javacc.jar javacc bug.jj
After compiling the generated sources, you can run MyParser from the command line with the string to parse as its argument. It prints production if the parse succeeds and reports an error if it fails.
I tried two simple inputs: foo state and state state. The first one parses, but the second one does not, because both state strings are tokenized as <STATE>. Since I set LOOKAHEAD to 3, I expected the parser to use the grammar to work out that the second state must be <STATE> and the first must be <PROD_NAME>. No such luck. I have tried changing the various lookahead options, to no avail. I also cannot use tokenizer (lexical) states, where different tokens are allowed in different states, because this example is part of a more complicated system that will probably contain many ambiguities of this kind.
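To make the behavior concrete, here is a minimal Java sketch of what I believe the token manager is effectively doing. This is not JavaCC's generated code: TokenizeSketch and tokenize are names I made up, and the sketch assumes each space-separated word is classified on its own, with <STATE> winning over <PROD_NAME> for the string state because it is declared first. The grammar is never consulted.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeSketch {
    // Hypothetical stand-in for the two TOKEN rules in bug.jjt.
    // Classification depends only on the text, never on what the parser expects.
    static List<String> tokenize(String input) {
        List<String> kinds = new ArrayList<>();
        for (String word : input.trim().split(" +")) {
            if (word.isEmpty()) {
                continue; // nothing but skipped spaces
            }
            if (word.equals("state")) {
                kinds.add("STATE");      // declared first, so it wins the tie
            } else if (word.matches("[a-z]+")) {
                kinds.add("PROD_NAME");
            } else {
                throw new IllegalArgumentException("Lexical error at: " + word);
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("foo state"));   // [PROD_NAME, STATE]
        System.out.println(tokenize("state state")); // [STATE, STATE]
    }
}
```

With the second input, production() receives <STATE> <STATE>, which does not match <PROD_NAME> <STATE> <EOF>, and that matches the failure I am seeing.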
Can anyone tell me how to make JavaCC properly disambiguate these tokens, without using tokenizer states?