I'm working on a parser, using GNU bison, and I'm facing an interesting problem. My actual problem is a bit different and less interesting in general, so I will state it a bit differently, so that an answer would be more useful in general.
I need to distinguish expressions depending on their type, like arithmetic expressions from string expressions. Their top-level nonterminals have some common ancestors, like
statement
: TOK_PRINT expression TOK_SEMICOLON {...}
;
expression
: arithmeticexpression {...}
| stringexpression {...}
;
Now I need to be able to have variables in both kinds of expressions
arithmeticexpression
: TOK_IDENTIFIER {...}
;
stringexpression
: TOK_IDENTIFIER {...}
;
(only the variables of string-type should be allowed in the stringexpression case and only variables of int- or float-type should be allowed in the arithmeticexpression case) But this obviously causes an R/R conflict, which is inherent in the language - it is impossible to resolve, because the language is ambiguous.
Of course I could water down the language, so that only general "Expression" objects are passed around on the parse-stack, but then I'd have to do a lot of manual type-checking in the actions, which I would like to avoid. Also, in my real use-case, the paths through the grammar to the variable-rules are so different, that I would have to water the language down so much that I would lose a lot of grammar-rules (i.e. lose a lot of structural information) and would need to hand-write parsing-engines into some actions.
I have read about GLR-parsing, which sounds like it could solve my problem. I'm considering to use this feature: have the grammar as above and YYERROR
in the paths where the corresponding variable has the wrong type.
arithmeticexpression
: TOK_IDENTIFIER {
if(!dynamic_cast<IntVariable*>(
symbol_table[*$<stringvalue>1]))
YYERROR;
}
;
stringexpression
: TOK_IDENTIFIER {
if(!dynamic_cast<StringVariable*>(
symbol_table[*$<stringvalue>1]))
YYERROR;
}
;
but the Bison Manual says
- During deterministic GLR operation, the effect of YYERROR is the same as its effect in a deterministic parser.
- The effect in a deferred action is similar, but the precise point of the error is undefined; instead, the parser reverts to deterministic operation, selecting an unspecified stack on which to continue with a syntax error.
- In a semantic predicate (see Semantic Predicates) during nondeterministic parsing, YYERROR silently prunes the parse that invoked the test.
I'm not sure I understand this correctly - I understand it this way:
- doesn't apply here, because GLR-parsing isn't deterministic
- is how I have it in the code above, but shouldn't be done this way, because it is completely random which path is killed by the YYERROR (correct me if I'm wrong)
- wouldn't solve my problem, because semantic predicates (not semantic actions!) must be at the beginning of a rule (correct me if I'm wrong), at which point the yylval from the TOK_IDENTIFIER token isn't accessible (correct me if I'm wrong), so I can't look into the symbol table to find the type of the variable.
Does anyone have experience with such a situation? Is my understanding of the manual wrong? How would you approach this problem? To me, the problem seems to be so natural that I would assume people run into it often enough that bison would have a solution built-in...