So in my language I want to have dot-syntax expressions:
myObject.myProperty
myObject.myProperty.subProperty
And I want declarations:
Object myObject = 1
Additionally, object types can be namespaced:
Object.SubObject mySubObject = 1
A simplified grammar is as follows:
program:
    declaration
  | expression
  ;
declaration:
    TOKEN_IDENTIFIER TOKEN_IDENTIFIER '=' TOKEN_INTEGER
  | TOKEN_IDENTIFIER '.' TOKEN_IDENTIFIER TOKEN_IDENTIFIER '=' TOKEN_INTEGER
  ;
expression:
    TOKEN_IDENTIFIER
  | expression '.' TOKEN_IDENTIFIER
  ;
Unfortunately, compiling this grammar with Bison gives a shift-reduce conflict. Looking at the state machine output, it seems to me there's an error in the way Bison interprets it. The following is state 1, which is the state after reading the first identifier:
State 1

    3 declaration: "identifier" . "identifier" '=' "integer"
    4            | "identifier" . '.' "identifier" "identifier" '=' "integer"
    5 expression: "identifier" .

    "identifier"  shift, and go to state 5
    '.'           shift, and go to state 6

    "end of code"  reduce using rule 5 (expression)
    '.'            [reduce using rule 5 (expression)]
And state 6 (the default shift state when reading a dot) is only for a declaration:
State 6

    4 declaration: "identifier" '.' . "identifier" "identifier" '=' "integer"

    "identifier"  shift, and go to state 10
It seems to me that, in state 1, there should be no possibility of reducing upon reading a dot. The parser should look ahead: if it sees two identifiers right after each other (no dot in between), it should shift to a declaration-only state, but if it sees a second dot or end of code, it should reduce to expression. The declaration rule is the only place where two identifiers can appear side by side without a dot between them, so that should disambiguate the grammar and there should be no shift-reduce conflicts.
I tried this with ielr and canonical-lr with the same results (don't know if that should matter).
Any ideas? Is my interpretation of how it should work incorrect?
Once the parser has seen TOKEN_IDENTIFIER '.' TOKEN_IDENTIFIER, it doesn't know, outside of you providing other hints to it, whether it should reduce, returning an "expression", or shift in another token, because it's seen the first part of a "declaration". Some additional syntax, like requiring both an "expression" and a "declaration" to be terminated by a semicolon or something, would help reduce that ambiguity... With your current rules, x.y a could be either an expression followed by the beginning of another (x.y a.b) or the start of a declaration... – twalberg
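For what it's worth, the grammar in the question looks unambiguous but not LR(1). In state 1, with '.' as the single lookahead token, the parser cannot yet tell whether the first identifier begins a namespaced type (shift) or is already a complete expression (reduce); it would have to see the token after the next identifier to decide. ielr and canonical-lr are still one-token-lookahead methods, which would explain why they give the same result. One common way around this is to factor the shared prefix into its own nonterminal so the decision is deferred until the tokens really differ. The following is only a sketch under assumptions: dotted_name is a hypothetical nonterminal that is not in the original grammar, and it accepts type names with more than one dot (A.B.C x = 1), so a one-level namespace restriction would have to be checked in a semantic action.

```yacc
%token TOKEN_IDENTIFIER TOKEN_INTEGER

%%

program:
    declaration
  | expression
  ;

/* Hypothetical helper: the prefix shared by type names and expressions.
   Left-recursive, so a.b.c groups as ((a.b).c). */
dotted_name:
    TOKEN_IDENTIFIER
  | dotted_name '.' TOKEN_IDENTIFIER
  ;

/* After a dotted_name, a second identifier can only mean a declaration. */
declaration:
    dotted_name TOKEN_IDENTIFIER '=' TOKEN_INTEGER
  ;

/* Assumption: an expression is just a dotted_name here; a real language
   would add more alternatives, which may bring back other conflicts. */
expression:
    dotted_name
  ;
```

With this shape, the state after a dotted_name shifts on an identifier or a dot and reduces to expression only at end of input, so a single token of lookahead is enough. If the original rule layout has to stay as it is, Bison's %glr-parser declaration is another option worth trying, since GLR parsing can follow both alternatives until one of them fails.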