0
votes

After having completed the Compiler Design course at my university I have been playing around with making a compiler for a simple programming language, but I'm having trouble with the parser. I'm making the compiler in mosml and using its builtin parser mosmlyac for constructing the parser. Here is an excerpt from my parser showing the grammar and associativity+precedence.

...
%right ASSIGN
%left OR
%left AND
%nonassoc NOT
%left EQUAL LESS
%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc NEGATE
...
Prog : FunDecs EOF  { $1 }
;

FunDecs : Fun FunDecs   { $1 :: $2 }
        |               { [] }
;

Fun : Type ID LPAR TypeIds RPAR StmtBlock   { FunDec (#1 $2, $1, $4, $6, #2 $2) }
    | Type ID LPAR RPAR StmtBlock           { FunDec (#1 $2, $1, [], $5, #2 $2) }
;

TypeIds : Type ID COMMA TypeIds     { Param (#1 $2, $1) :: $4 }
        | Type ID                   { [Param (#1 $2, $1)] }
;

Type : VOID                     { Void }
     | INT                      { Int }
     | BOOL                     { Bool }
     | CHAR                     { Char }
     | STRING                   { Array (Char) }
     | Type LBRACKET RBRACKET   { Array ($1) }
;

StmtBlock : LCURLY StmtList RCURLY  { $2 }
;

StmtList : Stmt StmtList    { $1 :: $2 }
         |                  { [] }
;

Stmt : Exp SEMICOLON                    { $1 }
     | IF Exp StmtBlock                 { IfElse ($2, $3, [], $1) }
     | IF Exp StmtBlock ELSE StmtBlock  { IfElse ($2, $3, $5, $1) }
     | WHILE Exp StmtBlock              { While ($2, $3, $1) }
     | RETURN Exp SEMICOLON             { Return ($2, (), $1) }
;

Exps : Exp COMMA Exps   { $1 :: $3 }
     | Exp              { [$1] }
;

Index : LBRACKET Exp RBRACKET Index     { $2 :: $4 }
      |                                 { [] }
;

Exp : INTLIT                    { Constant (IntVal (#1 $1), #2 $1) }
    | TRUE                      { Constant (BoolVal (true), $1) }
    | FALSE                     { Constant (BoolVal (false), $1) }
    | CHRLIT                    { Constant (CharVal (#1 $1), #2 $1) }
    | STRLIT                    { StringLit (#1 $1, #2 $1) }
    | LCURLY Exps RCURLY        { ArrayLit ($2, (), $1) }
    | ARRAY LPAR Exp RPAR       { ArrayConst ($3, (), $1) }
    | Exp PLUS Exp              { Plus ($1, $3, $2) }
    | Exp MINUS Exp             { Minus ($1, $3, $2) }
    | Exp TIMES Exp             { Times ($1, $3, $2) }
    | Exp DIVIDE Exp            { Divide ($1, $3, $2) }
    | NEGATE Exp                { Negate ($2, $1) }
    | Exp AND Exp               { And ($1, $3, $2) }
    | Exp OR Exp                { Or ($1, $3, $2) }
    | NOT Exp                   { Not ($2, $1) }
    | Exp EQUAL Exp             { Equal ($1, $3, $2) }
    | Exp LESS Exp              { Less ($1, $3, $2) }
    | ID                        { Var ($1) }
    | ID ASSIGN Exp             { Assign (#1 $1, $3, (), #2 $1) }
    | ID LPAR Exps RPAR         { Apply (#1 $1, $3, #2 $1) }
    | ID LPAR RPAR              { Apply (#1 $1, [], #2 $1) }
    | ID Index                  { Index (#1 $1, $2, (), #2 $1) }
    | ID Index ASSIGN Exp       { AssignIndex (#1 $1, $2, $4, (), #2 $1) }
    | PRINT LPAR Exp RPAR       { Print ($3, (), $1) }
    | READ LPAR Type RPAR       { Read ($3, $1) }
    | LPAR Exp RPAR             { $2 }
;

Prog is the %start symbol and I have left out the %token and %type declaration on purpose.

The problem I have is that this grammar seems to be ambiguous and looking at the output of running mosmlyac -v on the grammar it seems that it is the rules containing the token ID that is the problem and creates shift/reduce and reduce/reduce conflicts. The output also tells me that the rule Exp : ID is never reduced.

Can anyone help me make this grammar unambiguous?

1

1 Answers

0
votes

Index has an empty production.

Now consider:

Exp : ID
    | ID Index

Which of those applies? Since Index is allowed to be empty, there is no context in which only one of those is applicable. The parser generator you are using evidently prefers to reduce an empty INDEX, making Exp : ID unusable and creating a large number of conflicts.

I'd suggesting changing Index to:

Index : LBRACKET Exp RBRACKET Index     { $2 :: $4 }
      | LBRACKET Exp RBRACKET           { [ $2 ] }

although in the long run, you might be better off with a more traditional "lvalue/rvalue" grammar, in which lvalue includes ID and lvalue [ Exp ] and rvalue includes lvalue. (That will give a more elaborate parse tree for ID [ Exp ] [ Exp ], but there is an obvious homormorphism.)