1
votes

I am making a simple parser in Happy (Bison equivalent for Haskell) and I stumbled upon a shift/reduce conflict in these rules:

ClassBlock :
        "{" ClassAttributes  ClassConstructor ClassFunctions "}" {ClassBlock $2 $3 $4}

ClassAttributes :
        {- empty -} { ClassAttributesEmpty }
      | ClassAttributes ClassAttribute {ClassAttributes $1 $2}

ClassAttribute :
        "[+]" Variable {ClassAttributePublic $2 }
      | "[-]" Variable {ClassAttributePrivate $2 }

ClassFunctions :
        {- empty -} { ClassFunctionsEmpty }
      | ClassFunctions ClassFunction {ClassFunctions $1 $2}

ClassFunction :
        "[+]" Function {ClassFunctionPublic $2}
      | "[-]" Function {ClassFunctionPrivate $2}

ClassConstructor :
       {- empty -} { ClassConstructorEmpty }
      | TypeFuncParams var_identifier Params Block {ClassConstructor $1 $2 $3 $4}

TypeFuncParams :
      Primitive ClosingBracketsNoIdentifier { TypeFuncParamsPrimitive $1 $2}
    | class_identifier ClosingBracketsNoIdentifier { TypeFuncParamsClassId $1 $2}
    | ListType {TypeFuncParamsList $1}

The info file states the shift/reduce conflict:

ClassBlock -> "{" ClassAttributes . ClassConstructor ClassFunctions "}"    (rule 52)
    ClassAttributes -> ClassAttributes . ClassAttribute    (rule 54)

    "[+]"          shift, and enter state 85
            (reduce using rule 61)

    "[-]"          shift, and enter state 86
            (reduce using rule 61)

Rule 61 is this one:

ClassConstructor :
   {- empty -} { ClassConstructorEmpty }

I am not really sure how to solve this problem. I tried using precedence rules to silence the warning, but it didn't work out as I expected.

1
are you sure about this ClassAttributes: (...) ClassAttributes ClassAttribute {ClassAttributes $1 $2} usually it is a bad idea to have rules like S -> S a, usually it is done the opposite way with S -> a S.Willem Van Onsem
you are totally right, however, since Happy builds an LALR parser it's actually better to do it with left recursion rules.Eddtothefullest
This is not the full grammar and does not provide enough information to diagnose the problem. Where is TypeFuncParams?rici
@willem: avoiding left recursion is only important for LL parsing. "Usually" depends a lot on your context; in mine, top-down parsing is unusual.rici
I didn't add the other rules because they are not part of the problem (according to Happy).Eddtothefullest

1 Answers

4
votes

Below is a simplified grammar which exhibits the same problem.

To construct it, I removed

  • all actions
  • the prefix "Class" from all nonterminal names

I also simplified most of the rules. I did this as an illustration of how you can construct a minimal, complete, verifiable example, as suggested by the StackOverflow guidelines, which makes it easier to focus on the problem while still permitting an actual trial. (I used bison, not happy, but the syntax is very similar.)

Block      : "{" Attributes Constructor Functions "}"
Attributes : {- empty -} | Attributes Attribute
Constructor: {- empty -} | "constructor"
Functions  : {- empty -} | Functions Function
Attribute  : "[+]" "attribute"
Function   : "[+]" "function"

Now, let's play parser, and suppose that we've (somehow) identified a prefix which could match Attributes. (Attributes can match the empty string, so we could be right at the beginning of the input.) And suppose the next token is [+].

At this point, we cannot tell if the [+] will later turn out to be the beginning of an Attribute or if it is the start of a Function which follows an empty Constructor. However, we need to know that in order to continue the parse.

If we've finished with the Attributes and about to start on the Functions, this is the moment where we have to reduce the empty nonterminal Constructor. Unless we do that now, we cannot then go on to recognize a Function. On the other hand, if we haven't seen the last Attribute but we do reduce a Constructor, then the parse will eventually fail because the next Attribute cannot follow the Constructor we just reduced.

In cases like this, it is often useful to remove the empty production by factoring the options into the places where the non-terminal is used:

Block      : "{" Attributes "constructor" Functions "}"
           | "{" Attributes Functions "}"
Attributes : {- empty -} | Attributes Attribute
Functions  : {- empty -} | Functions Function
Attribute  : "[+]" "attribute"
Function   : "[+]" "function"

But just removing Constructor isn't enough here. In order to start parsing the list of Functions, we need to first reduce an empty Functions to provide the base case of the Functions recursion, so we still need to guess where the Functions start in order to find the correct parse. And if we wrote the two lists as right-recursions instead of left-recursions, we'd instead need an empty Attributes to terminate the recursion of the Attributes recursion.

What we could do, in this particular case, is use a cunning combination of left- and right-recursion:

Block      : "{" Attributes "constructor" Functions "}"
           | "{" Attributes Functions "}"
Attributes : {- empty -} | Attributes Attribute
Functions  : {- empty -} | Function Functions
Attribute  : "[+]" "attribute"
Function   : "[+]" "function"

By making the first list left-recursive and the second list right-recursive, we avoid the need to reduce an empty non-terminal between the two lists. That, in turn, allows the parser to decide whether a phrase was an Attribute or a Function after it has seen the phrase, at which point it is no longer necessary to consult an oracle.

However, that solution is not very pretty for a number of reasons, not the least of which being that it only works for the concatenation of two optional lists. If we wanted to add another list of a different kind of item which could also start with the [+] token, a different solution would be needed..

The simplest one, which is used by a number of languages, is to allow the programmer to intermingle the various list elements. You might consider that bad style, but it is not always necessary to castigate bad style by making it a syntax error.

A simple solution would be:

Block      : "{" Things "}"
Things     : {- empty -} | Things Attribute | Things Function | Things Constructor
Attribute  : "[+]" "attribute"
Constructor: "constructor"
Function   : "[+]" "function"

but that doesn't limit a Block to at most one Constructor, which seems to be a syntactic requirement. However, as long as Constructor cannot start with a [+], you could implement the "at most one Constructor" limitation with:

Block      : "{" Things Constructor Things "}"
           | "{" Things "}"
Things     : {- empty -} | Things Attribute | Things Function
Attribute  : "[+]" "attribute"
Constructor: "constructor"
Function   : "[+]" "function"