Below is a simplified grammar which exhibits the same problem.
To construct it, I removed
- all actions
- the prefix "Class" from all nonterminal names
I also simplified most of the rules. I did this as an illustration of how you can construct a minimal, complete, verifiable example, as suggested by the StackOverflow guidelines, which makes it easier to focus on the problem while still permitting an actual trial. (I used bison, not happy, but the syntax is very similar.)
Block : "{" Attributes Constructor Functions "}"
Attributes : {- empty -} | Attributes Attribute
Constructor: {- empty -} | "constructor"
Functions : {- empty -} | Functions Function
Attribute : "[+]" "attribute"
Function : "[+]" "function"
Now, let's play parser, and suppose that we've (somehow) identified a prefix which could match Attributes
. (Attributes
can match the empty string, so we could be right at the beginning of the input.) And suppose the next token is [+]
.
At this point, we cannot tell if the [+]
will later turn out to be the beginning of an Attribute
or if it is the start of a Function
which follows an empty Constructor
. However, we need to know that in order to continue the parse.
If we've finished with the Attributes and about to start on the Functions, this is the moment where we have to reduce the empty nonterminal Constructor
. Unless we do that now, we cannot then go on to recognize a Function
. On the other hand, if we haven't seen the last Attribute
but we do reduce a Constructor
, then the parse will eventually fail because the next Attribute
cannot follow the Constructor
we just reduced.
In cases like this, it is often useful to remove the empty production by factoring the options into the places where the non-terminal is used:
Block : "{" Attributes "constructor" Functions "}"
| "{" Attributes Functions "}"
Attributes : {- empty -} | Attributes Attribute
Functions : {- empty -} | Functions Function
Attribute : "[+]" "attribute"
Function : "[+]" "function"
But just removing Constructor
isn't enough here. In order to start parsing the list of Functions, we need to first reduce an empty Functions
to provide the base case of the Functions
recursion, so we still need to guess where the Functions
start in order to find the correct parse. And if we wrote the two lists as right-recursions instead of left-recursions, we'd instead need an empty Attributes
to terminate the recursion of the Attributes
recursion.
What we could do, in this particular case, is use a cunning combination of left- and right-recursion:
Block : "{" Attributes "constructor" Functions "}"
| "{" Attributes Functions "}"
Attributes : {- empty -} | Attributes Attribute
Functions : {- empty -} | Function Functions
Attribute : "[+]" "attribute"
Function : "[+]" "function"
By making the first list left-recursive and the second list right-recursive, we avoid the need to reduce an empty non-terminal between the two lists. That, in turn, allows the parser to decide whether a phrase was an Attribute
or a Function
after it has seen the phrase, at which point it is no longer necessary to consult an oracle.
However, that solution is not very pretty for a number of reasons, not the least of which being that it only works for the concatenation of two optional lists. If we wanted to add another list of a different kind of item which could also start with the [+]
token, a different solution would be needed..
The simplest one, which is used by a number of languages, is to allow the programmer to intermingle the various list elements. You might consider that bad style, but it is not always necessary to castigate bad style by making it a syntax error.
A simple solution would be:
Block : "{" Things "}"
Things : {- empty -} | Things Attribute | Things Function | Things Constructor
Attribute : "[+]" "attribute"
Constructor: "constructor"
Function : "[+]" "function"
but that doesn't limit a Block to at most one Constructor, which seems to be a syntactic requirement. However, as long as Constructor
cannot start with a [+]
, you could implement the "at most one Constructor" limitation with:
Block : "{" Things Constructor Things "}"
| "{" Things "}"
Things : {- empty -} | Things Attribute | Things Function
Attribute : "[+]" "attribute"
Constructor: "constructor"
Function : "[+]" "function"
ClassAttributes: (...) ClassAttributes ClassAttribute {ClassAttributes $1 $2}
usually it is a bad idea to have rules likeS -> S a
, usually it is done the opposite way withS -> a S
. – Willem Van Onsem