How do merged lookahead sets help LALR parse more grammars than SLR?

Question

From what I have searched on Google so far, the only difference between LALR(1) and SLR(1) is that LALR(1) uses states with merged lookahead sets. One source says that LALR(1) won't run into shift-reduce conflicts because the parser will "remember" from which state it arrived at the current state. However, there are no differences in the parsing algorithm, so I'm having trouble seeing how these merged lookahead sets would help the LALR parser resolve shift-reduce conflicts. And if the parser remembers from which state it arrived at the current state, why is it still vulnerable to reduce-reduce conflicts?

rici rici · Accepted Answer · 2016-06-04T17:53:50

It is actually SLR which uses merged lookahead sets. Or, more accurately, the lookahead set for each production in an SLR parse table is approximated as the FOLLOW set for the last symbol in the production, effectively ignoring the context.

LALR parsers have merged states with respect to canonical LR parsers. In both LR and LALR, in contrast to SLR, tbe lookahead sets are computed accurately, but in the case of the LALR parser, the states are merged so that it has the same states (but not the same lookahead sets) as tbe SLR parser.

There is a reasonable discussion of the differences in the Wikipedia articles on SLR parsing, LALR parsing and Canonical LR parsing. See the references for those articles for more information.

No matter which parser-generation technique you use, there will be conflicts if the grammar requires more lookahead. For example, the following grammar requires two lokahead symbols to decide whether to shift or reduce a NAME:

prog → λ
prog → prog defn
defn → NAME :
defn → defn NAME

Here, a NAME is part of a defn if it is not followed by a colon, so to decide whether or not to shift a NAME, you need not just the NAME, which is the lookahead token, but also tge following token, the second lookahead.

Here is a very simple grammar which is not SLR(1), which might possibly illuminate the problem with using follow sets. (It comes straight out of the Dragon book, and is often used as an example of why SLR(1) grammars are inadequate):

S → L = R
S → R
L → id
L → * R
R → L

Here, the FOLLOW sets of R and L are identical. Both R and L include =, since * R = ... is valid, and will reduce to L = ..., and clearly both R and L can appear at the end of an S, so both FOLLOW sets include the end-of-input marker.

The problem then is to decide whether or not to reduce R → L. In L = R, we should leave the L as is and shift the =. But the FOLLOW sets don't help, because R can be followed by =. So in an SLR(1) grammar, which only uses FOLLOW sets, there is a shift/reduce conflict in the state:

S → L • = R
R → L •

This conflict doesn't exist in the LALR(1) grammar because in that state the lookahead set for the production R → L does not include =, because the production was included from the item S → • R, whose lookahead is the end-of-input marker.

How do merged lookahead sets help LALR parse more grammars than SLR?

1 Answers