2
votes

I'm reading the PHP portation of the lemon parser:

for ($i = 0; $i < $this->nstate; $i++) {   /* Loop over all states */
    $stp = $this->sorted[$i]->data;
    for ($cfp = $stp->cfp; $cfp; $cfp = $cfp->next) {
        /* Loop over all configurations */
        if ($cfp->rp->nrhs == $cfp->dot) {        /* Is dot at extreme right? */
            for ($j = 0; $j < $this->nterminal; $j++) {
                if (isset($cfp->fws[$j])) {
                    /* Add a reduce action to the state "stp" which will reduce by the
                    ** rule "cfp->rp" if the lookahead symbol is "$this->symbols[j]" */
                    PHP_ParserGenerator_Action::Action_add($stp->ap, PHP_ParserGenerator_Action::REDUCE,
                                            $this->symbols[$j], $cfp->rp);
                }
            }
        }
    }
}

It seems to me the parser is a SLR(1) parser according to how it computes the action table,but @the home page of lemon,it demonstrates itself as a LALR(1) parser:

http://www.hwaci.com/sw/lemon/

Is it SLR(1) or LALR(1) ?

1

1 Answers

2
votes

If it were pure SLR, there wouldn't be any lookahead symbols ($this->symbol[$j]) used to control a reduction. So I conclude it is LALR(1).

EDIT: yoyo is right SLR(1) does use next-input symbols to control reductions (I misread the question as [LALR(1) vs] SLR(0), which simply doesn't care); I stand corrected. In checking, SLR(1) uses the (production rule context-free) FOLLOW set to control reductions; LALR(1) uses the (left-context dependent) LOOKAHEAD set. So, both have a "lookahead" set on each reduction. That means you can't tell from this code fragment which kind it is; at best we hope the coder is really computing "a" lookahead set. You'd have to see the rest of the code to know what kind it is.

As a practical matter, if you are going to build a bottom up parser generator, you can choose to build SLR(0) [which I did once upon a time and that's how my brain misread the question), SLR(1), LALR(1), and LR(1) parser generators using almost the exact same machinery. 30 years of experience has shown that LALR(1) is the most practical of these, so the default is ... LALR(1); SLR(x) is strictly a subset so why bother doing that if only a tiny bit more effort gets you LALR(1)? If the Lemon implementer follows tradition, I'd expect an LALR(1) parser generator. So now you have sort of take their word for it.

Of course, you can construct a simple experiment to convince yourself. Simply build a grammar that SLR(1) can't propertly parse that LALR(1) can, and try it. Or you can read the code really carefully.

See LALR parsing at http://en.wikipedia.org/wiki/LALR_parser