2 votes

I'm concerned about the important difference between the terms "LL(k) parser" and "parser for LL(k) grammars". Take an LL(1) parser with backtracking: it IS a parser for LL(k) grammars, because it can parse them, but it is NOT an LL(k) parser, because it does not use k tokens of lookahead from a single position in the grammar; instead it explores the possible cases with backtracking, even though it still reads k tokens while exploring. Am I right?

The question may come down to how the look-ahead is performed. If the look-ahead is actually still processing the grammar with backtracking, that does not make it an LL(k) parser. To be an LL(k) parser, the parser must not process the grammar with a backtracking mechanism, because then it would be an "LL(1) parser with backtracking that can parse LL(k) grammars". Am I right again?

I think the difference is related to the expectation that an LL(1) parser uses constant time per token, and an LL(k) parser uses at most k * constant time per token (linear in the look-ahead), not exponential time as would be the case for a backtracking parser.
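To make the distinction concrete, here is a minimal sketch of the two selection strategies (the table/grammar shapes and helper names are my own, purely for illustration):

```python
# LL(k)-style prediction: one table lookup keyed by the next k tokens.
# The work per decision is bounded: read k tokens, consult the table once.
def ll_k_choose(nonterminal, next_k_tokens, table):
    return table[(nonterminal, tuple(next_k_tokens))]   # KeyError = syntax error

# Backtracking: try each alternative and retry on failure. The same input
# positions may be re-parsed many times, so the work per token is not bounded by k.
def backtracking_choose(nonterminal, tokens, pos, grammar, try_rhs):
    for rhs in grammar[nonterminal]:
        new_pos = try_rhs(rhs, tokens, pos)   # returns a new position, or None on failure
        if new_pos is not None:
            return rhs, new_pos
    return None
```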

Update 1: to simplify - per token, is LL(k) parsing expected to run in time exponential in k, or linear in k?

Update 2: I have changed the question to LL(k), because it does not depend on what range k has (a finite integer or infinity).

It's not possible to answer a question about "backtracking LL(1) parsers" unless you precisely define a backtracking algorithm (and even then it will be difficult to do the analysis). I provided a new answer which only talks about LL(k) grammars; I hope it answers your question listed in "Update 1". – rici

2 Answers

0 votes

An LL(k) parser needs to do the following at each point in the inner loop (a sketch of this loop appears after the list):

  • Collect the next k input symbols. Since this is done at each point in the input, this can be done in constant time by keeping the lookahead vector in a circular buffer.

  • If the top of the prediction stack is a terminal, then it is compared with the next input symbol; either both are discarded or an error is signalled. This is clearly constant time.

  • If the top of the prediction stack is a non-terminal, the action table is consulted, using the non-terminal, the current state and the current lookahead vector as keys. (Not all LL(k) parsers need to maintain a state; this is the most general formulation. But it doesn't make a difference to complexity.) This lookup can also be done in constant time, again by taking advantage of the incremental nature of the lookahead vector.

  • The prediction action is normally done by pushing the right-hand side of the selected production onto the stack. A naive implementation would take time proportional to the length of the right-hand side, which is correlated with neither the lookahead k nor the length of the input N, but rather with the size of the grammar itself. It's possible to avoid the variability of this work by simply pushing a reference to the right-hand side, which can be used as though it were the list of symbols (since the list can't change during the parse).

    However, that's not the full story. Executing a prediction action does not consume an input symbol, and it's possible -- even likely -- that multiple predictions will be made for a single input symbol. Again, the maximum number of predictions is related only to the grammar itself, not to k or to N.

    More specifically, since the same non-terminal cannot be predicted twice in the same place without violating the LL property, the total number of predictions cannot exceed the number of non-terminals in the grammar. Therefore, even if you do push the entire right-hand side onto the stack, the total number of symbols pushed between consecutive shift actions cannot exceed the size of the grammar. (Each right-hand side can be pushed at most once. In fact, only one right-hand side for a given non-terminal can be pushed, but it's possible that almost every non-terminal has only one right-hand side, so that doesn't reduce the asymptote.) If instead only a reference is pushed onto the stack, the number of objects pushed between consecutive shift actions -- that is, the number of predict actions between two consecutive shift actions -- cannot exceed the size of the non-terminal alphabet. (But, again, it's possible that |V| is O(|G|).)
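Here is a small sketch of such an inner loop (a hypothetical table-driven formulation in Python; the table encoding and names are mine, not a reference implementation). Each shift slides the lookahead window by one token, and each predict is a single table lookup followed by a bounded number of pushes:

```python
from collections import deque

def ll_k_parse(tokens, k, table, start_symbol, terminals, end_marker="$"):
    """Sketch of a table-driven LL(k) parser.
    `table` maps (non-terminal, k-token lookahead tuple) -> right-hand-side tuple."""
    stream = iter(list(tokens) + [end_marker] * k)
    # The k-symbol lookahead window; deque(maxlen=k) plays the role of the circular buffer.
    lookahead = deque((next(stream) for _ in range(k)), maxlen=k)

    stack = [end_marker, start_symbol]                   # prediction stack
    while stack:
        top = stack.pop()
        if top in terminals or top == end_marker:
            # Shift: compare the predicted terminal with the next input symbol.
            if top != lookahead[0]:
                raise SyntaxError(f"expected {top!r}, saw {lookahead[0]!r}")
            lookahead.append(next(stream, end_marker))   # slide the window: O(1)
        else:
            # Predict: one table lookup selects the production to expand.
            rhs = table.get((top, tuple(lookahead)))
            if rhs is None:
                raise SyntaxError(f"no production for {top!r} on {tuple(lookahead)}")
            stack.extend(reversed(rhs))                  # push the right-hand side

# Example with a (hypothetical) LL(2) grammar: S -> a S b | a b
table = {("S", ("a", "a")): ("a", "S", "b"),
         ("S", ("a", "b")): ("a", "b")}
ll_k_parse("aabb", k=2, table=table, start_symbol="S", terminals={"a", "b"})
```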

The linearity of LL(k) parsing was established, I believe, in Lewis and Stearns (1968), but I don't have that paper at hand right now, so I'll refer you to the proof in Sippu & Soisalon-Soininen's Parsing Theory (1988), where it is proved in Chapter 5 for strong LL(k) (as defined by Rosenkrantz & Stearns 1970), and in Chapter 8 for canonical LL(k).

In short, the time the LL(k) algorithm spends between shifting two successive input symbols is expected to be O(|G|), which is independent of both k and N (and, of course, constant for a given grammar).

This does not really have any relation to LL(*) parsers, since an LL(*) parser does not just try successive LL(k) parses (which would not be possible, anyway). For the LL(*) algorithm presented by Terence Parr (which is the only reference I know of which defines what LL(*) means), there is no bound to the amount of time which could be taken between successive shift actions. The parser might expand the lookahead to the entire remaining input (which would, therefore, make the time complexity dependent on the total size of the input), or it might fail over to a backtracking algorithm, in which case it is more complicated to define what is meant by "processing an input symbol".

0 votes

I suggest you read chapter 5.1 of Aho & Ullman, Volume 1.

https://dl.acm.org/doi/book/10.5555/578789

  1. An LL(k) parser is a k-predictive parsing algorithm (k is the lookahead, an integer >= 1).
  2. An LL(k) parser can parse any LL(k) grammar (chapter 5.1.2).
  3. For all a, b with a < b, an LL(a) grammar is also an LL(b) grammar, but the reverse is not true.
  4. An LL(k) parser is PREDICTIVE, so there is NO backtracking.
  5. All LL(k) parsers run in O(n) time, where n is the length of the parsed sentence.
  6. It is important to understand that an LL(3) parser does not parse faster than an LL(1) parser, but the LL(3) parser can parse MORE grammars than the LL(1) (see points #2 and #3, and the small example after this list).
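As a small illustration of points #3 and #6 (the grammar here is my own example, not taken from the book): the grammar S -> a a | a b is not LL(1), because both alternatives begin with a, but it is LL(2), since two tokens of lookahead decide the alternative immediately.

```python
# Hypothetical grammar: S -> a a | a b
# k = 1: both alternatives share the lookahead ('a',), so the prediction table has
# a conflict (two candidates for one key) and the grammar is not LL(1).
ll1_conflict = {("S", ("a",)): [("a", "a"), ("a", "b")]}

# k = 2: every lookahead pair selects exactly one production, so the grammar is LL(2).
ll2_table = {("S", ("a", "a")): ("a", "a"),
             ("S", ("a", "b")): ("a", "b")}
```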