The quoted Wikipedia extract is not a correct EBNF grammar for EBNF. It's also not left-parseable: indeed, it is ambiguous, so it's not unambiguously parseable at all.
In general, the terms LL(k)
and LR(k)
(and many other such terms) apply to Context-Free Grammars (CFGs) (and, by extension, the languages generated by those grammars). EBNF is not a formalism for describing CFGs. It is designed to be a formalism to describe context-free languages and therefore it should be possible to create a CFG from a given EBNF grammar (but see Note 1), but there is not a direct correspondence between an EBNF syntax rule and a single production in a CFG.
That said, you can usually directly create a CFG by using some standard transformations. For example:
{ ... }
can be substituted with the generated non-terminal M'', with the addition of the following productions: (ε
is the empty string)
M' → ...
M'' → ε
M'' → M' M''
The above transformation does not introduce left-recursion, so it does not artificially make the original grammar non-LL(1).
The most important error in the cited grammar [Note 2] is the ambigous EBNF rule:
rhs = identifier
| terminal
| "[" , rhs , "]"
| "{" , rhs , "}"
| "(" , rhs , ")"
| rhs , "|" , rhs
| rhs , "," , rhs
;
It's also left-recursive, so it will not correspond to an LL(1) CFG. But more importantly, it does not indicate either the associativity or the precedence of the |
and ,
operators. (Semantically, these operators do not have a defined associativity, but the syntax should still specify one; otherwise, it is impossible to unambiguously create a parse tree. The precedence between the two operators is important semantically.)
A better set of rules would be:
primary = identifier
| terminal
| "[" , rhs , "]"
| "{" , rhs , "}"
| "(" , rhs , ")"
;
factor = primary , { "|" , primary } ;
rhs = factor , { "," , factor } ;
This is still an oversimplification, but it covers a large number of use cases. It's neither ambiguous nor left-recursive. [3]
Notes
Syntactic constraints specified in comments may not be easy to translate into CFGs, though. For example, the ISO standard EBNF for EBNF defines the non-terminal "syntactic exception" as follows:
syntactic exception =
? a syntactic-factor that could be replaced
by a syntactic-factor containing no
meta-identifiers
?
The intention of the above text is to restrict an exception to be a regular language. That's important, since the set difference between two context-free languages is not necessarily context-free, while the difference between a context-free language and a regular language is provably context-free. Ironically, the "special sequence" describing this restriction cannot be expressed as a context-free grammar because it depends on the definition of meta-identifiers. (If it had said "a syntactic-factor containing no meta-identifiers" it would be easy to write without resorting to a special sequence, but clearly that was not the intent.)
There is another important error in the Wikipedia excerpt. It defines both types of quoted strings as having the same body, but that's not correct; a double-quoted string cannot include double-quote characters, and a single-quoted string cannot contain single-quote characters. So the use of the identifier character
in both of those definitions is incorrect.
The formal EBNF grammar allows primary
to be empty. I left that out, because it's not usually needed.