3
votes

I've got a BNF and EBNF for a grammar. The BNF is obviously more verbose. I have a fairly good idea as far as using the BNF to build a recursive-descent parser; there are many resources for this. I am having trouble finding resources to convert an EBNF to a recursive-descent parser. Is this because it's more difficult? I recall from my CS theory classes that we went over EBNFs, but we didn't go over converting them into a recursive-descent parser. We did go over converting BNF's into a recursive-descent parser.

The reason I'm asking is because the EBNF is more compact.

From looking at the EBNF's in general, I notice that terms enclosed between { and } can be converted into a while loop. Are there any other guidelines or rules?

3

3 Answers

7
votes

You should investigate so-called metacompilers, which essentially compile EBNF into recursive descent parsers. How they do it is exactly the answer your question. (Its pretty straightfoward, but good to understand the details).

A really wonderful paper is the "MetaII" paper by Val Schorre. This is metacompiler technology from honest-to-God 1964. In 10 pages, he shows you how to build a metacompiler, and provides not just that, but another compiler too and the output of both!. There's an astonishing moment that you come too if you go build one of these, where you realized how the meta-compiler compiles itself using its own grammar. This moment got me hooked on compiler back in about 1970 when I first tripped over this paper. This is one of those computer science papers that everybody in the software business should read.

James Neighbors (the inventor of the term "domain" in software engineering, and builder of the first program transformation system [based on these metacompilers] has a great online MetaII tutorial, for those of you that don't want the do-it-from-scratch experience. (I have nothing to do with this except that Neighbors and I were undergraduates together).

Both ways are a fine way to learn about metacompilers and generating parsers from EBNF.

The key ideas are that the left hand side of a rule creates a function that parses that nonterminal and returns true if match and advances the input stream; false if no match and the input stream doesn't advance. The contents of the function is determined by the right hand side. Literal tokens are matched directly. Nonterminals cause calls to other functions generated for the other rules. Kleene* maps to while loops, alternations map to conditional branches. What EBNF doesn't address, and the metacompilers do, is how does parsing do anyting other than saying "matched" or not? The secret is weaving output operations into the EBNF. The MetaII paper makes all this crystal clear.

4
votes

Neither is harder than the other. It is really the difference between implementing something iteratively and implementing something recursively. In BNF, everything is recursive. In EBNF, some of the recursion is expressed iteratively. There are different variations in EBNF syntax, so I'll just use the English... "zero or more" is a simple while loop as you have discovered. "One or more" is the same as one followed by "zero or more". "Zero or one times" is a simple if statement. That should cover most of the cases.

1
votes

The early meta compilers META II and TREEMETA and their kin are not exactly recursive decent parser. They were were stated as using recursive functions. That just meant they could call them selves.

We do not call C a recursive language. A C or C++ function is recursive in the same way the early meta compilers are recursive.

Recursion can be used. They were programming languages. Recursion is generally used only when analyzing nexted language constructs. For example parenthesized expression and nexted blocks.

More of an LR recursive decent combination. CWIC the last documented one has extensive backtracking and look ahead features. The '-' not operator can match any language construct. And inverts it success or failure. -term fails if a term is matched for example. The input is never advanced. The '?' looks ahead and matches any language construct ?expr for example would try to parse an expr. The look ahead '?' matched construct is not kept or is the input advanced.