1
votes

I'm trying to build a random query generator for a query language we developed. The idea is to generate random queries by following the rules in a parse-table. So far, all of the LL parser generators I've tried only generate recursive-descent parsers. I could try and modify the generated parser but looking at the parser that ANTLR generated for a tiny expression language, a parser for our query language will be very large.

So, I think a table-driven parser would be smaller and easier to tweak. Are there any open-source tools that can generate a table-driven LL parser?

Alternatively, can a LR parse table be used "in reverse" to derive random queries?

1
Why do you care how big the parser is? You're building a testing tool. What, 6Gb of RAM is more than $40? - Ira Baxter
I'm not worried about the resource usage. I'm going to have to modify the generated parser code to do something else: generate random queries, so I'd want the file to be a manageable size and recursive-decent parsers can get big quick. - PPC-Coder
I'd rather have to read and modify a few 1000 lines rather than 100K lines of generated code. - PPC-Coder
Ah. The answer is you expect to modify the output by hand. Ok, then size matters. - Ira Baxter
Yup. And our query language is pretty complex. So recursive-descent can get fairly big. - PPC-Coder

1 Answers

3
votes

Yes, an LR parser table can be used to generate random queries. You simply traverse the tables as if you were parsing, and anyplace a state table offers you a transition or a reduction on a token, you randomly propose that token, with preference for reductions once the size of the query is "large enough". The token sequence is your query.

But you don't need the LR table for this; you can traverse your grammar definition (a set of BNF rules) in a similar way. You'll want to first normalize all your rules to the following form:

     LHS = RHS1 RHS2 ... RHSn ;

This gets rid of Kleene star and plus, and alternatives, by adding new generated nonterminals to represent complex subphrases in the original rules. This is straightforward to do.

If you have a non-terminal (initially the goal symbol), merely find rules with matching LHS, pick one at random, and expand that rule. You have to keep track of your position in the rule as a "dot" (same idea as "items" in LR parser generation). Given a rule and a dot, either the token selected is a terminal (propose it, being no choice!) or a nonterminal (recursively generate a token sequence for the nonterminal using the method of this paragraph).