
About once a year I have to develop, or at least design, a grammar and a parser; that appears to be a constant of my working life.

Every time I face this task, so about once a year, I, very much a lex/yacc (or rather flex/bison) guy, consider or reconsider alternatives to plain lex/yacc, and after some musing and experimenting I end up going back to plain lex/yacc.

Because I have a CORBA server at the hub of the application, I can call into it from a parser written in almost any language, so this time I had a look at

  • antlr4 (Java) and antlr3 (Java but has RT for other languages),
  • SableCC (Java),
  • Parse::EBNF, Parse::Yapp and Marpa (Perl),
  • and SimpleParse (Python).

For me, the combination of antlr4 and antlrworks looked like the most promising candidate, but I'm not yet convinced that the time spent getting into it will be amortized in the end.


The grammar I have to develop is similar to SQL DDL (in terms of structure, not in terms of the subject).

Why would any of the alternatives make my task easier than using plain lex/yacc?

I think this is a question like "which programming language should I use", which is unlikely to attract the sort of factual, objective answer SO promotes, so I voted to close it as non-constructive. However, the question for you is: what is it about lex/flex/yacc/bison that you find unsatisfactory? That would at least give you a clue about which features to look for. If it's just "I'd like to try something new", then flip a coin :) - rici
It's not comparable. If all generators generated the same kind of parser I would agree, but the result is completely different depending on the parser generator. - Mike Lischke

2 Answers


What you should also consider is that the various parser generators produce quite different parsers. Yacc/bison produces bottom-up parsers, which are often hard to understand and hard to debug, and which give weird error messages. ANTLR, for instance, produces a recursive-descent top-down parser, which is much easier to understand: you can actually debug it easily, and you can invoke a single subrule as a parse operation (e.g. parse just expressions instead of the full language).
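To make "recursive descent" and "invoke a single subrule" concrete, here is a minimal hand-written sketch in Python. It is not ANTLR's generated code or its runtime API; it only shows the general shape such a parser takes: one function per grammar rule, each of which can serve as the start rule. The grammar is a hypothetical DDL-like subset, and the names `tokenize`, `Parser`, `parse` and `ParseError` are made up for illustration.

```python
import re

class ParseError(Exception):
    pass

# Hypothetical token classes: numbers, identifiers (keywords such as CREATE
# are lexed as identifiers), and a few punctuation characters.
TOKEN_RE = re.compile(r"\s*(?:(?P<num>\d+)|(?P<ident>[A-Za-z_]\w*)|(?P<punct>[(),;]))")

def tokenize(text):
    """Return a list of (kind, value) pairs ending with an ("eof", "") marker."""
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ParseError(f"unexpected input at offset {pos}: {text[pos:pos + 10]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    tokens.append(("eof", ""))
    return tokens

class Parser:
    """One method per grammar rule; any method can be used as the entry point."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def expect(self, kind, value=None):
        """Consume the next token if it matches; keywords compare case-insensitively."""
        tok_kind, tok_value = self.peek()
        if tok_kind != kind or (value is not None and tok_value.upper() != value):
            found = tok_value or "end of input"
            raise ParseError(f"expected {value or kind}, found {found!r}")
        self.i += 1
        return tok_value

    # table_def : CREATE TABLE IDENT '(' column (',' column)* ')' ';'
    def table_def(self):
        self.expect("ident", "CREATE")
        self.expect("ident", "TABLE")
        name = self.expect("ident")
        self.expect("punct", "(")
        columns = [self.column()]
        while self.peek() == ("punct", ","):
            self.i += 1
            columns.append(self.column())
        self.expect("punct", ")")
        self.expect("punct", ";")
        return {"table": name, "columns": columns}

    # column : IDENT type
    def column(self):
        name = self.expect("ident")
        return (name, self.type_())

    # type : IDENT ('(' NUMBER ')')?
    def type_(self):
        type_name = self.expect("ident").upper()
        if self.peek() == ("punct", "("):
            self.i += 1
            size = int(self.expect("num"))
            self.expect("punct", ")")
            return f"{type_name}({size})"
        return type_name

def parse(text, rule="table_def"):
    """Run any rule as the start symbol and require that all input is consumed."""
    parser = Parser(text)
    result = getattr(parser, rule)()
    parser.expect("eof")
    return result
```

With this sketch, `parse("CREATE TABLE users (id INT, name VARCHAR(40));")` returns `{'table': 'users', 'columns': [('id', 'INT'), ('name', 'VARCHAR(40)')]}`, while `parse("name varchar(40)", rule="column")` parses a lone column definition and returns `('name', 'VARCHAR(40)')`. Because the call stack mirrors the grammar, stepping through it in a debugger shows exactly which rule is active.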

Additionally, its error recovery is much better and produces a lot cleaner error messages. There are various IDEs, plugins and extensions that make working with ANTLR grammars pretty easy (ANTLRWorks, the IntelliJ plugin, the Visual Studio Code extension, etc.). And you can generate parsers in different languages (C, C++, C#, Java and more) from the same grammar, unless you have language-specific actions in your grammar (you mentioned this in your question already). And while we're talking about actions: because of the evaluation principle of a bottom-up parser (shift a token, shift another token, reduce them to a new nonterminal and push it, etc.), actions can easily cause trouble there, e.g. executing more than once. Not so with parsers generated by ANTLR.
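The shift/reduce evaluation principle can be illustrated with a small hand-coded loop. This is only a sketch: yacc/bison drive the same kind of loop from generated LALR tables, whereas here the decisions for one hypothetical expression grammar are written out by hand, and `shift_reduce` and `ParseError` are made-up names. The point it shows is that a semantic action (here, the arithmetic) runs only when its rule is reduced, so inner rules fire before the rules that enclose them.

```python
class ParseError(Exception):
    pass

def shift_reduce(tokens):
    """Bottom-up parse of a hypothetical grammar, with the LR decisions hand-coded:

        expr : expr '+' term | term
        term : term '*' NUM  | NUM

    Each semantic action runs only when its rule is reduced.
    Returns (value, log), where log records every shift and reduce in order."""
    tokens = list(tokens) + ["$"]          # "$" marks end of input
    syms, vals, log, i = [], [], [], 0     # parallel symbol and value stacks

    def reduce(lhs, n, value, rule):
        # Pop the n-symbol handle, push the nonterminal, record the action.
        del syms[-n:]
        del vals[-n:]
        syms.append(lhs)
        vals.append(value)
        log.append(f"reduce {rule} = {value}")

    while True:
        look = tokens[i]
        if syms[-3:] == ["term", "*", "NUM"]:
            reduce("term", 3, vals[-3] * vals[-1], "term -> term * NUM")
        elif syms[-1:] == ["NUM"]:
            reduce("term", 1, vals[-1], "term -> NUM")
        elif syms[-1:] == ["term"] and look != "*":   # '*' binds tighter: shift it first
            if syms[-3:] == ["expr", "+", "term"]:
                reduce("expr", 3, vals[-3] + vals[-1], "expr -> expr + term")
            else:
                reduce("expr", 1, vals[-1], "expr -> term")
        elif syms == ["expr"] and look == "$":
            return vals[0], log                       # accept
        elif look == "$":
            raise ParseError(f"unexpected end of input, stack = {syms}")
        else:                                         # shift the lookahead token
            syms.append("NUM" if look.isdigit() else look)
            vals.append(int(look) if look.isdigit() else look)
            log.append(f"shift {look}")
            i += 1
```

For `shift_reduce(["2", "+", "3", "*", "4"])` the result is 14, and the log shows `reduce term -> term * NUM = 12` being recorded before `reduce expr -> expr + term = 14`; nothing executes on a shift. The `look != "*"` test is the kind of precedence decision a generated table would encode.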

I have also tried various parser generators over the years, and even wrote my own, but I would recommend ANTLR as the tool of choice any time.


The latest Marpa is Marpa::R2, which brings great improvements in "whipituptude", including a very convenient new DSL interface that is itself written in Marpa. You might consider starting with Marpa for prototyping. Marpa is highly declarative, using clean BNF, so if you migrate away later, you can take most of your work with you to the new parser. Marpa is unsurpassed in its error detection and handling, which is also very handy in a prototyping phase.

Marpa parses, in linear time, all the classes of grammar that the other parsers listed can parse, and it is unsurpassed in flexibility. Its newest feature allows you to switch back and forth between Marpa and your own parsing code, so you might even stay with it. There is a website, and my blog has a series of tutorials, which may be the best way to get introduced to Marpa.
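Marpa is built on Earley's parsing algorithm, which is why it can take a BNF grammar as written, including left-recursive and ambiguous rules. The following is a minimal Python sketch of a plain Earley recognizer, not Marpa or its Perl API. It omits Marpa's refinements (Leo's optimization for right recursion, Aycock-Horspool handling of empty rules), so it does not have Marpa's linear-time behaviour, and it assumes the grammar has no empty rules. `earley_recognize`, `Item` and the token kind `"n"` are hypothetical names.

```python
from collections import namedtuple

# An Earley item: rule lhs -> rhs with a dot position, started at input offset `start`.
Item = namedtuple("Item", "lhs rhs dot start")

def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples of
    symbols); any symbol that is not a key is a terminal matched literally.
    Assumes no empty (epsilon) rules."""
    chart = [[] for _ in range(len(tokens) + 1)]

    def add(k, item):
        if item not in chart[k]:
            chart[k].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0))

    for k in range(len(tokens) + 1):
        j = 0
        while j < len(chart[k]):              # chart[k] may grow during this loop
            item = chart[k][j]
            j += 1
            if item.dot < len(item.rhs):
                sym = item.rhs[item.dot]
                if sym in grammar:            # predict: expand the nonterminal at k
                    for rhs in grammar[sym]:
                        add(k, Item(sym, rhs, 0, k))
                elif k < len(tokens) and tokens[k] == sym:   # scan a terminal
                    add(k + 1, item._replace(dot=item.dot + 1))
            else:                             # complete: advance items waiting on item.lhs
                for parent in chart[item.start]:
                    if parent.dot < len(parent.rhs) and parent.rhs[parent.dot] == item.lhs:
                        add(k, parent._replace(dot=parent.dot + 1))

    return any(it.lhs == start and it.dot == len(it.rhs) and it.start == 0
               for it in chart[-1])

# Left-recursive and ambiguous, written as plain BNF with no rewriting.
EXPR = {"Expr": [("Expr", "+", "Expr"), ("Expr", "*", "Expr"), ("n",)]}
```

Here `earley_recognize(EXPR, "Expr", ["n", "+", "n", "*", "n"])` accepts the input and `earley_recognize(EXPR, "Expr", ["n", "+"])` rejects it, even though the grammar would have to be rewritten before a classic LL parser could use it.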