3
votes

What is the difference between ANTLR and parboiled for parsing in Java?

  • Which is easier to use for a beginner in parsing?
  • Which is more scalable? (from simple to complex grammar)
  • Which has better support for AST construction?
  • Which produces better error or warning messages for syntax errors?
  • Which has fewer problems to contend with? (e.g. left recursion, shift/reduce conflicts, reduce/reduce conflicts)
  • Comparison with other open source tools also appreciated
3
My take on what good parsing machinery is, at Quora (sorry, wrote the answer there instead of at SO): quora.com/What-is-the-most-powerful-parser-algorithm/answer/… - Ira Baxter
See also Rekex, a PEG parser generator for Java 17. It derives the grammar from the datatypes of the parse tree, so a separate grammar definition is not necessary. - ZhongYu

3 Answers

2
votes

Parboiled looks like a really cool tool. It might be easier for beginners because it is just pure programming using a "parser combinator" idiom. I suspect that grammars written this way become verbose and harder to read, though the Java grammar I looked at doesn't look too bad. I cannot comment on its AST construction, but note that ANTLR 4 generates parse trees, not ASTs. Parboiled claims to have good error messages and recovery, but that is suspect because it is based on parsing expression grammars (PEGs), which in the worst case can only detect an error once the entire input has been seen. PEGs also cannot identify ambiguities in your grammar (not conflicts, ambiguities). Neither tool announces parsing conflicts. ANTLR 4 handles direct left recursion for things like arithmetic expressions, but in general neither tool can handle left recursion. Like parboiled, ANTLR can be used purely as a library through its parser interpreter, but you must learn to use the tool if you want it to generate parsers. Currently, ANTLR 4 can generate parsers in Java, C#, JavaScript, Python 2, and Python 3.
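
To illustrate the left-recursion point, here is a minimal hand-written sketch in plain Java. It uses neither ANTLR nor parboiled, and the class name LeftRecursionSketch and its methods are made up for this example. It shows why a top-down parser cannot execute a rule like expr : expr '-' number directly, and the usual rewrite into a loop that keeps left associativity.

```java
// Hypothetical sketch: not ANTLR or parboiled code, just a hand-written
// recursive-descent evaluator for integer '+' and '-' expressions.
final class LeftRecursionSketch {
    private final String input;
    private int pos = 0;

    private LeftRecursionSketch(String input) {
        this.input = input.replace(" ", "");
    }

    /** Parses and evaluates an expression such as "10-4-3". */
    static int evaluate(String text) {
        LeftRecursionSketch p = new LeftRecursionSketch(text);
        int value = p.expr();
        if (p.pos != p.input.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + p.pos);
        }
        return value;
    }

    // Left-recursive rule:  expr : expr ('+' | '-') number | number ;
    // A naive translation would call expr() as the first step of expr()
    // and recurse forever without consuming input. The usual rewrite is
    //     expr : number (('+' | '-') number)* ;
    // with the loop folding to the left, so 10-4-3 means (10-4)-3.
    private int expr() {
        int value = number();
        while (pos < input.length() && (peek() == '+' || peek() == '-')) {
            char op = input.charAt(pos++);
            int right = number();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // number : [0-9]+ ;
    private int number() {
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at position " + pos);
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    private char peek() {
        return input.charAt(pos);
    }
}
```

Calling LeftRecursionSketch.evaluate("10-4-3") folds from the left, so the subtraction is grouped as (10-4)-3. Rewriting the rule by hand like this is roughly the work that direct left-recursion support saves you for simple expression grammars.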

1
votes

Today, Parboiled is mainly a Scala tool. So if you are using Scala, it may be the better solution in most cases.

Ease of use

ANTLR should be much easier for beginners. It's easier to start with.

  • There's a book about ANTLR. It's also well described in DSLs in Action. And it has better documentation in general.
  • There are ANTLR plugins for different IDEs. They let you see the AST and give you some other support.

Parboiled is a Scala library, so you get syntax highlighting and type checking out of the box. Parboiled1 works fine in most IDEs. Parboiled2 doesn't (this will be fixed soon in IDEA). The library uses macros, and most IDEs are not comfortable with them, which is why everything shows up red.

But both are pretty easy to start with.

  • You can try ANTLR from the console (please correct me if I'm wrong).
  • You can install sbt, add parboiled as a dependency, and play in the Scala console.

Scalability

In my opinion Parboiled is more scalable, because you are writing Scala code. You can decompose your parser into multiple Scala traits and mix them with one another. You may create a DateTime parser and mix it into a LogEvent parser or a $PROTOCOL_NAME parser, and reuse them easily. With parboiled1 you can do some naughty things at runtime, which gives you power: in some cases you can construct parsers on the fly. For example, say you have a datetime format defined as a string. You can read the format string and generate the appropriate parser for it. This is possible even with Parboiled2 (which does lots of stuff at compile time). I don't know whether it's possible with ANTLR.
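
The composition idea above can be sketched in plain Java without either library. Everything here (ComposableParsers, Parser, literal, digits, sequence, fromFormat, logEvent) is a hypothetical name invented for illustration and is not parboiled's API. It also assumes a deliberately naive format convention: every run of letters in the format string stands for that many digits, and everything else is literal text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from parboiled or ANTLR.
final class ComposableParsers {
    /** A recognizer: returns the position after a match, or -1 on failure. */
    interface Parser {
        int parse(String input, int pos);
    }

    /** Matches the exact text. */
    static Parser literal(String text) {
        return (input, pos) -> input.startsWith(text, pos) ? pos + text.length() : -1;
    }

    /** Matches exactly count decimal digits. */
    static Parser digits(int count) {
        return (input, pos) -> {
            if (pos + count > input.length()) return -1;
            for (int i = pos; i < pos + count; i++) {
                if (!Character.isDigit(input.charAt(i))) return -1;
            }
            return pos + count;
        };
    }

    /** Runs the parsers one after another; fails as soon as one fails. */
    static Parser sequence(List<Parser> parts) {
        return (input, pos) -> {
            int current = pos;
            for (Parser part : parts) {
                current = part.parse(input, current);
                if (current < 0) return -1;
            }
            return current;
        };
    }

    /**
     * Builds a parser at runtime from a format string such as "yyyy-MM-dd".
     * Assumption: a run of letters means "that many digits"; the rest is literal.
     */
    static Parser fromFormat(String format) {
        List<Parser> parts = new ArrayList<>();
        int i = 0;
        while (i < format.length()) {
            int start = i;
            if (Character.isLetter(format.charAt(i))) {
                while (i < format.length() && Character.isLetter(format.charAt(i))) i++;
                parts.add(digits(i - start));
            } else {
                while (i < format.length() && !Character.isLetter(format.charAt(i))) i++;
                parts.add(literal(format.substring(start, i)));
            }
        }
        return sequence(parts);
    }

    /** Reuses a date-time parser inside a larger "log event" parser. */
    static Parser logEvent(Parser dateTime) {
        return sequence(Arrays.asList(dateTime, literal(" "), literal("ERROR")));
    }

    /** True if the parser consumes the whole input. */
    static boolean matches(Parser parser, String input) {
        return parser.parse(input, 0) == input.length();
    }
}
```

For instance, ComposableParsers.fromFormat("yyyy-MM-dd HH:mm") gives a recognizer for timestamps in that shape, and passing it to logEvent reuses it unchanged inside a bigger parser. Because parsers are ordinary values, building one from a string read at runtime needs no code generation step.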

AST

I like the Parboiled approach to AST. It expects you to define an ADT, so in the ideal case you will have an immutable tree of case classes. You may add some 'dsl-like' stuff to your tree nodes. For example, you may define a "\" method on your node that returns the child with the specified name.

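// `name` and `children` are assumed fields; the original elided them.
case class Node(name: String, value: String, children: List[Node] = Nil) {
  // ... other members elided ...

  // Returns the first direct child with the given name, if any.
  def \(childName: String): Option[Node] =
    children.find(child => child.name == childName)
}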

And then use it (since "\" returns an Option, the second lookup is chained with flatMap):

(city \ "3rd street").flatMap(_ \ "23")

This makes working with the AST much easier. I hope it helps.

Using in production

  • If you are using parboiled, you have to add it to your dependency list. That's all. You will have everything working right out of the box.
  • If you are using ANTLR, you have to generate *.java files first, and regenerate them every time you change the grammar. In most cases the grammar does not change very often, but in my experience I have had situations where we changed the grammar every 2 days. You may not think that it's a problem, though.
1
votes

Well, if I have to compare them as a developer who has recently used both frameworks as a newbie to parsing frameworks, then I have the comparison below.

| # | ANTLR | Parboiled |
|---|-------|-----------|
| 1 | It has better documentation in general: it has its own website, there's a book (The Definitive ANTLR Reference by Terence Parr), and multiple examples are available in git. | It has limited documentation, which is only available in git. |
| 2 | There are ANTLR plugins for different IDEs that let you see the syntax diagram of rules and check the parse tree for given inputs. This helps a lot in writing the rules. | It does not have any plugins for IDEs. |
| 3 | It's a Java framework, written in Java. | It's a Scala library/framework and is good if we are writing the parser in Scala. Parboiled2 doesn't support Java, so if we have to use it in Java, we need the old Parboiled1. |
| 4 | In ANTLR we write the parsing rules or grammars separately in .g4 files. We need to generate the *.java files corresponding to the grammar first, and regenerate them every time we change the grammar. | In Parboiled we write the parsing rules and grammar in the Java code itself. |
| 5 | In ANTLR we get the ParseTree (which is similar to an AST) by passing the input to the generated *.java ANTLR classes. | In Parboiled we have to use abstract data types and use the value stack to push and pop values while writing the grammar to get the AST. |
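
The value-stack approach mentioned in the last row can be sketched in plain Java. This is only an illustration of the idea: ValueStackSketch and Node are hypothetical names, the grammar (sums of integers) is made up for the example, and none of it is parboiled's actual API. Each rule pushes what it recognized, and the rule that combines two operands pops them and pushes the combined node.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the value-stack idea; this is not parboiled's API.
final class ValueStackSketch {
    /** Minimal immutable AST: either a number leaf or an addition node. */
    static final class Node {
        final String label;
        final Node left;
        final Node right;

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return left == null ? label : "(" + left + " " + label + " " + right + ")";
        }
    }

    private final String input;
    private int pos = 0;
    private final Deque<Node> valueStack = new ArrayDeque<>();

    private ValueStackSketch(String input) {
        this.input = input;
    }

    /** Parses sums of integers such as "1+2+3" and returns the AST root. */
    static Node parse(String text) {
        ValueStackSketch p = new ValueStackSketch(text);
        if (!p.sum() || p.pos != text.length() || p.valueStack.size() != 1) {
            throw new IllegalArgumentException("Cannot parse: " + text);
        }
        return p.valueStack.pop();
    }

    // sum : number ('+' number)* ;
    private boolean sum() {
        if (!number()) return false;
        while (pos < input.length() && input.charAt(pos) == '+') {
            pos++;
            if (!number()) return false;
            // Rule action: pop the two operands, push the combined node.
            Node right = valueStack.pop();
            Node left = valueStack.pop();
            valueStack.push(new Node("+", left, right));
        }
        return true;
    }

    // number : [0-9]+ ;  pushes a leaf node when it matches.
    private boolean number() {
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
        if (start == pos) return false;
        valueStack.push(new Node(input.substring(start, pos), null, null));
        return true;
    }
}
```

After a successful parse exactly one node is left on the stack, and that node is the AST root. The order of the two pops matters: the right operand was pushed last, so it comes off first.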

So, after using the two, I find ANTLR a bit easier to use and learn.