There are many open sourced parser implementations available to us in Haskell. Parsec seems to be the standard for text parsing and attoparsec seems to be a popular choice for binary parsing but I don't know much beyond that. Is there a particular decision tree that you follow for choosing a parser implementation? Have you learned anything interesting about the strengths or weaknesses of the libraries?
4 Answers
You have several good options.
For lightweight parsing of String types:
For packed bytestring parsing, e.g. of HTTP headers.
For actual binary data most people use either:
The main question to ask yourself is what is the underlying string type?
- String?
- bytestring (strict)?
- bytestring (lazy)?
- unicode text
That decision largely determines which parser toolset you'll use.
The second question to ask is: do I already have a grammar for the data type? If so, I can just use happy
And obviously for custom data types there are a variety of good existing parsers:
Just to add to Don's post: Personally, I quite like Text.ParserCombinators.ReadP (part of base) for no-nonsense quick and easy stuff. Particularly when Parsec seems like overkill.
There is a bytestringreadp library for the bytestring version, but it doesn't cover Char8 bytestrings, and I suspect attoparsec would be a better choice at this point.
Bryan O’Sullivan’s blog post What’s in a parser? Attoparsec rewired (2/2) includes a nice performance benchmark comparing several implementations along with some comments comparing memory usage.