4
votes

I am new to Haskell, and I have been trying to write a JSON parser using Parsec as an exercise. This has mostly been going well: I am able to parse lists and objects with relatively little code, and that code is also readable (great!). However, for JSON I also need to parse primitives like

  • Integers (possibly signed)
  • Floats (possibly using scientific notation such as "3.4e-8")
  • Strings with e.g. escaped quotes

I was hoping to find ready-to-use parsers for things like these as part of Parsec. The closest I have found is the Parsec.Token module (which defines integer and friends), but those parsers require a "language definition" that seems way beyond what I should have to write to parse something as simple as JSON -- it appears to be designed for programming languages.

So my questions are:

  1. Are the functions in Parsec.Token the right way to go here? If so, how do I make a suitable language definition?

  2. Are "primitive" parsers for integers etc. defined somewhere else? Maybe in another package?

  3. Am I supposed to write these kinds of low-level parsers myself? I can see myself reusing them frequently... (obscure scientific data formats etc.)

I have noticed that a question on this site says Megaparsec has these primitives included [1], but I suppose these cannot be used with Parsec.

Related questions:

How do I get Parsec to let me call `read` :: Int?

How to parse an Integer with parsec

1
You can just pass in empty strings and lists and always-failing parsers for the options that do not apply to you (which I guess is all of them). - sepp2k
If you can, please use megaparsec. It is a better modern rewrite of parsec (and is actually being maintained). - Alec

1 Answer

6
votes

Are the functions in Parsec.Token the right way to go here?

Yes, they are. If you don't care about the minutiae specified by a language definition (i.e. you don't plan to use the parsers which depend on them, such as identifier or reserved), just use emptyDef as a default:

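import Text.Parsec
import qualified Text.Parsec.Token as P
import Text.Parsec.Language (emptyDef)

-- Token parser built from the empty language definition.
lexer :: P.TokenParser u
lexer = P.makeTokenParser emptyDef

-- Parses an optionally signed integer and skips trailing whitespace.
integer :: Parsec String u Integer
integer = P.integer lexer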
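The same token parser also covers the other primitives from the question. The following is only a sketch: the names number and jsonString are hypothetical helpers, not part of Parsec. It assumes that P.naturalOrFloat and P.stringLiteral are close enough for your input, which is not strictly true for JSON: naturalOrFloat also accepts hexadecimal and octal literals, and stringLiteral follows Haskell's escape rules rather than JSON's (for example, it does not understand \u00e9 or \/).

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P
import Text.Parsec.Language (emptyDef)

-- Token parser built from the empty language definition, as in the answer.
lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

-- Hypothetical helper: an optionally negated number.
-- Left holds an integer, Right holds a floating-point value.
number :: Parser (Either Integer Double)
number = do
  negative <- option False (True <$ char '-')
  n <- P.naturalOrFloat lexer
  pure $ if negative then either (Left . negate) (Right . negate) n else n

-- Hypothetical helper: a double-quoted string with backslash escapes.
-- Caveat: the escapes follow Haskell's rules, which only overlap with JSON's.
jsonString :: Parser String
jsonString = P.stringLiteral lexer
```

For example, parse number "" "-42" yields Right (Left (-42)), and parse jsonString "" applied to the input "say \"hi\"" (with its surrounding quotes) yields the unescaped text. Both are lexeme parsers, so they also consume trailing whitespace.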

As you noted, this feels unnecessarily clunky for your use case. It is worth mentioning that megaparsec (cf. Alec's suggestion) provides a corresponding integer parser without the ceremony. (The flip side is that megaparsec doesn't try to bake in support for e.g. reserved words, but that isn't difficult to implement in the cases you actually need it.)
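For comparison, here is roughly what the megaparsec version might look like. This is a sketch that assumes a reasonably recent megaparsec (version 6 or later) and its Text.Megaparsec.Char.Lexer module. The Parser alias and the names integer, float and stringLit are chosen here for illustration. Passing pure () to L.signed means no whitespace is allowed between the sign and the digits. As with Parsec, L.charLiteral uses Haskell's escape syntax, so it is not an exact match for JSON strings.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, manyTill, parseMaybe)
import Text.Megaparsec.Char (char)
import qualified Text.Megaparsec.Char.Lexer as L

-- Parser over String input with no custom error component.
type Parser = Parsec Void String

-- Optionally signed decimal integer; no language definition required.
integer :: Parser Integer
integer = L.signed (pure ()) L.decimal

-- Optionally signed float; L.float needs a fraction part or an exponent.
float :: Parser Double
float = L.signed (pure ()) L.float

-- Double-quoted string; escapes follow Haskell's character literal rules.
stringLit :: Parser String
stringLit = char '"' *> manyTill L.charLiteral (char '"')
```

With these definitions, parseMaybe integer "-42" gives Just (-42). Note that float rejects plain integers such as "3", so a JSON number parser would have to try both or use L.scientific instead.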