non-greedy repetition with Parsec

Question

I am trying to split my input into those parts that match a certain pattern and the rest, let's say

data Data = A Int | B Char | C String
parseDatas :: Parsec [Token] () [Data]

I have already written two more-or-less complicated parsers

parseA :: Parsec [Token] () Data
parseB :: Parsec [Token] () Data

that match the things I am looking for. Now the obvious solution is

parseDatas = many (parseA <|> parseB <|> parseC)

where the parser for the intermediate parts would look like this:

makeC :: [Token] -> Data
makeC = C . concatMap show -- or something like this
parseC :: Parsec [Token] () Data
parseC = makeC <$> many anyToken

Meh, that throws a runtime error:

Text.ParserCombinators.Parsec.Prim.many: combinator 'many' is applied to a parser that accepts an empty string.

OK, easily fixed:

parseC = makeC <$> many1 anyToken

But now, as soon as the input starts with something I'm not looking for, parseC consumes the entire rest of it, ignoring any patterns that should yield an A or B!

If my patterns were regexes¹, I would now change the greedy + operator to the non-greedy +? operator. How can I do the same for the many1 parser combinator?

¹ Which I cannot use, as I'm operating on tokens, not characters.

A solution I found was

parseC = makeC <$> many1 (notFollowedBy (parseA <|> parseB) >> anyToken)

but that looks, uh, suboptimal. It's not really generic: parseC has to repeat every pattern that the other alternatives already match. There must be something better.
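A minimal runnable sketch of this workaround, with the "stop" parser passed in as a parameter so that parseC at least no longer hard-codes parseA and parseB itself. It assumes the input is a plain String (Char tokens, not the question's Token type) and uses hypothetical stand-ins: parseA reads a run of digits, parseB reads one upper-case letter, and many1Until is a made-up helper name, not a Parsec function.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumption: Char tokens instead of the question's Token type.
data Data = A Int | B Char | C String
  deriving (Show, Eq)

-- Hypothetical stand-in: a run of digits becomes an A.
parseA :: Parser Data
parseA = A . read <$> many1 digit

-- Hypothetical stand-in: one upper-case letter becomes a B.
parseB :: Parser Data
parseB = B <$> upper

-- Hypothetical helper: one or more `p`, stopping before any position
-- where `stop` would succeed. notFollowedBy needs Show on stop's result.
many1Until :: Show end => Parser end -> Parser a -> Parser [a]
many1Until stop p = many1 (notFollowedBy stop >> p)

parseC :: Parser Data
parseC = C <$> many1Until (parseA <|> parseB) anyChar

parseDatas :: Parser [Data]
parseDatas = many (parseA <|> parseB <|> parseC)

main :: IO ()
main = print (parse parseDatas "" "ab12Cde")
-- prints: Right [C "ab",A 12,B 'C',C "de"]
```

The list of stop patterns still has to be spelled out once at the call site, which is the duplication the question complains about.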

I also had a look at Parsec how to find "matches" within a string, where the suggestion was to define a recursive parser, but that looks like a hassle if I want to keep the intermediate tokens and collect them in a list instead of dropping them.

Gurkenglas · Accepted Answer · 2016-08-04T13:40:24

You could let parseC consume exactly one token at a time:

parseDatas = many $ parseA <|> parseB <|> (C . show <$> anyToken)
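A self-contained sketch of this one-token-at-a-time approach, under assumed conditions: the input is a String (Char tokens), and parseA (a run of digits) and parseB (one upper-case letter) are hypothetical stand-ins. Because show on a Char adds quotes ('x' becomes "'x'"), it uses (: []) in place of show.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumption: Char tokens instead of the question's Token type.
data Data = A Int | B Char | C String
  deriving (Show, Eq)

-- Hypothetical stand-ins for the question's parsers.
parseA :: Parser Data
parseA = A . read <$> many1 digit

parseB :: Parser Data
parseB = B <$> upper

-- Anything that is neither an A nor a B becomes a C of exactly one token,
-- so the fallback can never swallow a later A or B.
parseDatas :: Parser [Data]
parseDatas = many (parseA <|> parseB <|> (C . (: []) <$> anyChar))

main :: IO ()
main = print (parse parseDatas "" "ab12Cd")
-- prints: Right [C "a",C "b",A 12,B 'C',C "d"]
```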

and then, if you want, group adjacent Cs into one to preserve the original semantics:

groupCs (C c) (C c':xs) = C (c ++ c') : xs
groupCs x xs = x : xs
parseDatas = foldr groupCs [] <$> many (parseA <|> parseB <|> (C . show <$> anyToken))
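The grouping step is a pure function, so it can be checked on its own. Because foldr works from the right, each C is merged into the already-grouped run that follows it. The input list below is a made-up example of what the ungrouped parser might return.

```haskell
data Data = A Int | B Char | C String
  deriving (Show, Eq)

-- Merge a C into an immediately following C; leave everything else alone.
groupCs :: Data -> [Data] -> [Data]
groupCs (C c) (C c' : xs) = C (c ++ c') : xs
groupCs x xs = x : xs

main :: IO ()
main = print (foldr groupCs [] [C "a", C "b", A 12, B 'C', C "d", C "e"])
-- prints: [C "ab",A 12,B 'C',C "de"]
```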

If you want to apply some operation make :: [Token] -> String to each run of consecutive Cs, make the payload of C a type parameter:

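{-# LANGUAGE DeriveFunctor #-}
data Data c = A Int | B Char | C c deriving Functor

groupCs :: Data a -> [Data [a]] -> [Data [a]]
groupCs (C c) (C cs:xs) = C (c:cs) : xs
groupCs (C c) xs = C [c] : xs
-- A and B carry no payload, so rebuild them at the new type
groupCs (A n) xs = A n : xs
groupCs (B b) xs = B b : xs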

parseDatas = (map.fmap) make . foldr groupCs [] <$> many (parseA <|> parseB <|> (C <$> anyToken))
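A runnable sketch of this parametric version, assuming Char tokens (so make :: String -> String) and hypothetical stand-ins for parseA (digits) and parseB (one upper-case letter). The make used here, reverse, is purely illustrative; it just makes visible that make receives each whole run at once.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Text.Parsec
import Text.Parsec.String (Parser)

-- C's payload is a type parameter, so raw tokens can be collected first.
data Data c = A Int | B Char | C c
  deriving (Show, Eq, Functor)

-- Hypothetical stand-ins, polymorphic in the C payload.
parseA :: Parser (Data c)
parseA = A . read <$> many1 digit

parseB :: Parser (Data c)
parseB = B <$> upper

-- Collect consecutive single-token Cs into one C holding a list.
groupCs :: Data a -> [Data [a]] -> [Data [a]]
groupCs (C c) (C cs : xs) = C (c : cs) : xs
groupCs (C c) xs = C [c] : xs
groupCs (A n) xs = A n : xs
groupCs (B b) xs = B b : xs

-- Hypothetical `make`: reverses each run of skipped tokens.
make :: String -> String
make = reverse

parseDatas :: Parser [Data String]
parseDatas =
  (map . fmap) make . foldr groupCs []
    <$> many (parseA <|> parseB <|> (C <$> anyChar))

main :: IO ()
main = print (parse parseDatas "" "ab12Cde")
-- prints: Right [C "ba",A 12,B 'C',C "ed"]
```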


How do we repeat a “non-greedy” pattern?

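`manyTill_ p q`, where `q` is the entire rest of the parser. `manyTill_` tries `q` before each `p`, so `p` is repeated only as often as needed for `q` to succeed, which is exactly the non-greedy behavior.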
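As far as I know, `manyTill_` is not exported by Parsec itself (Parsec's `manyTill` discards the terminator's result), so this sketch defines a hypothetical Parsec equivalent named `manyTillEnd`. It assumes Char tokens and hypothetical stand-ins for parseA (digits) and parseB (one upper-case letter); `q` is "one pattern followed by the rest of the input, or end of input".

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Data = A Int | B Char | C String
  deriving (Show, Eq)

-- Hypothetical stand-ins for the question's parsers.
parseA, parseB :: Parser Data
parseA = A . read <$> many1 digit
parseB = B <$> upper

-- Hypothetical helper: like manyTill, but also returns the result of `end`.
-- `end` is tried before every `p`, so `p` repeats as few times as possible.
-- If `end` can fail after consuming input, wrap it in `try`.
manyTillEnd :: Parser a -> Parser end -> Parser ([a], end)
manyTillEnd p end = go
  where
    go = ((,) [] <$> end) <|> (cons <$> p <*> go)
    cons x (xs, e) = (x : xs, e)

-- Skip tokens non-greedily until the rest of the parser (q) succeeds.
parseDatas :: Parser [Data]
parseDatas = do
  (skipped, rest) <- manyTillEnd anyChar q
  pure (if null skipped then rest else C skipped : rest)
  where
    q = ((:) <$> (parseA <|> parseB) <*> parseDatas) <|> ([] <$ eof)

main :: IO ()
main = print (parse parseDatas "" "ab12Cde")
-- prints: Right [C "ab",A 12,B 'C',C "de"]
```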
