I am trying to split my input into those parts that match a certain pattern and the rest, let's say
data Data = A Int | B Char | C String
parseDatas :: Parsec [Token] () a [Data]
I already have written two more-or-less complicated parsers
parseA :: Parsec [Token] () Data
parseB :: Parsec [Token] () Data
that match the things I am looking for. Now the obvious solution is
parseDatas = many (parseA <|> parseB <|> parseC)
where the parser for the intermediate parts would look like this:
makeC :: [Token] -> Data
makeC = C . concatMap show -- or something like this
parseC :: Parsec [Token] () Data
parseC = makeC <$> many anyToken
Meh, that throws a runtime [ERROR] Text.ParserCombinators.Parsec.Prim.many: combinator 'many' is applied to a parser that accepts an empty string.
- ok, easily fixed:
parseC = makeC <$> many1 anyToken
But now parseC
consumes the entire input (that starts with something I'm not looking for), ignoring any patterns that should yield an A
or B
!
If my patterns were regexes1, I would now have changed the +
operator to the non-greedy +?
operator. How can I do the same for the many1
parser combinator?
1: which I cannot use, as I'm operating on tokens not characters
A solution I found was
parseC = makeC <$> many1 (notFollowedBy (parseA <|> parseB) >> anyToken)
but that does look, uh, suboptimal. It's not really generic. There must be something better.
I also had a look at Parsec how to find "matches" within a string where the suggestion was to define a recursive parser, but that looks like a hazzle if I don't want to drop the intermediate tokens and collect them in a list instead.