How to negate a parser with Parsec

Question

I have a file with line endings "\r\r\n", and use the parser eol = string "\r\r\n" :: Parser String to handle them. To get a list of the lines between these separators, I would like to use sepBy along with a parser that returns any text that would not be captured by eol. Looking through the documentation, I did not see a combinator that negates a parser (an 'anything but the pattern "\r\r\n"' parser).

I have tried using sepBy (many anyToken) eol, but many anyToken appears to be greedy and does not stop at eol matches. I cannot use many (noneOf "\n\r"), because a single '\n' character appears in several places in my text.

Is there a combinator that can get me the inverse of string "\r\r\n"?

AndrewC · Accepted Answer · 2014-09-15T06:05:21

I'm afraid you're going about it backwards. Parsec parsers don't chop up the input; they build the output. The more you try to parse by thinking about what you don't want, the harder it'll be. You need to think bottom-up about what's permissible, not top-down about where to chop.

You should start with the smallest, most basic thing you do want. For example, don't think of an identifier as everything before a space; think of it as a letter followed by alphanumeric characters. You can then combine that, separated by whitespace, with the other things you expect on a line.

-- identifier, whiteSpace, expr and the Line constructor are assumed to be defined elsewhere.
line = do
       i <- identifier
       whiteSpace
       string "="
       e <- expr
       return $ Line i e
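To make that concrete, here is a minimal self-contained sketch of such a line parser. The Line type, the integer-only expr, and the whitespace handling are assumptions made for illustration; your real grammar will differ. The point is only that each piece is defined by what it accepts.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical result type for a line of the form "name = 123".
data Line = Line String Integer
  deriving (Show, Eq)

-- An identifier is a letter followed by any number of alphanumeric characters.
identifier :: Parser String
identifier = (:) <$> letter <*> many alphaNum

-- Horizontal whitespace only, so a line parser never swallows a line ending.
whiteSpace :: Parser ()
whiteSpace = skipMany (oneOf " \t")

-- Assumed expression language: a single non-negative integer literal.
expr :: Parser Integer
expr = read <$> many1 digit

-- A line is: identifier, '=', expression, with optional spaces around '='.
line :: Parser Line
line = do
  i <- identifier
  whiteSpace
  _ <- char '='
  whiteSpace
  e <- expr
  return (Line i e)
```

With these definitions, parse line "" "x1 = 42" gives Right (Line "x1" 42), while an input such as "1x = 42" is rejected because it does not start with a letter.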

Only when you've completed a parser that successfully parses what you want from a line and rejects invalid lines should you parse multiple lines:

lines = sepBy line eol
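If the lines in your file really are free-form text, the same bottom-up idea still applies: define a line character as any character at a position where eol does not match, and build lines from that. The sketch below is one possible way to write it, using Parsec's notFollowedBy and try. The names lineChar, textLine and fileLines are made up for this example; only eol comes from the question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The separator from the question. `try` makes a partial match such as
-- "\r\rx" fail without consuming input.
eol :: Parser String
eol = try (string "\r\r\n")

-- Hypothetical helper: one character of line content, i.e. any character
-- at a position where `eol` does not match.
lineChar :: Parser Char
lineChar = notFollowedBy eol *> anyChar

-- Hypothetical helper: a (possibly empty) run of line characters.
textLine :: Parser String
textLine = many lineChar

-- All lines of the input, separated by eol, up to the end of the input.
fileLines :: Parser [String]
fileLines = sepBy textLine eol <* eof
```

Here parse fileLines "" "a\nb\r\r\nc" gives Right ["a\nb","c"], so a lone '\n' or '\r' stays inside a line. Because this uses sepBy, input that ends with a separator produces a final empty string in the list.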
