
I have a file with "\r\r\n" line endings, and I use the parser eol = string "\r\r\n" :: Parser String to handle them. To get a list of the lines between these separators, I would like to use sepBy together with a parser that returns any text that eol would not match. Looking through the documentation, I did not see a combinator that negates a parser (an 'anything but the pattern "\r\r\n"' parser).

I have tried using sepBy (many anyToken) eol, but many anyToken is greedy: it consumes the rest of the input, eol matches included, instead of stopping at them. I cannot use many (noneOf "\n\r") either, because my text contains several lone '\n' characters.

Is there a combinator that can get me the inverse of string "\r\r\n"?


3 Answers


I'm afraid you're going about it backwards. Parsec parsers don't chop up the input; they build the output. The more you try to parse by thinking about what you don't want, the harder it gets. Think bottom-up about what's permissible, not top-down about where to chop.

Start with the smallest, most basic thing you do want. For example, don't think of an identifier as everything before a space; think of it as a letter followed by alphanumeric characters. You can then combine that, separated by whitespace, with the other things you expect on a line.

line = do
       i <- identifier
       whiteSpace
       string "="
       e <- expr
       return $ Line i e

Only when you've completed a parser that successfully parses what you want from a line and rejects invalid lines should you parse multiple lines:

fileLines = sepBy line eol  -- a top-level 'lines' would be ambiguous with Prelude's lines
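A minimal self-contained sketch of this bottom-up style with Parsec, for lines such as x = 1 + 2. Everything the answer leaves unspecified is an assumption here: the Line and Expr types, the integer-sum expr grammar, and the name fileLines are hypothetical stand-ins. whiteSpace skips only spaces and tabs, because Parsec's spaces also consumes '\r' and '\n' and would swallow the start of the separator. Text.Parsec exports its own Line type synonym (for source positions), hence the hiding clause.

```haskell
import Text.Parsec hiding (Line) -- Text.Parsec exports a Line type synonym
import Text.Parsec.String (Parser)

-- Hypothetical AST: a line binds a name to a sum of integer literals.
data Expr = Lit Integer | Add Expr Expr deriving (Show, Eq)
data Line = Line String Expr deriving (Show, Eq)

-- The question's separator.
eol :: Parser String
eol = string "\r\r\n"

-- Skip spaces and tabs only; unlike Parsec's 'spaces' this never
-- consumes '\r' or '\n', so the separator is left intact for eol.
whiteSpace :: Parser ()
whiteSpace = skipMany (oneOf " \t")

-- An identifier is a letter followed by alphanumeric characters.
identifier :: Parser String
identifier = (:) <$> letter <*> many alphaNum

-- Hypothetical expression grammar: integers joined by '+', left-associative.
-- Each token skips the horizontal whitespace that follows it.
expr :: Parser Expr
expr = chainl1 number plus
  where
    number = Lit . read <$> many1 digit <* whiteSpace
    plus = Add <$ char '+' <* whiteSpace

line :: Parser Line
line = do
  i <- identifier
  whiteSpace
  _ <- string "="
  whiteSpace
  e <- expr
  return (Line i e)

-- Named fileLines because 'lines' would be ambiguous with Prelude.lines.
fileLines :: Parser [Line]
fileLines = sepBy line eol
```

For example, parse fileLines "" "x = 1 + 2\r\r\ny = 3" should yield the two values Line "x" (Add (Lit 1) (Lit 2)) and Line "y" (Lit 3). Note that nothing here negates eol: each line simply stops where its own grammar ends, and sepBy takes over from there.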

As a tentative answer, it looks like manyTill anyChar (try eol) does what I want. Still, as part of my original question, I'm interested in knowing whether there is a general way to negate a parser, or whether there's another recommended way of doing this.
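A minimal sketch of both routes, assuming Text.Parsec and the question's eol; terminatedLines, notEol and separatedLines are hypothetical names. On the general question, Parsec's notFollowedBy is its built-in negation: notFollowedBy p succeeds, consuming no input, only when p fails at the current position. So notFollowedBy eol *> anyChar reads one character that does not begin "\r\r\n", which is the parser the question asked for.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The question's separator.
eol :: Parser String
eol = string "\r\r\n"

-- The manyTill approach: each line runs up to and including the next
-- "\r\r\n". 'try' undoes a partial match such as a lone '\r', so that
-- character is read by anyChar instead. Every line must be terminated.
terminatedLines :: Parser [String]
terminatedLines = many (manyTill anyChar (try eol)) <* eof

-- Negation: notFollowedBy p succeeds without consuming input exactly
-- when p fails here, so this reads one character that does not start eol.
notEol :: Parser Char
notEol = notFollowedBy eol *> anyChar

-- The sepBy formulation from the question. A trailing "\r\r\n" yields a
-- final empty string, and the last line need not be terminated.
separatedLines :: Parser [String]
separatedLines = sepBy (many notEol) eol <* eof
```

On "one\r\r\ntwo\nmore\r\r\nthree", separatedLines should give ["one","two\nmore","three"], keeping the lone '\n' inside the second line, while terminatedLines rejects that input because the last line has no terminator. Which one fits depends on whether the file is guaranteed to end with "\r\r\n".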


The sepCap parser combinator from the replace-megaparsec package does this kind of parser negation: it returns a list of Either values, with the non-matching text in Left and the pattern matches in Right.

import Data.Void (Void)
import Replace.Megaparsec
import Text.Megaparsec

parseTest (sepCap (chunk "\r\r\n" :: Parsec Void String String))
  $ "one\r\r\ntwo\r\r\nthree\r\r\n"
[ Left "one"
, Right "\r\r\n"
, Left "two"
, Right "\r\r\n"
, Left "three"
, Right "\r\r\n"
]