Preserving the failure mode of a parser in Parsec

Question

Problem statement

Suppose I have two parsers p and q and I concatenate them like this:

r = try p *> q

In Parsec, the behavior of this is:

If p fails without consuming input, then r fails without consuming input.
If p fails after consuming input, then r fails without consuming input.
if q fails without consuming input, then r fails after consuming p.
if q fails after consuming input, then r fails after consuming p and parts of q.

However, the behavior I'm looking for is a bit unusual:

If p fails without consuming input, then r should fail without consuming input.
If p fails after consuming input, then r should fail without consuming input.
if q fails without consuming input, then r should fail without consuming input.
if q fails after consuming input, then r should fail after consuming some input.

I can't seem to think of a clean way to do this.

Rationale

The reason is that I have a parser like this:

s = (:) <$> q <*> many r

The q parser, embedded inside the r parser, needs a way to signal either: invalid input (which occurs when q consumes input but fails), or end of the many loop (which occurs when q doesn't consume anything and fails). If the input is invalid, it should just fail the parser entirely and report the problem to the user. If there is no more input to consume, then it should end the many loop (without reporting a parser error to the user). The trouble is that it's possible that the input ends with a p but without any more valid q's to consume, in which case q will fail but without consuming any input.

So I was wondering if anyone have an elegant way to solve this problem? Thanks.

Addendum: Example

p = string "P"
q = (++) <$> try (string "xy") <*> string "z"

Test input on (hypothetical) parser s, had it worked the way I wanted:

xyz (accept)
xyzP (accept; P remains unparsed)
xyzPx (accept; Px remains unparsed; q failed but did not consume any input)
xyzPxy (reject; parser q consumed xy but failed)
xyzPxyz (accept)

In the form r = try p *> q, s will actually reject both case #2 and case #3 above. Of course, it's possible to achieve the above behavior by writing:

r = (++) <$> try (string "P" *> string "xy") <*> string "z"

but this isn't a general solution that works for any parsers p and q. (Perhaps a general solution doesn't exist?)

try discards all information about the progress of the input (i.e. maps both kinds of failures to "failure without consuming input") so you can't emulate the behavior described with only try. — Rufflewind

Petr Petr · Accepted Answer · 2013-09-03T08:29:24

I believe I found a solution. It's not especially nice, but seems to works. At least something to start with:

{-# LANGUAGE FlexibleContexts #-}
import Control.Applicative hiding (many, (<|>))
import Control.Monad (void)
import Control.Monad.Trans (lift)
import Control.Monad.Trans.Maybe
import Text.Parsec hiding (optional)
import Text.Parsec.Char
import Text.Parsec.String

rcomb ::  (Stream s m t) => ParsecT s u m a -> ParsecT s u m b -> ParsecT s u m b
rcomb p q = ((test $ opt p *> opt q) <|> pure (Just ()))
                >>= maybe empty (\_ -> p *> q)
  where
    -- | Converts failure to @MaybeT Nothing@:
    opt = MaybeT . optional -- optional from Control.Applicative!
    -- | Tests running a parser, returns Nothing if parsers failed consuming no
    -- input, Just () otherwise.
    test = lookAhead . try . runMaybeT . void

This is the r combinator you're asking for. The idea is that we first execute the parsers in a "test" run (using lookAhead . try) and if any of them fails without consuming input, we record it as Nothing inside MaybeT. This is accomplished by opt, it converts a failure to Nothing and wraps it into MaybeT. Thanks to MaybeT, if opt p returns Nothing, opt q is skipped.

If both p and q succeed, the test .. part returns Just (). And if one of them consumes input, the whole test .. fails. This way, we distinguish the 3 possibilities:

Failure with some input consumed by p or q.
Failure such that the failing part doesn't consume input.
Success.

After <|> pure (Just ()) both 1. and 3. result in Just (), while 2. results in Nothing. Finally, the maybe part converts Nothing into a non-consuming failure, and Just () into running the parsers again, now without any protection. This means that 1. fails again with consuming some input, and 3. succeeds.

Testing:

samples =
    [ "xyz" -- (accept)
    , "xyzP" -- (accept; P remains unparsed)
    , "xyzPz" -- (accept; Pz remains unparsed)
    , "xyzPx" -- (accept; Px remains unparsed; q failed but did not consume any input)
    , "xyzPxy" -- (reject; parser q consumed xy but failed)
    , "xyzPxyz" -- (accept)
    ]

main = do
    -- Runs a parser and then accept anything, which shows what's left in the
    -- input buffer:
    let run p x = runP ((,) <$> p <*> many anyChar) () x x

    let p, q :: Parser String
        p = string "P"
        q = (++) <$> try (string "xy") <*> string "z"

    let parser = show <$> ((:) <$> q <*> many (rcomb p q))
    mapM_ (print . run parser) samples

Preserving the failure mode of a parser in Parsec

Problem statement

Rationale

Addendum: Example

1 Answers