6
votes

Using Parsec how does one indicate an error at a specific position if a semantic rule is violated. I know typically we don't want to do such things, but consider the example grammar.

<foo> ::= <bar> | ...
<bar> ::= a positive integer power of two

The <bar> rule is a finite set (my example is arbitrary), and a pure approach to the above could be a careful application of the choice combinator, but this might be impractical in space and time. In recursive descent or toolkit-generated parsers the standard trick is to parse an integer (a more relaxed grammar) and then semantically check the harder constraints. For Parsec, I could use a natural parser and check the result calling fail when that doesn't match or unexpected or whatever. But if we do that, the default error location is the wrong one. Somehow I need to raise the error at the earlier state.

I tried a brute force solution and wrote a combinator that uses getPosition and setPosition as illustrated by this very similar question. Of course, I was also unsuccessful (the error location is, of course wrong). I've run into this pattern many times. I am kind of looking for this type of combinator:

withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred lbl p = do
  ok <- lookAhead $ fmap pred (try p) <|> return False -- peek ahead
  if ok then p         -- consume the input if the value passed the predicate
   else fail lbl       -- otherwise raise the error at the *start* of this token

pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
  where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])

The above does not work. (I tried variants on this as well.) Somehow the parser backtracks a says it's expecting a digit. I assume it's returning the error that made it the furthest. Even {get,set}ParserState fails erase that memory.

Am I handling this syntactic pattern wrong? How would all you Parsec users approach these type of problems?

Thanks!

4
You should post a MWE.jub0bs

4 Answers

5
votes

I think both your ideas are OK. The other two answers deal with Parsec, but I'd like to note that in both cases Megaparsec just does the right thing:

{-# LANGUAGE TypeApplications #-}

module Main (main) where

import Control.Monad
import Data.Void
import Text.Megaparsec
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

withPredicate1 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate1 f msg p = do
  r <- lookAhead p
  if f r
    then p
    else fail msg

withPredicate2 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate2 f msg p = do
  mpos <- getNextTokenPosition -- †
  r    <- p
  if f r
    then return r
    else do
      forM_ mpos setPosition
      fail msg

main :: IO ()
main = do
  let msg = "I only like numbers greater than 42!"
  parseTest' (withPredicate1 @Integer (> 42) msg L.decimal) "11"
  parseTest' (withPredicate2 @Integer (> 42) msg L.decimal) "22"

If I run it:

The next big Haskell project is about to start!
λ> :main
1:1:
  |
1 | 11
  | ^
I only like numbers greater than 42!
1:1:
  |
1 | 22
  | ^
I only like numbers greater than 42!
λ>

Try it for yourself! Works as expected.


getNextTokenPosition is more correct than getPosition for streams where tokens contain position of their beginning and end in themselves. This may or may not be important in your case.

4
votes

It's not a solution I like, but you can hypnotize Parsec into believing it's had a single failure with consumption:

failAt pos msg = mkPT (\_ -> return (Consumed (return $ Error $ newErrorMessage (Expect msg) pos)))

Here's a complete example:

import Control.Monad
import Text.Parsec
import Text.Parsec.Char
import Text.Parsec.Error
import Text.Parsec.Prim
import Debug.Trace

failAt pos msg = mkPT (\_ -> return (Consumed (return $ Error $ newErrorMessage (Expect msg) pos)))

type P a = Parsec String () a

withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred msg p = do
    pos <- getPosition
    x <- p
    unless (pred x) $ failAt pos msg
    return x

natural = read <$> many1 digit
pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
  where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])

main = print $ runParser pPowerOfTwo  () "myinput" "4095"

When run, it results in:

Left "myinput" (line 1, column 1):
expecting power of two
3
votes

I think the problem stems from how Parsec picks the "best error" in the non-deterministic setting. See Text.Parsec.Error.mergeError. Specifically, this selects the longest match when choosing which error is the error to report. I think we need some way to make Parsec order errors differently, which may be too obscure for us solving this problem.

In my case, I here's how I worked around the problem:

I solved stacked an Exception monad within my ParsecT type.

type P m = P.ParsecT String ParSt (ExceptT Diagnostic m)

Then I introduced a pair of combinators: (Note: Loc is my internal location type)

-- stops hard on an error (no backtracking)
-- which is why I say "semantic" instead of "syntax" error
throwSemanticError :: (MonadTrans t, Monad m) => Loc -> String -> t (ExceptT Diagnostic m) a
throwSemanticError loc msg = throwSemanticErrorDiag $! Diagnostic loc msg


withLoc :: Monad m => (Loc -> P m a) -> P m a
withLoc pa = getLoc >>= pa

Now in parsing I can write:

parsePrimeNumber = withLoc $ \loc ->
  i <- parseInt
  unless (isPrime i) $ throwSemanticError loc "number is not prime!"
  return i

The top level interface to run one of these monads is really nasty.

runP :: Monad m
    => ParseOpts
    -> P m a
    -> String
    -> m (ParseResult a)
runP pos pma inp = 
  case runExceptT (P.runParserT pma (initPSt pos) "" inp) of
    mea -> do
             ea <- mea
             case ea of
               -- semantic error (throwSemanticError)
               Left err -> return $! PError err
               -- regular parse error
               Right (Left err) -> return $ PError (errToDiag err)
               -- success
               Right (Right a) -> return (PSuccess a [])

I'm not terribly happy with this solution and desire something better.

I wish parsec had a:

semanticCheck :: (a -> Parsec Bool) -> Parsec a -> Parsec a
semanticCheck pred p = 
    a <- p
    z <- pred a
    unless z $
       ... somehow raise the error from the beginning of this token/parse 
       rather than the end ... and when propagating the error up, 
      use the end parse position, so this parse error beats out other 
      failed parsers that make it past the beginning of this token 
      (but not to the end)
    return a
0
votes

Using lookAhead, we can run a parser without consuming any input or registering any new errors, but record the state that we end up in. We can then apply a guard to the result of the parser. The guard can fail in whatever manner it desires if the value does not pass the semantic check. If the guard fails, then the error is located at the initial position. If the guard succeeds, we reset the parser to the recorded state, avoiding the need to re-execute p.

guardP :: Stream s m t => (a -> ParsecT s u m ()) -> ParsecT s u m a -> ParsecT s u m a
guardP guard p = do
  (a, s) <- try . lookAhead $ do
    a <- p
    s <- getParserState
    return (a, s)
  guard a
  setParserState s
  return a

We can now implement pPowerOfTwo:

pPowerOfTwo :: Stream s m Char => ParsecT s u m Integer
pPowerOfTwo = guardP guardPowerOfTwo natural <?> "power of two"
  where guardPowerOfTwo s = unless (s `elem` [2^i | i <- [1..20]]) . unexpected $ show s