Parsec: error message at specific location

Question

Using Parsec, how does one indicate an error at a specific position when a semantic rule is violated? I know we typically don't want to do such things, but consider this example grammar:

<foo> ::= <bar> | ...
<bar> ::= a positive integer power of two

The <bar> rule is a finite set (my example is arbitrary), and a pure approach could be a careful application of the choice combinator, but this might be impractical in space and time. In recursive descent or toolkit-generated parsers, the standard trick is to parse an integer (a more relaxed grammar) and then semantically check the harder constraints. For Parsec, I could use a natural parser and check the result, calling fail or unexpected when it doesn't match. But if we do that, the default error location is the wrong one. Somehow I need to raise the error at the earlier state.

I tried a brute-force solution and wrote a combinator that uses getPosition and setPosition, as illustrated by this very similar question. That was also unsuccessful (the error location is still wrong). I've run into this pattern many times. I am looking for a combinator along these lines:

withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred lbl p = do
  ok <- lookAhead $ fmap pred (try p) <|> return False -- peek ahead
  if ok then p         -- consume the input if the value passed the predicate
   else fail lbl       -- otherwise raise the error at the *start* of this token

pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
  where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])

The above does not work. (I tried variants on this as well.) Somehow the parser backtracks and says it's expecting a digit. I assume it's returning the error that made it the furthest. Even {get,set}ParserState fails to erase that memory.

Am I handling this syntactic pattern wrong? How would you Parsec users approach this type of problem?

Thanks!

Mark Karpov · Accepted Answer · 2017-08-11T13:16:21

I think both your ideas are OK. The other two answers deal with Parsec, but I'd like to note that in both cases Megaparsec just does the right thing:

{-# LANGUAGE TypeApplications #-}

module Main (main) where

import Control.Monad
import Data.Void
import Text.Megaparsec
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

withPredicate1 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate1 f msg p = do
  r <- lookAhead p
  if f r
    then p
    else fail msg

withPredicate2 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate2 f msg p = do
  mpos <- getNextTokenPosition -- †
  r    <- p
  if f r
    then return r
    else do
      forM_ mpos setPosition
      fail msg

main :: IO ()
main = do
  let msg = "I only like numbers greater than 42!"
  parseTest' (withPredicate1 @Integer (> 42) msg L.decimal) "11"
  parseTest' (withPredicate2 @Integer (> 42) msg L.decimal) "22"

If I run it:

The next big Haskell project is about to start!
λ> :main
1:1:
  |
1 | 11
  | ^
I only like numbers greater than 42!
1:1:
  |
1 | 22
  | ^
I only like numbers greater than 42!
λ>

Try it for yourself! Works as expected.

† getNextTokenPosition is more correct than getPosition for streams whose tokens carry their own start and end positions. This may or may not be important in your case.
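
For anyone who has to stay on Parsec, here is one possible sketch of the combinator from the question. It is not taken from the other answers; it assumes the low-level API in Text.Parsec.Prim (mkPT, runParsecT, and the Consumed, Reply and State types) and newErrorMessage from Text.Parsec.Error. The idea is to run the inner parser by hand and, when the predicate rejects its result, throw the whole reply away (including the "expecting digit" error it carries) and return a fresh error built at the starting position. The rejection is reported as not having consumed input, much like try. The names withPredicate and pPowerOfTwo come from the question; the message text and the use of many1 digit in place of natural are illustrative choices.

```haskell
import Text.Parsec
import Text.Parsec.Error (Message (..), newErrorMessage)
import Text.Parsec.Prim (Consumed (..), Reply (..), State (..), mkPT, runParsecT)
import Text.Parsec.String (Parser)

-- | Run @p@; if its result fails the predicate, discard the reply and fail
-- without consuming input, with an error located where @p@ started.
withPredicate :: Monad m => (a -> Bool) -> String -> ParsecT s u m a -> ParsecT s u m a
withPredicate ok msg p = mkPT $ \st -> do
  let -- Fresh error at the position recorded before p ran.
      err = newErrorMessage (Message msg) (statePos st)
      rejected = Empty (return (Error err))
      -- Keep p's reply (and its consumed/empty status) unless the
      -- predicate rejects a successfully parsed value.
      check wrap reply = case reply of
        Ok x _ _ | not (ok x) -> rejected
        _ -> wrap (return reply)
  consumed <- runParsecT p st
  case consumed of
    Consumed mreply -> check Consumed <$> mreply
    Empty mreply -> check Empty <$> mreply

-- Stand-in for the question's parser: digits first, semantic check second.
pPowerOfTwo :: Parser Integer
pPowerOfTwo = withPredicate isPowerOfTwo "expected a power of two" (read <$> many1 digit)
  where
    isPowerOfTwo n = n `elem` [2 ^ i | i <- [1 .. 20 :: Int]]
```

With this sketch, parse pPowerOfTwo "" "12" should fail with an error whose position is the first digit, while ordinary syntax errors raised by the inner parser pass through unchanged. Because the rejection counts as an empty failure, an alternative placed after it with <|> still gets a chance to run.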
