As an exercise, I'm implementing a parser for an exceedingly simple language defined in Haskell using the following GADT (the real grammar for my project involves many more expressions, but this extract is sufficient for the question):
data Expr a where
I :: Int -> Expr Int
Add :: [Expr Int] -> Expr Int
The parsing functions are as follows:
expr :: Parser (Expr Int)
expr = foldl1 mplus
[ lit
, add
]
lit :: Parser (Expr Int)
lit = I . read <$> some digit
add :: Parser (Expr Int)
add = do
i0 <- expr
is (== '+')
i1 <- expr
is <- many (is (== '+') *> expr)
pure (Add (i0:i1:is))
Due to the left-recursive nature of the expression grammar, when I attempt to parse something as simple as 1+1
using the expr
parser, the parser get stuck in an infinite loop.
I've seen examples of how to factor out left recursion across the web using a transformation from something like:
S -> S a | b
Into something like:
S -> b T
T -> a T
But I'm struggling with how to apply this to my parser.
For completeness, here is the code that actually implements the parser:
newtype Parser a = Parser
{ runParser :: String -> [(a, String)]
}
instance Functor Parser where
fmap f (Parser p) = Parser $ \s ->
fmap (\(a, r) -> (f a, r)) (p s)
instance Applicative Parser where
pure a = Parser $ \s -> [(a, s)]
(<*>) (Parser f) (Parser p) = Parser $ \s ->
concat $ fmap (\(f', r) -> fmap (\(a, r') -> (f' a, r')) (p r)) (f >
instance Alternative Parser where
empty = Parser $ \s -> []
(<|>) (Parser a) (Parser b) = Parser $ \s ->
case a s of
(r:rs) -> (r:rs)
[] -> case b s of
(r:rs) -> (r:rs)
[] -> []
instance Monad Parser where
return = pure
(>>=) (Parser a) f = Parser $ \s ->
concat $ fmap (\(r, rs) -> runParser (f r) rs) (a s)
instance MonadPlus Parser where
mzero = empty
mplus (Parser a) (Parser b) = Parser $ \s -> a s ++ b s
char = Parser $ \case (c:cs) -> [(c, cs)]; [] -> []
is p = char >>= \c -> if p c then pure c else empty
digit = is isDigit
attoparsec
instead of rolling your own parsing framework. – dfeuermplus
is generally expected to match<|>
. – dfeuerattoparsec
to make sure they're the problem and not your framework. Just a thought. – dfeuer