1
votes

I want to parse a language like this

foo = (bar, bar1 = (bar2 = bar4), bar5)

I wrote a simple parser

module SimpleParser where
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import Text.Parsec
import qualified Text.Parsec.Token as Tok
import Text.Parsec.Char
import Prelude

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where
    style = emptyDef {
              Tok.identLetter    = alphaNum
             }

parens :: Parser a -> Parser a
parens = Tok.parens lexer

commaSep :: Parser a -> Parser [a]
commaSep = Tok.commaSep lexer

identifier :: Parser String
identifier = Tok.identifier lexer

reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer

data Expr = Ident String | Label String Expr | ExprList [Expr] deriving (Eq, Ord, Show)

parseExpr :: String -> Either ParseError Expr
parseExpr s = parse expr "" s

expr :: Parser Expr
expr = parens expr
      <|> try exprList
      <|> ident
ident :: Parser Expr
ident = do
  var <- identifier
  return $ Ident var

exprLabel :: Parser Expr
exprLabel = do
  var <- identifier
  reservedOp "="
  body <- expr
  return $ Label var body

exprList :: Parser Expr
exprList = do
  list <- commaSep (try exprLabel <|> expr)
  return $ ExprList list

But even with the following simple input it has an infinite loop:

test = parseExpr "foo = bar"

Could someone explain why it doesn't work and how I can fix it?

1
This kind of nasty behaviour is usually related to left recursion on grammars. I do not have enough time to check, but I believe that using chainl or chainr could help. Later, I'll check on your code and try to fix.Rodrigo Ribeiro

1 Answers

3
votes

Thing is, in your code, exprList will loop if it tries to parse an identifier, that is parse exprList "" "foo" goes into an infinite loop. This is because it tries to parse it as a list of either Labels or expressions, and expressions can be lists. Once it fails to be a exprLabel it tries to see if it can be a expr and so it calls exprList again.

To fix it you need to make sure that expr checks to see both if it is a exprLabel or an identifier before it tries exprList. Note that if all the above fails it will still go into a loop. This is because it doesn't know if this is just the start of a list (or a list of lists of lists of lists...) or not.

To fix this you can make expr only call exprList if it matches a parens, and use exprList as the starting Parser.

expr :: Parser Expr
expr = parens (exprList)
      <|> try exprLabel
      <|> ident

exprList :: Parser Expr
exprList = do
  list <- commaSep expr
  return $ ExprList list

And it works like this:

>parse exprList  "" "(foo=bar),foo=bar"
  Right (ExprList [ExprList [Label "foo" (Ident "bar")],Label "foo" (Ident "bar")])