1
votes

I have a logical language defined by the following BNF.

 formula ::= true
           | false
           | var
           | formula & formula
           | [binder] formula

 binder  ::= var 
           | $var

Essentially, this allows for formulae such as x & true, [x]x and [$x](x & true). The semantics are not important here; but the essential thing is that I have these square bracketed expressions appearing before formulae, and inside those square bracketed expressions, identifiers may or may not be preceded by a dollar sign ($). Now I have used Haskell's Parsec library to help me construct a parser for this language, detailed below.

module LogicParser where

import System.IO
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Language
import qualified Text.ParserCombinators.Parsec.Token as Token

-- Data Structures
data Formula = LVar String
             | TT
             | FF
             | And Formula Formula
             | Bound Binder Formula
             deriving Show

data Binder = BVar String
             | FVar String
             deriving Show

-- Language Definition
lang :: LanguageDef st
lang =
    emptyDef{ Token.identStart      = letter
            , Token.identLetter     = alphaNum
            , Token.reservedOpNames = ["&", "$", "[", "]"]
            , Token.reservedNames   = ["tt", "ff"]
            }


-- Lexer for langauge
lexer = 
    Token.makeTokenParser lang


-- Trivial Parsers
identifier     = Token.identifier lexer
keyword        = Token.reserved lexer
op             = Token.reservedOp lexer
roundBrackets  = Token.parens lexer
whiteSpace     = Token.whiteSpace lexer

-- Main Parser, takes care of trailing whitespaces
formulaParser :: Parser Formula
formulaParser = whiteSpace >> formula

-- Parsing Formulas
formula :: Parser Formula
formula = andFormula
        <|> formulaTerm

-- Term in a Formula
formulaTerm :: Parser Formula
formulaTerm = roundBrackets formula
            <|> ttFormula
            <|> ffFormula
            <|> lvarFormula
            <|> boundFormula

-- Conjunction
andFormula :: Parser Formula
andFormula = 
    buildExpressionParser [[Infix (op "&" >> return And) AssocLeft]] formulaTerm

-- Bound Formula
boundFormula :: Parser Formula
boundFormula = 
    do  op "["
        v <- var
        op "]"
        f <- formulaTerm
        return $ Bound v f

-- Truth
ttFormula :: Parser Formula
ttFormula = keyword "tt" >> return TT

-- Falsehood
ffFormula :: Parser Formula
ffFormula = keyword "ff" >> return FF

-- Logical Variable
lvarFormula :: Parser Formula
lvarFormula =
    do  v <- identifier
        return $ LVar v

-- Variable
var :: Parser Binder
var = try bvar <|> fvar

-- Bound Variable
bvar :: Parser Binder
bvar =
    do  op "$"
        v <- identifier
        return $ BVar v

-- Free Variable
fvar :: Parser Binder
fvar =
    do  v <- identifier
        return $ FVar v


-- For testing
main :: IO ()
main = interact (unlines . (map stringParser) . lines)

stringParser :: String -> String
stringParser s = 
    case ret of
        Left e ->  "Error: " ++ (show e)
        Right n -> "Interpreted as: " ++ (show n)
    where 
        ret = parse formulaParser "" s

My issue is the following. When the dollar sign operator ($) touches the square bracket, I get an error, whereas if I add a space, the parser works fine:

enter image description here

How can I get the parser to recognise [$x](x & true)? Note that it has no issue with & touching its operands, only when the two operators [ and $ touch.

3

3 Answers

1
votes

I am not very familiar with the token side of Parsec, but from its documentation I think you need to provide opLetter and possibly opStart, since you are providing reservedOp:

opLetter :: ParsecT s u m Char    

This parser should accept any legal tail characters of operators. Note that this parser should even be defined if the language doesn't support user-defined operators, or otherwise the reservedOp parser won't work correctly.

In particular the default opLetter doesn't include [ or ], so it is behaving erratically when you think one of those should be an op.

1
votes

I think it doesn't like you having the square brackets as operators. I would try this:

  1. remove "[" and "]" from Token.reservedOpNames
  2. add another parser to your Trivial Parsers: squareBrackets = Token.brackets lexer
  3. change your boundFormula function to:

    boundFormula =
        do v <- squareBrackets var
           f <- formulaTerm
           return $ Bound v f
    
1
votes

Here's how I would write your parser with Megaparsec (docs, tutorial):

import Text.Megaparsec
import qualified Text.Megaparsec.Char as C
import qualified Text.Megaparsec.Char.Lexer as L
import Control.Monad.Combinators.Expr
import Data.Void

type Parser = Parsec Void String

data Formula = LVar String
             | TT
             | FF
             | Not Formula -- Added to demonstrate `Prefix` of `makeExprParser`
             | And Formula Formula
             | Bound Binder Formula
             deriving Show

data Binder = BVar String
            | FVar String
            deriving Show

Megaparsec also has makeExprParser, but it's moved into a separate parser-combinators package:

formula :: Parser Formula
formula = makeExprParser term operators
  where
    term = choice
      [ TT <$ symbol "true"
      , FF <$ symbol "false"
      , LVar <$> var
      , Bound <$> brackets binder <*> parens formula
      ]

    binder = choice
      [ BVar <$> (C.char '$' >> var)
      , FVar <$> var
      ]

    var = lexeme $ some C.letterChar

    operators :: [[Operator Parser Formula]]
    operators =
      [ [ Prefix (Not <$ symbol "¬") ]
      , [ InfixL (And <$ symbol "&") ]
      ]

Some points:

  • Use applicative style (<$>, <$, $>) when possible.
  • Megaparsec renamed some combinators like many1 to some. There's a tutorial, Switch from Parsec to Megaparsec that aids the transition. (It might be a little outdated; I sent a PR in the process of answering this question.)
  • You don't need try when BVars and FVars are mutually exclusive on the first character, $. Simply listing the BVar parser first is enough. Since you won't find any other valid expression that begins with $, a failed parser that consumed a $ is not a problem.
  • Your grammar doesn't say anything about literal parentheses or literal parentheses after brackets. So in order to parse "[$x](x & true)" you need to add explicit parentheses to the grammar, either as

    formula ::= ... | '(' formula ')'
    

    or as

    formula ::= ... | '[' binder ']' '(' formula ')'
    

    if they're only allowed there. I've gone with the latter, but this is probably wrong.

Continuing,

lexeme :: Parser a -> Parser a
lexeme = L.lexeme spaceConsumer

symbol :: String -> Parser String
symbol = L.symbol spaceConsumer

spaceConsumer :: Parser ()
spaceConsumer = L.space C.space1 empty empty

brackets, parens :: Parser a -> Parser a
brackets = between (symbol "[") (symbol "]")
parens = between (symbol "(") (symbol ")")

Some last points,

  • Use between to wrap a parser in brackets or braces.
  • Parsec token parsers are a little complicated because they require you to specify what a token is and what whitespace is and so on. Megaparsec lexeme parsers have some of that complexity, but you can leave out the parts that are not relevant (e.g. line comments and block comments) by using empty :: Alternative f => f a.
  • Whitespace in parser combinators is tricky. Make sure that all parsers are either lexeme parsers (e.g. symbol "foo", lexeme $ some C.letterChar) or combinations of lexeme parsers. If you use lexeme on something that already is a lexeme parser, you're doing something wrong, and if you forget to use lexeme on something, you'll risk disallowing whitespace in places where you want to allow it.

    I've not used lexeme around C.char '$'.

  • There are always corner cases, e.g. whitespace at the beginning:

    > parseTest formula " [$x](x & true) "
    1:1:
      |
    1 |  [$x](x & true) 
      | ^^^^^
    unexpected " [$x]"
    expecting "false", "true", '[', '¬', or letter
    

    If you want to thoroughly assert that your parser allows whitespace in all the right places, you can build an "ugly-printer" that, for arbitrary syntax trees, inserts arbitrary whitespace where allowed. Your property is then that parsing an ugly-printed syntax tree should yield the same syntax tree.

  • Megaparsec parse errors look super nice.

    They can look nicer if you use the <?> operator (also available in Parsec).