3
votes

I'am writing some code to parse commands from the Simple Imperative Language defined in Theory of Programming Languages (Reynolds, 1998).

I have a lexer module that given a string extracts the tokens from it if it's a valid language expression and then I pass that list of tokens to the parser which should build an internal representation of the command (defined as an algebraic data type).

These are my Tokens:

--Tokens for the parser
data Token = Kw Keyword
| Num Int
| Op Operator
| Str String
| Sym Symbol
deriving Show 

I'm having trouble with binary operators. I'll put as an example the sum, but it happens the same with all of them, either boolean or integers.

For example if I'd run the program parse "x:=2+3" I should get the following list of tokens from the lexer
[Str "x", Op Colon, Op Equal, Num 2, OP, Plus, Num 3]
which is actually what I'm getting.

But then the parser should return the command
Assign "x" (Ibin Plus (Const 2) (Const 3)
which is the correct representation of the command. But instead of that I'm getting the following representation:
Assign "x" (Const 2)

I guess that I screwed it at some point in the pIntExpr function because the variable identifier and the := of the assignment are parsed OK and it's not parsing the last elements. Here are the relevant parsers for this example, to see if someone can orientate me in what I'm doing wrong.

-- Integer expressions
data IntExpr = Const Int 
         | Var Iden --Iden=String
         | Neg IntExpr
         | IBin OpInt IntExpr IntExpr
         deriving Show

type TParser = Parsec [Token] ()

--Internal representation of the commands
data Comm = Skip 
      | Assign Iden IntExpr
      | If Assert Comm Comm
      | Seq Comm Comm
      | While Assert Comm
      | Newvar Iden IntExpr Comm
        deriving Show

--Parser for non sequential commands
pComm' :: TParser Comm
pComm' = choice [pif,pskip,pAssign,pwhile,pNewVar]

--Parser for the assignment command
pAssign :: TParser Comm
pAssign = do v <- pvar
             _ <- silentOp Colon
             _ <- silentOp Equal
             e <- pIntExp
             return $ Assign v e 

-- Integer expressions parser
pIntExp :: TParser IntExpr
pIntExp = choice [ var'  --An intexp is either a variable
                  , num  --Or a numeric constant
                  , pMul --Or <intexp>x<intexp>
                  , pSum --Or <intexp>+<intexp>
                  , pRes --Or <intexp>-<intexp>
                  , pDiv --Division
                  , pMod --Modulus
                  , pNeg --Unary "-"
                 ]

-- Parser for <intexp>+<intexp>
pSum :: TParser IntExpr
pSum = do 
    e <- pIntExp
    _ <- silentOp Lexer.Plus
    e' <- pIntExp
    return $ IBin Lang.Plus e e'

UPDATE TAKING INTO ACCOUNT AndrewC's ANSWER
Unfortunately moving the var' parser down in the choice list didn't work, it yields the same result. But I took AndrewC's answer into account and tried to "manually" trace the execution (I'm not familiar with ghci's debugger and ended up doing lot of single steps and got lost eventually). This is how I reason it:

I got this token list from the lexer: [Str "x", Op Colon, Op Equal, Num 2, OP Plus, Num 3]

So, the pComm' parser fails with pif and pskip, but succeds with pAssign, consuming Str "x", Op Colon and Op Equal and trying to parse [Num 2, OP Plus, Num 3] with pIntExp (!!)

The pIntExp parser then tries the var' parser and fails, but succeds with the num parser consuming the Num 2 token and therefore returning the erroneous result Assign "x" (Const 2).

So with AndrewC's advice in mind about choice, I moved num parser down in the list too. For the sake of simplicity I'll consider pIntExp as choice [pSum, num, var´] that it's what's relevant for this particular example.

The first part of the reasoning remains the same. So I'll restart from (!!) where we had
[Num 2, Op Plus, Num 3] to be parsed by pIntExp
pIntExp tries now first with pSum, which in turn "calls" pIntExp again, which will try pSum again, and so the program hangs. I tried it and it indeed hangs and never ends.

So I was wondering if there's a form to make the pSum parser "lookahead" for the Op Plus token and then parse the corresponding expressions?

UPDATE 2: After "googling" a little bit more now that I've identified the problem I found that the combinational parsers chainl1 and/or chainl might be just what I need. I'll be playing with these and if I work it out post the solution

1

1 Answers

4
votes

The choice function tries the parser it's given in the order they are in the list.

Since yoiur parser for variables appears before your parser for the more complicated addition expression, it suceeds before the other is tried.

To solve this problem, put the variable parser after any expressions that start with a variable (and think through any other substring-matching issues when using choice.

Similar problems incude 3 - 4 + 1 evaluating to -2. People expect left association in the absence of other priorities (so sum - term instead of term - sum).

You also might not want 1 + 10 * 5 to eveluate to 55, so you'll have to be careful around + and * etc if you want to implement operator precedence. You can achieve this by parsing an expression made up of multiplication as a term and then an additive expression as a sum of terms.