In writing a parser for a particular computational biology file format, I'm running into some trouble.

Here's my code:

betaLine = string "BETA " *> p_int <*> p_int  <*> p_int <*> p_int <*> p_direction <*> p_exposure <* eol

p_int = liftA (read :: String -> Int) (many (char ' ') *> many1 digit <* many (char ' '))

p_direction = liftA mkDirection (many (char ' ') *> dir <* many (char ' '))
            where dir = oneStringOf [ "1", "-1" ]

p_exposure = liftA (map mkExposure) (many (char ' ') *> many1 (oneOf "io") <* many (char ' '))

Now, if I comment out the definition for betaLine, everything compiles and I've successfully tested out the individual parsers p_int, p_direction, and p_exposure.

But, when that betaLine equation is present, I get a type error that I don't quite understand. Is my understanding of the applicative <*> wrong? Ultimately, I want this to return Int -> Int -> Int -> Int -> Direction -> ExposureList, which I'll then be able to give to the constructor for a BetaPair.

The type error is:

Couldn't match expected type `a5 -> a4 -> a3 -> a2 -> a1 -> a0'
            with actual type `Int'
Expected type: Text.Parsec.Prim.ParsecT
                 s0 u0 m0 (a5 -> a4 -> a3 -> a2 -> a1 -> a0)
  Actual type: Text.Parsec.Prim.ParsecT s0 u0 m0 Int
In the second argument of `(*>)', namely `p_int'
In the first argument of `(<*>)', namely `string "BETA " *> p_int'
stephen tetley: This is actually a problem with the associativity and precedence of the Applicative combinators - in a nutshell you need to use parens. If I remember correctly, UU_Parsing - which pioneered the Applicative style - had different associativity / precedence for (*>) and (<*) so it could get by with minimal parens.
Noah Daniels: That seems likely, as if I chop it down to only having one *> and one <*> it compiles. But, I can't quite figure out where the parens should go.
stephen tetley: I think each use of (<*) or (*>) has to be parenthesized so the parser fragment in the third line would be something like this: ((many (char ' ') *> dir) <* many (char ' '))
Noah Daniels: The compiler is fine with that line as is; it's the betaLine definition that's problematic. But, putting parens around string "BETA " *> p_int and p_exposure <* eol does not fix it.
stephen tetley: Mea culpa - discount what I've said then. Note that the betaLine isn't applying what it has parsed to a constructor.

1 Answer

tl;dr: you want this expression:

betaLine = string "BETA " *> (BetaPair <$> p_int <*> p_int  <*> p_int <*> p_int <*> p_direction <*> p_exposure) <* eol

Read why below.


Once again, this is partly a precedence issue. What your current line does:

string "BETA " *> p_int <*> p_int ...

... is that it creates a parser like this:

(string "BETA " *> p_int) <*> (p_int) ...
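
To see the grouping concretely, here is a small sketch that swaps the parser type for Maybe. That substitution is an assumption made only to keep the example free of dependencies; the fixity rules are the same for every Applicative. In base, <$>, <*>, *> and <* are all declared infixl 4, so a chain of them groups from the left:

```haskell
-- Maybe stands in for the parser type here; this is an illustration,
-- not the code from the question.

-- No parentheses: operators of equal precedence group from the left.
implicit :: Maybe Int
implicit = Just "BETA " *> Just (+ 1) <*> Just 41

-- The same expression with the grouping written out.
explicit :: Maybe Int
explicit = (Just "BETA " *> Just (+ 1)) <*> Just 41
```

Loaded in GHCi, both implicit and explicit evaluate to Just 42.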

This is not the main issue, though, and as a matter of fact, the semantically wrong parser above would still yield the correct result, if the rest of the parser were correct. However, as you say, you have a slight misunderstanding about how <*> works. Its signature is:

(<*>) :: Applicative f => f (a -> b) -> f a -> f b

As you can see, <*> expects a function wrapped in a functor as its first argument, and it applies that function to the value wrapped in the functor given as the second argument (thus the applicative functor). When you give it p_int as the first argument at the beginning of your definition, that argument is a Parser Int and not a Parser (a -> b), so the expression doesn't type-check.
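
The requirement on the left operand can be seen in a dependency-free sketch, assuming Maybe as a stand-in for Parser (the names wrappedFn and applied are made up for this illustration):

```haskell
-- The left operand of <*> must hold a function.
wrappedFn :: Maybe (Int -> Int)
wrappedFn = Just (* 2)

-- <*> applies the wrapped function to the wrapped value.
applied :: Maybe Int
applied = wrappedFn <*> Just 21

-- This would be rejected by the type checker, because the left
-- operand holds a Char rather than a function:
-- broken = Just 'x' <*> Just 'y'
```

The commented-out line fails for the same reason as the original betaLine: the thing on the left of <*> is a plain value, not a function.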

And as a matter of fact, it cannot be made to type-check if the goal is what you stated; you want betaLine to be a Parser (Int -> Int -> Int -> Int -> Direction -> ExposureList), but how would that help you? You would get a function that takes four Ints and a Direction and returns an ExposureList, and when you give that function to the constructor of a BetaPair, it is magically supposed to construct a BetaPair out of it? Remember that the function arrow associates to the right, so if the BetaPair constructor has type:

Int -> Int -> Int -> Int -> Direction -> ExposureList -> BetaPair

... it doesn't mean the same thing as:

(Int -> Int -> Int -> Int -> Direction -> ExposureList) -> BetaPair

It actually means this:

Int -> (Int -> (Int -> (Int -> (Direction -> (ExposureList -> BetaPair)))))
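
A short sketch of what that nesting means in practice. The data declarations below are hypothetical, since the question does not show the real Direction, ExposureList or BetaPair definitions; only the shape of the constructor's type matters here:

```haskell
-- Hypothetical definitions: the question does not show the real types.
data Direction = Parallel | Antiparallel deriving (Show, Eq)
data Exposure = Inward | Outward deriving (Show, Eq)
type ExposureList = [Exposure]

data BetaPair = BetaPair Int Int Int Int Direction ExposureList
  deriving (Show, Eq)

-- One argument supplied: five are still expected.
afterOne :: Int -> Int -> Int -> Direction -> ExposureList -> BetaPair
afterOne = BetaPair 1

-- Four arguments supplied: two are still expected.
afterFour :: Direction -> ExposureList -> BetaPair
afterFour = afterOne 2 3 4
```

Each application peels one argument off the front of the type, which is exactly what <$> and <*> do one parser at a time.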

You can instead make betaLine a Parser BetaPair, which would make more sense. You can use the <$> operator, which is an infix synonym for fmap. It lets you lift your BetaPair constructor into the Parser functor, and you can then supply the individual arguments to it using the applicative functor interface. The <$> function has this type:

(<$>) :: Functor f => (a -> b) -> f a -> f b

In this case, the first argument that you're lifting is the BetaPair constructor, which instantiates the types a and b to the corresponding parts of the BetaPair constructor's type, yielding this specific signature:

(<$>) :: (Int -> (Int -> (Int -> (Int -> (Direction -> (ExposureList -> BetaPair))))))
      -> f Int -> f (Int -> (Int -> (Int -> (Direction -> (ExposureList -> BetaPair)))))

As you can see, what <$> does here is take a function as the left argument and a value wrapped in a functor as the right argument, and apply the function to the wrapped value.

As a simpler example, if you have f :: Int -> String, the following expression:

f <$> p_int

... will parse an integer, apply the function f with that integer as the argument, and wrap the result in the functor, so the expression above has type Parser String. The type of <$> in this position is:

(<$>) :: (Int -> String) -> Parser Int -> Parser String
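
As a runnable sketch of this step, assuming the parsec package and its Text.Parsec.String.Parser type: p_int is copied from the question with read <$> in place of liftA read, while the body of f and the name labelled are invented for the illustration.

```haskell
import Text.Parsec (parse, many, many1, char, digit)
import Text.Parsec.String (Parser)

-- p_int as in the question: digits with optional surrounding spaces.
p_int :: Parser Int
p_int = read <$> (many (char ' ') *> many1 digit <* many (char ' '))

-- A hypothetical f :: Int -> String, chosen only for demonstration.
f :: Int -> String
f n = "value " ++ show n

-- Lifting f over the parser turns a Parser Int into a Parser String.
labelled :: Parser String
labelled = f <$> p_int
```

In GHCi, parse labelled "" " 42 " gives Right "value 42".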

So, using <$> supplies the first argument to your constructor. So how do you deal with the other arguments? Well, this is where <*> comes in, and I think that you understand from the type signature what it does: if you chain its use, each <*> supplies one more argument to the function wrapped in the functor on its left, taking that argument from the functor on its right. So, for a simpler example again; say that you have a function g :: Int -> Int -> String and the following expression:

g <$> p_int <*> p_int

The g <$> p_int expression passes the result of p_int as the first argument of g, so the type of that expression is Parser (Int -> String). The <*> then supplies the next argument, with the specific type of <*> being:

(<*>) :: Parser (Int -> String) -> Parser Int -> Parser String

So, the type of the whole expression above is Parser String.
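
The two steps can be written out with explicit type signatures. This is a sketch assuming the parsec package; the body of g and the names partial and both are made up for the illustration.

```haskell
import Text.Parsec (parse, many, many1, char, digit)
import Text.Parsec.String (Parser)

-- p_int as in the question: digits with optional surrounding spaces.
p_int :: Parser Int
p_int = read <$> (many (char ' ') *> many1 digit <* many (char ' '))

-- A hypothetical two-argument function.
g :: Int -> Int -> String
g a b = show a ++ "," ++ show b

-- After <$>, the parser yields a function still waiting for one Int.
partial :: Parser (Int -> String)
partial = g <$> p_int

-- <*> runs the next parser and feeds its result to that function.
both :: Parser String
both = partial <*> p_int
```

In GHCi, parse both "" "1 2" gives Right "1,2", the same result as parsing with g <$> p_int <*> p_int directly.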

Similarly, in your situation BetaPair plays the role of g, yielding this pattern:

BetaPair <$> one <*> parser <*> per <*> argument <*> to <*> betaPair

As mentioned above, the resulting parser is thus:

betaLine = string "BETA " *> (BetaPair <$> p_int <*> p_int  <*> p_int <*> p_int <*> p_direction <*> p_exposure) <* eol
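
For completeness, here is a self-contained sketch that puts the pieces together so the final parser can be tried out. It assumes the parsec package. The question does not show BetaPair, Direction, ExposureList, mkDirection, mkExposure, oneStringOf or eol, so the versions below are hypothetical stand-ins and the real definitions may differ.

```haskell
import Text.Parsec (parse, many, many1, char, digit, oneOf, string, try, choice, endOfLine)
import Text.Parsec.String (Parser)

-- Hypothetical types; the real ones are not shown in the question.
data Direction = Parallel | Antiparallel deriving (Show, Eq)
data Exposure = Inward | Outward deriving (Show, Eq)
type ExposureList = [Exposure]
data BetaPair = BetaPair Int Int Int Int Direction ExposureList
  deriving (Show, Eq)

-- Hypothetical helpers standing in for the asker's own.
mkDirection :: String -> Direction
mkDirection "1" = Parallel
mkDirection _ = Antiparallel

mkExposure :: Char -> Exposure
mkExposure 'i' = Inward
mkExposure _ = Outward

-- Tries each literal in turn, backtracking on failure.
oneStringOf :: [String] -> Parser String
oneStringOf = choice . map (try . string)

eol :: Parser ()
eol = () <$ endOfLine

-- The three field parsers from the question, with <$> instead of liftA.
p_int :: Parser Int
p_int = read <$> (many (char ' ') *> many1 digit <* many (char ' '))

p_direction :: Parser Direction
p_direction = mkDirection <$> (many (char ' ') *> dir <* many (char ' '))
  where dir = oneStringOf ["1", "-1"]

p_exposure :: Parser ExposureList
p_exposure = map mkExposure <$> (many (char ' ') *> many1 (oneOf "io") <* many (char ' '))

-- The corrected line parser.
betaLine :: Parser BetaPair
betaLine = string "BETA " *> (BetaPair <$> p_int <*> p_int <*> p_int <*> p_int <*> p_direction <*> p_exposure) <* eol
```

With these stand-ins, parse betaLine "" "BETA 1 2 3 4 -1 ioi\n" gives Right (BetaPair 1 2 3 4 Antiparallel [Inward,Outward,Inward]).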