While playing with Haskell and conduit, I came across a behavior that I have a hard time explaining. First let me list all the modules and language extensions that need to be loaded to reproduce my problem:

{-# LANGUAGE FlexibleContexts  #-}

import Conduit                         -- conduit-combinators
import Data.Csv                        -- cassava
import Data.Csv.Conduit                -- cassava-conduit
import qualified Data.ByteString as BS -- bytestring
import Data.Text (Text)                -- text
import Control.Monad.Except            -- mtl
import Data.Foldable

First I created the most general CSV parsing conduit:

pipeline :: (MonadError CsvParseError m, FromRecord a)
         => ConduitM BS.ByteString a m ()
pipeline = fromCsv defaultDecodeOptions NoHeader

Then I wanted to output the number of elements in each row of my CSV file. I know this is kind of silly and useless, and that there are a billion other ways of doing this kind of thing, but it was just a toy test.

So I opened GHCi and tried this:

ghci> :t pipeline .| mapC length

As expected, this did not work because the constraint FromRecord a doesn't guarantee that a is Foldable. So I defined the following conduit:

pipeline2 :: (MonadError CsvParseError m, FromField a)
          => ConduitM BS.ByteString [a] m ()
pipeline2 = fromCsv defaultDecodeOptions NoHeader

This is a legal definition because, according to the cassava documentation, there is an instance FromField a => FromRecord [a].

At this point, I am happy and hopeful because [] is an instance of Foldable. So, once again, I open GHCi, and I try:

ghci> :t pipeline2 .| mapC length

But I get:

<interactive>:1:1: error:
    • Could not deduce (FromField a0) arising from a use of ‘pipeline2’
      from the context: MonadError CsvParseError m
        bound by the inferred type of
                 it :: MonadError CsvParseError m => ConduitM BS.ByteString Int m ()
        at <interactive>:1:1
      The type variable ‘a0’ is ambiguous
      These potential instances exist:
        instance FromField a => FromField (Either Field a)
          -- Defined in ‘cassava-0.4.5.0:Data.Csv.Conversion’
        instance FromField BS.ByteString
          -- Defined in ‘cassava-0.4.5.0:Data.Csv.Conversion’
        instance FromField Integer
          -- Defined in ‘cassava-0.4.5.0:Data.Csv.Conversion’
        ...plus 9 others
        ...plus 11 instances involving out-of-scope types
        (use -fprint-potential-instances to see them all)
    • In the first argument of ‘(.|)’, namely ‘pipeline2’
      In the expression: pipeline2 .| mapC length

So my understanding is that the type of pipeline2 is not specified enough.

But now if I try to forge a trivial conduit with an (almost) identical type:

pipeline3 :: (MonadError CsvParseError m, FromField a)
          => ConduitM a [a] m ()
pipeline3 = awaitForever $ \x -> yield [x]

Again I open up GHCi and try:

ghci> :t pipeline3 .| mapC length

This time I get:

pipeline3 .| mapC length
  :: (FromField a, MonadError CsvParseError m) => ConduitM a Int m ()

So this time, GHCi accepts the expression without my having to specify pipeline3 any further.

So my question: why is there a problem with pipeline2? Is there a way to define the most generic "pipeline" without further specifying the type of the output of the conduit? I thought that a list of FromField objects would be enough.

It feels like I am missing an important point about typeclasses and how to compose functions, or here Conduit objects, in a polymorphic manner.

Thank you very much for your answers!

2 Answers

2
votes

pipeline3 is a conduit typed like ConduitM a [a] m () (ignoring the constraints for now). So when you map length over it you get ConduitM a Int m (); the a is still there in the first type parameter, and so the FromField a constraint can remain, waiting to be instantiated at usage sites.

pipeline2 is a conduit typed like ConduitM BS.ByteString [a] m (). Now if you map length over it you would get ConduitM BS.ByteString Int m (). There's no a anywhere in that type, so the FromField a instance can't be chosen at usage sites. Instead it has to be chosen immediately. But nothing in pipeline2 .| mapC length says what a should be. That's why it's complaining that a is ambiguous.
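
The same contrast can be reproduced with nothing but base, which may make it easier to experiment with. The following is only a sketch under that assumption: wrap, wrapLength, parseRow and rowLength are hypothetical names standing in for pipeline3, pipeline3 .| mapC length, pipeline2 and pipeline2 .| mapC length, and Read plays the role of FromField.

```haskell
-- Hypothetical stand-in for pipeline3: 'a' appears in the input type.
wrap :: Read a => a -> [a]
wrap x = [x]

-- After taking the length, 'a' is still visible in the argument type,
-- so the 'Read a' constraint can stay and be resolved by the caller.
wrapLength :: Read a => a -> Int
wrapLength = length . wrap

-- Hypothetical stand-in for pipeline2: 'a' appears only in the output type.
parseRow :: Read a => String -> [a]
parseRow = map read . words

-- In a compiled module the following is rejected with an ambiguous type
-- variable error, because 'a' no longer appears in the result type:
--
--   rowLength :: String -> Int
--   rowLength = length . parseRow
--
-- Choosing the element type with an expression signature resolves it.
rowLength :: String -> Int
rowLength s = length (parseRow s :: [Int])

main :: IO ()
main = do
  print (wrapLength 'x')     -- prints 1
  print (rowLength "1 2 3")  -- prints 3
```

Note that GHCi's extended defaulting rules may silently pick a type for a lone Read constraint, so the commented-out version is best tried in a source file rather than at the prompt.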

As far as I can tell (I'm not intimately familiar with conduits), that should be the only problem with your first definition as well. FromRecord doesn't guarantee Foldable, but it has instances that are Foldable; you just need to pin down the type being used, because length won't do it. You could use an expression signature on pipeline where you use it, the TypeApplications extension, or a less polymorphic definition (which doesn't need to be a reimplementation like pipeline2; you could write pipeline' = pipeline if you gave pipeline' the right signature).
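
As a rough illustration of the last two options, here is a base-only sketch. It does not use conduit or cassava; parseRow is a hypothetical stand-in for the fully polymorphic pipeline, parseIntRow for a pipeline' that reuses the same implementation under a narrower signature, and Read stands in for the cassava class.

```haskell
{-# LANGUAGE TypeApplications #-}

-- Hypothetical stand-in for the polymorphic 'pipeline'.
parseRow :: Read a => String -> [a]
parseRow = map read . words

-- Less polymorphic alias: same implementation, narrower signature.
-- This mirrors writing pipeline' = pipeline with a more specific type.
parseIntRow :: String -> [Int]
parseIntRow = parseRow

-- The alias fixes the element type, so 'length' is no longer ambiguous.
viaAlias :: String -> Int
viaAlias = length . parseIntRow

-- Alternatively, pick the element type at the use site with TypeApplications.
viaTypeApp :: String -> Int
viaTypeApp = length . parseRow @Int

main :: IO ()
main = do
  print (viaAlias "10 20")       -- prints 2
  print (viaTypeApp "10 20 30")  -- prints 3
```

The alias keeps the general definition available for other element types, while the type application avoids introducing an extra name; which one reads better is largely a matter of taste.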

2
votes

The error you got...

 • Could not deduce (FromField a0) arising from a use of ‘pipeline2’
  from the context: MonadError CsvParseError m
    bound by the inferred type of
             it :: MonadError CsvParseError m => ConduitM BS.ByteString Int m ()
    at <interactive>:1:1
  The type variable ‘a0’ is ambiguous

... says that a0 is ambiguous, which makes it impossible to figure out which instance of FromField should be used. What makes it ambiguous? The error message also mentions the inferred type of your expression:

it :: MonadError CsvParseError m => ConduitM BS.ByteString Int m ()

There is no a0 in this type. That leads to ambiguity, because there is no specialisation of this type that can specify the FromField instance -- there isn't enough material for the type checker to work with. In your third example, on the other hand...

pipeline3 .| mapC length
  :: (FromField a, MonadError CsvParseError m) => ConduitM a Int m ()

... the type of the field does show up in the overall type, and so the ambiguity is averted.

It is worth emphasising that there is nothing wrong with pipeline2 per se. The problem only arises because length eliminates useful information from the overall type. In contrast, this, for instance, works just fine:

GHCi> :t pipeline2 .| mapC id
pipeline2 .| mapC id
  :: (MonadError CsvParseError m, FromField a) =>
     ConduitM BS.ByteString [a] m ()

In order to use pipeline2 with length, you need to specify the type of the field through a type annotation:

GHCi> -- Arbitrary example.
GHCi> :t (pipeline2 :: MonadError CsvParseError m => ConduitM BS.ByteString [Int] m ()) .| mapC length
(pipeline2 :: MonadError CsvParseError m => ConduitM BS.ByteString [Int] m ()) .| mapC length
  :: MonadError CsvParseError m => ConduitM BS.ByteString Int m ()

Alternatives to the annotation include using the TypeApplications extension (credit to ben's answer for reminding me of that)...

GHCi> :set -XTypeApplications 
GHCi> :t pipeline2 @_ @Int .| mapC length
pipeline2 @_ @Int .| mapC length
  :: MonadError CsvParseError m => ConduitM BS.ByteString Int m ()

... and specifying the field type through a proxy argument.

{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE FlexibleContexts  #-}

import Data.Proxy
-- etc.

rowLength :: forall m a. (MonadError CsvParseError m, FromField a)
    => Proxy a -> ConduitM BS.ByteString Int m ()
rowLength _ = p2 .| mapC length
    where
    p2 :: (MonadError CsvParseError m, FromField a)
        => ConduitM BS.ByteString [a] m ()
    p2 = pipeline2

GHCi> :t rowLength (Proxy :: Proxy Int)
rowLength (Proxy :: Proxy Int)
  :: MonadError CsvParseError m => ConduitM BS.ByteString Int m ()