3
votes

I was thinking about lists in Haskell, and I thought in other languages, one doesn't use lists for everything. Sure, you might want to store a list if you need the values later on, but if it's just a one off, say iterating from [1..n], why use a list where all that's really needed is a variable that's incremented?

I also read about "list fusion" and noted that whilst Haskell compilers try to implement this optimization to eliminate intermediate lists, they often are unsuccessful, resulting in the garbage collector having to clean up lists which are only used once.

Also, if you're not careful one can easily share a list, which means the garbage collector doesn't clean it up, which can result in running out of memory with an algorithm which was previously design to run in constant space.

So I thought it would be best to avoid lists completely, at least when one doesn't actually want to "store" the list.

I then came across conduit, which says it is:

a solution to the streaming data problem, allowing for production, transformation, and consumption of streams of data in constant memory.

This sounded perfect. I know conduit is designed for IO problems with resource acquisition and release issues, but can one just use it as a drop in replacement for lists?

For example, could I do the following:

fold f3 $ take 10 $ map f2 $ unfold f1 init_value

And with a few appropriately placed type annotations, use conduits for the whole process instead of lists?

I was hoping that perhaps classy-prelude would allow such code, but I'm not sure. If it's possible, could someone give an example, say like the above?

2
your intuition about lists is incorrect, I think, and you may also be forgetting about lazy evaluation when you talk about "storing" a list. In any case the problem conduit is solving are those presented by lazy IO. also look at the vector package.jberryman

2 Answers

8
votes

List computations stream in constant memory in the same circumstances as they would for conduit. The presence or absence of intermediate data structures does not affect whether or not it runs in constant memory. All it changes is the efficiency and the size of the constant memory that it inhabits.

Do not expect conduit to run in less memory than the equivalent list computation. It should actually take more memory because conduit steps have a greater overhead than list cells. Also, conduit currently does not have stream fusion. Somebody did experiment with that some time ago, although that did not get incorporated into the library. Lists, on the other hand, can and do fuse in many circumstances to remove intermediate data structures.

The important thing to remember is that streaming does not necessarily imply deforestation (i.e. removal of intermediate data structures).

4
votes

conduit was definitely not designed for this kind of a use case, but it can in theory be used that way. I did so personally for the markdown package, where it was more convenient to have the extra conduit plumbing than to deal directly with lists.

If you put this together with classy-prelude-conduit, you can get some relatively simple code. And we could certainly add more exports to classy-prelude-conduit to better optimize for this use case. For now, here's an example following the basic gist of what you laid out above:

{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit
import Data.Conduit.List (unfold, isolate)
import Data.Functor.Identity (runIdentity)

main = putStrLn
     $ runIdentity
     $ unfold f1 init_value
    $$ map f2
    =$ isolate 10
    =$ fold f3 ""

f1 :: (Int, Int) -> Maybe (Int, (Int, Int))
f1 (x, y) = Just (x, (y, x + y))

init_value = (1, 1)

f2 :: Int -> Text
f2 = show

f3 :: Text -> Text -> Text
f3 x y = x ++ y ++ "\n"