5 votes

I'm trying to modify the Data.Binary.PutM monad into a monad transformer. So I started by changing its definition from

newtype PutM a = Put { unPut :: PairS a }

to

newtype PutM a = Put { unPut :: Identity (PairS a) }
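For reference, PairS is Data.Binary.Put's internal pair of result and output Builder (its definition also appears in the answer below):

data PairS a = PairS a {-# UNPACK #-} !Builder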

Then of course I changed the implementations of the return, >>= and >> functions:

From:

return a = Put $ PairS a mempty
{-# INLINE return #-}

m >>= k  = Put $
    let PairS a w  = unPut m
        PairS b w1 = unPut (k a)
    in PairS b (w `mappend` w1)
{-# INLINE (>>=) #-}

m >> k  = Put $
    let PairS _ w  = unPut m
        PairS b w1 = unPut k
    in PairS b (w `mappend` w1)
{-# INLINE (>>) #-}

To:

return a = Put $! return $! PairS a mempty
{-# INLINE return #-}

m >>= k  = Put $!
    do PairS a w  <- unPut m
       PairS b w1 <- unPut (k a)
       return $! PairS b $! (w `mappend` w1)
{-# INLINE (>>=) #-}

m >> k  = Put $!
    do PairS _ w  <- unPut m
       PairS b w1 <- unPut k
       return $! PairS b $! (w `mappend` w1)
{-# INLINE (>>) #-}

As if the PutM monad were just a Writer monad. Unfortunately this (again) created a space leak. It is clear to me (or is it?) that GHC is postponing evaluation somewhere, but I tried putting $! instead of $ everywhere, as suggested by some tutorials, and that did not help. Also, I'm not sure how the memory profiler is supposed to help me if what it shows me is this:

Memory profile.

And for completeness, this is the memory profile I get when using the original Data.Binary.Put monad:

Original memory profile

If you're interested, here is the code I'm using to test it. The line I'm using to compile, run and create the memory profile is:

ghc -auto-all -fforce-recomp -O2 --make test5.hs && ./test5 +RTS -hT && hp2ps -c test5.hp && okular test5.ps
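The linked file itself isn't reproduced here, but as a rough idea of the kind of workload (a hypothetical stand-in, not the actual test code), serializing a large tree with Data.Binary looks something like this:

import Data.Binary
import qualified Data.ByteString.Lazy as L

data Tree = Leaf | Node Tree Int Tree

instance Binary Tree where
    put Leaf         = putWord8 0
    put (Node l x r) = putWord8 1 >> put l >> put x >> put r
    get = do
        tag <- getWord8
        case tag of
          0 -> return Leaf
          _ -> do l <- get; x <- get; r <- get; return (Node l x r)

-- Build a deep, lazily generated tree and write it out. The point of the
-- test is whether the output streams (constant memory) or whether the whole
-- structure gets forced before anything is written.
main :: IO ()
main = L.writeFile "test5.bin" (encode (build 1000000))
  where
    build :: Int -> Tree
    build 0 = Leaf
    build n = Node Leaf n (build (n - 1))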

I hope I'm not annoying anyone with my saga of memory-leak questions. I find there aren't many good resources on the internet about this topic, which leaves a newbie clueless.

Thanks for looking.

Hi Peter - I'm not convinced you have a "space leak" within Data.Binary, i.e. some errant handle on data stopping it being garbage collected. I think the reason you are building a huge memory profile is that your data structure (a tree) doesn't stream - all of it has to be in memory (plus a similarly large output ByteString) until it has finished serializing. My intuition is the problem is the tree - not Data.Binary. - stephen tetley
Hi @stephen, I forgot to mention that if I use the original Data.Binary.Put monad (the one without Identity in it) then it streams nicely (no noticeable memory increase). My understanding is that if the memory were consumed purely by the tree structure, the memory increase would manifest itself in both cases. - Peter Jankuliak
Could you send us some more code? - fuz
@FUZxxl, this link points to the code I'm using to test it. To test the original Data.Binary.Put, redefine USE_CASE to 0; to test the version with Identity in it, leave USE_CASE defined as 1. - Peter Jankuliak
I could be wrong, but these lines - return $! PairS b $! (w `mappend` w1) - appear to be forcing more evaluation than the Data.Binary equivalent. As Data.Binary seems lazier, it might have better production/consumption behaviour and hence can keep its memory profile lower. - stephen tetley

1 Answer

7 votes

As stephen tetley pointed out in his comment, the problem here is excessive strictness. If you just add some more laziness to your Identity version (the ~(PairS b w') pattern in your (>>) definition), you get the same constant-memory run picture:

{-# LANGUAGE BangPatterns #-}

import Control.Monad.Identity (Identity, runIdentity)
import Data.Binary.Builder (Builder, toLazyByteString)
import qualified Data.ByteString.Lazy as L
import Data.Monoid (mempty, mappend)

data PairS a = PairS a {-# UNPACK #-} !Builder

sndS :: PairS a -> Builder
sndS (PairS _ !b) = b

newtype PutM a = Put { unPut :: Identity (PairS a) }

type Put = PutM ()

instance Monad PutM where
    return a = Put $! return $! PairS a mempty
    {-# INLINE return #-}

    m >>= k  = Put $!
        do PairS a w  <- unPut m
           PairS b w' <- unPut (k a)
           return $! PairS b $! (w `mappend` w')
    {-# INLINE (>>=) #-}

    m >> k  = Put $!
        do PairS _ w  <- unPut m
           ~(PairS b w') <- unPut k
           return $! PairS b $! (w `mappend` w')
    {-# INLINE (>>) #-}

tell' :: Builder -> Put
tell' b = Put $! return $! PairS () b

runPut :: Put -> L.ByteString
runPut = toLazyByteString . sndS . runIdentity . unPut
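A small usage sketch of these helpers (my own example, not part of the answer; singleton is from Data.Binary.Builder):

-- singleton :: Word8 -> Builder comes from Data.Binary.Builder.
-- Two bytes written through the custom PutM, rendered to a lazy ByteString.
twoBytes :: L.ByteString
twoBytes = runPut (tell' (singleton 0x01) >> tell' (singleton 0x02))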

You can actually use ordinary tuples here, and $ instead of $!.
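For what it's worth, here is a sketch of that variant (my own illustration, not code from the answer; PutM' and Put' are made-up names). It uses plain lazy pairs and $, keeping the same lazy ~ pattern on the second binding that restores the streaming behaviour above:

newtype PutM' a = Put' { unPut' :: Identity (a, Builder) }

instance Monad PutM' where
    return a = Put' $ return (a, mempty)

    m >>= k  = Put' $ do
        (a, w)   <- unPut' m
        ~(b, w') <- unPut' (k a)
        return (b, w `mappend` w')

    m >> k   = Put' $ do
        (_, w)   <- unPut' m
        ~(b, w') <- unPut' k
        return (b, w `mappend` w')

runPut' :: PutM' () -> L.ByteString
runPut' = toLazyByteString . snd . runIdentity . unPut'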

P.S. Once again: the right answer is actually in stephen tetley's comment. The thing is that your first example uses lazy let bindings in its >> implementation, so the Tree is not forced to be built entirely and hence "is streamed". Your second (Identity) example is strict, so my understanding is that the whole Tree gets built in memory before being processed. You can actually easily add strictness to the first example and observe how it starts hogging memory:

-- Strict variant of the original (>>): the case expressions force both
-- unPut m and unPut k as soon as the resulting PairS is demanded, rather
-- than deferring them until the Builder is actually consumed.
m >> k  = Put $
          case unPut m of
            PairS _ w ->
                case unPut k of
                  PairS b w' ->
                      PairS b (w `mappend` w')