10
votes

Suppose someone makes a program to play chess, or solve sudoku. In this kind of program it makes sense to have a tree structure representing game states.

This tree would be very large, "practically infinite". Which isn't by itself a problem as Haskell supports infinite data structures.

An familiar example of an infinite data structure:

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

Nodes are only allocated when first used, so the list takes finite memory. One may also iterate over an infinite list if they don't keep references to its head, allowing the garbage collector to collect its parts which are not needed anymore.

Back to the tree example - suppose one does some iteration over the tree, the tree nodes iterated over may not be freed if the root of the tree is still needed (for example in an iterative deepening search, the tree would be iterated over several times and so the root needs to be kept).

One possible solution for this problem that I thought of is using an "unmemo-monad".

I'll try to demonstrate what this monad is supposed to do using monadic lists:

import Control.Monad.ListT (ListT)  -- cabal install List
import Data.Copointed  -- cabal install pointed
import Data.List.Class
import Prelude hiding (enumFromTo)

nums :: ListT Unmemo Int  -- What is Unmemo?
nums = enumFromTo 0 1000000

main = print $ div (copoint (foldlL (+) 0 nums)) (copoint (lengthL nums))

Using nums :: [Int], the program would take a lot of memory as a reference to nums is needed by lengthL nums while it is being iterated over foldlL (+) 0 nums.

The purpose of Unmemo is to make the runtime not keep the nodes iterated over.

I attempted using ((->) ()) as Unmemo, but it yields the same results as nums :: [Int] does - the program uses a lot of memory, as evident by running it with +RTS -s.

Is there anyway to implement Unmemo that does what I want?

1
This doesn't seem like a job for a monad to me. I don't think the space leak is caused by what you think it's caused by.Dan Burton
@Dan: If I leave just one of the foldlL or lengthL, then there's no space leak. That makes me think that the problem is that GHC isn't collecting the space. What do you believe causes the leak?yairchu
Are you sure this isn't a problem with the full-laziness "optimization"? See e.g. comments.gmane.org/gmane.comp.lang.haskell.cafe/103616Tom Ellis
There is a way using Joachim Breitner's hackage.haskell.org/package/ghc-dup (which is just a proof of concept). I have a blogpost haskellexists.blogspot.de/2016/01/… that elaborates.phischu

1 Answers

4
votes

Same trick as with a stream -- don't capture the remainder directly, but instead capture a value and a function which yields a remainder. You can add memoization on top of this as necessary.

data UTree a = Leaf a | Branch a (a -> [UTree a]) 

I'm not in the mood to figure it out precisely at the moment, but this structure arises, I'm sure, naturally as the cofree comonad over a fairly straightforward functor.

Edit

Found it: http://hackage.haskell.org/packages/archive/comonad-transformers/1.6.3/doc/html/Control-Comonad-Trans-Stream.html

Or this is perhaps simpler to understand: http://hackage.haskell.org/packages/archive/streams/0.7.2/doc/html/Data-Stream-Branching.html

In either case, the trick is that your f can be chosen to be something like data N s a = N (s -> (s,[a])) for an appropriate s (s being the type of your state parameter of the stream -- the seed of your unfold, if you will). That might not be exactly correct, but something close should do...

But of course for real work, you can scrap all this and just write the datatype directly as above.

Edit 2

The below code illustrates how this can prevent sharing. Note that even in the version without sharing, there are humps in the profile indicating that the sum and length calls aren't running in constant space. I'd imagine that we'd need an explicit strict accumulation to knock those down.

{-# LANGUAGE DeriveFunctor #-}
import Data.Stream.Branching(Stream(..))
import qualified Data.Stream.Branching as S
import Control.Arrow
import Control.Applicative
import Data.List

data UM s a = UM (s -> Maybe a) deriving Functor
type UStream s a = Stream (UM s) a

runUM s (UM f) = f s
liftUM x = UM $ const (Just x)
nullUM = UM $ const Nothing

buildUStream :: Int -> Int -> Stream (UM ()) Int
buildUStream start end = S.unfold (\x -> (x, go x)) start
    where go x
           | x < end = liftUM (x + 1)
           | otherwise = nullUM

sumUS :: Stream (UM ()) Int -> Int
sumUS x = S.head $ S.scanr (\x us -> maybe 0 id (runUM () us) + x) x

lengthUS :: Stream (UM ()) Int -> Int
lengthUS x = S.head $ S.scanr (\x us -> maybe 0 id (runUM () us) + 1) x

sumUS' :: Stream (UM ()) Int -> Int
sumUS' x = last $ usToList $ liftUM $ S.scanl (+) 0  x

lengthUS' :: Stream (UM ()) Int -> Int
lengthUS' x = last $ usToList $ liftUM $ S.scanl (\acc _ -> acc + 1) 0 x

usToList x = unfoldr (\um -> (S.head &&& S.tail) <$> runUM () um) x

maxNum = 1000000
nums = buildUStream 0 maxNum

numsL :: [Int]
numsL = [0..maxNum]

-- All these need to be run with increased stack to avoid an overflow.

-- This generates an hp file with two humps (i.e. the list is not shared)
main = print $ div (fromIntegral $ sumUS' nums) (fromIntegral $ lengthUS' nums)

-- This generates an hp file as above, and uses somewhat less memory, at the cost of
-- an increased number of GCs. -H helps a lot with that.
-- main = print $ div (fromIntegral $ sumUS nums) (fromIntegral $ lengthUS nums)

-- This generates an hp file with one hump (i.e. the list is shared)
-- main = print $ div (fromIntegral $ sum $ numsL) (fromIntegral $ length $ numsL)