
The introduction

The following code shows that, when it is run with runhaskell, the Haskell garbage collector releases the memory once a is no longer used. The program dumps core while releasing a. This is deliberate, to make the behaviour observable: a has nullFunPtr as its finalizer.

module Main where

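import Foreign.Ptr
import Foreign.ForeignPtr

main :: IO ()
main = do
    -- nullFunPtr is a deliberately bogus finalizer: calling it segfaults,
    -- so the crash marks the moment a is released.
    a <- newForeignPtr nullFunPtr nullPtr
    putStrLn "Hello World"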
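To watch the same release without a segfault, one option (a sketch, not the asker's code) is to swap the crashing nullFunPtr for a harmless Haskell finalizer, attached with Foreign.Concurrent.newForeignPtr, that sets an IORef flag. GHC does not promise when finalizers run, so the helper names, the retry count of 10 and the 10 ms delay are arbitrary choices that only make the observation likely:

```haskell
module Main where

import Control.Concurrent (threadDelay)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import qualified Foreign.Concurrent as FC
import Foreign.Ptr (Ptr, nullPtr)
import System.Mem (performGC)

-- Create a ForeignPtr with a harmless finalizer and drop it immediately,
-- so nothing refers to it once this action returns.
allocateAndDrop :: IORef Bool -> IO ()
allocateAndDrop flag = do
  _ <- FC.newForeignPtr (nullPtr :: Ptr ()) (writeIORef flag True)
  return ()

-- Force a major collection, give the finalizer thread a moment to run,
-- and repeat until the flag is set or the attempts run out.
waitForFinalizer :: IORef Bool -> Int -> IO Bool
waitForFinalizer flag attempts = do
  performGC
  threadDelay 10000 -- Haskell finalizers run in their own thread
  done <- readIORef flag
  if done || attempts <= 1
    then return done
    else waitForFinalizer flag (attempts - 1)

-- True if the dropped ForeignPtr's finalizer was observed to run.
observeRelease :: IO Bool
observeRelease = do
  flag <- newIORef False
  allocateAndDrop flag
  waitForFinalizer flag 10

main :: IO ()
main = do
  released <- observeRelease
  putStrLn ("finalizer ran: " ++ show released)
```

If something still referenced the ForeignPtr, the flag would stay False no matter how often performGC ran, since only unreachable objects are finalized.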

The problem

When I run the same thing in GHCi, the memory is not released. How can I force GHCi to release variables that are no longer used?

$ ghci
> import Foreign.Ptr
> import Foreign.ForeignPtr
> import System.Mem
> a <- newForeignPtr nullFunPtr nullPtr
> a <- return () -- rebinding variable a to show gc that I'm no longer using it
> performGC
> -- did not crash - GC didn't release memory
> ^D
Leaving GHCi.
[1]    4396 segmentation fault (core dumped)  ghci

The memory was released on exit, but that is too late for me. I'm extending GHCi and using it for another purpose, so I need the memory to be released earlier: on demand, or as soon as possible.

I know that I can call finalizeForeignPtr, but I'm using a ForeignPtr only for debugging. In general, how can I release a in the last example?
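For comparison, the finalizeForeignPtr route releases deterministically, with no GC involved. A minimal sketch, again swapping in a harmless Haskell finalizer (a run counter in an IORef) for nullFunPtr; releaseOnDemand is a hypothetical helper name:

```haskell
module Main where

import Data.IORef (modifyIORef', newIORef, readIORef)
import qualified Foreign.Concurrent as FC
import Foreign.ForeignPtr (finalizeForeignPtr)
import Foreign.Ptr (Ptr, nullPtr)

-- Attach a finalizer that counts its runs, then finalize explicitly
-- instead of waiting for the GC. Returns the run count afterwards.
releaseOnDemand :: IO Int
releaseOnDemand = do
  runs <- newIORef (0 :: Int)
  fp <- FC.newForeignPtr (nullPtr :: Ptr ()) (modifyIORef' runs (+ 1))
  finalizeForeignPtr fp -- runs the finalizer immediately
  readIORef runs

main :: IO ()
main = do
  n <- releaseOnDemand
  putStrLn ("finalizer runs: " ++ show n)
```

This only helps where the code holding the ForeignPtr can make the call, which is why it does not answer the general question about values bound at the GHCi prompt.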

If this is not possible from the GHCi prompt, I can also modify the GHCi code. Maybe I can release a by modifying GHCi's InteractiveContext or DynFlags? So far my research has had no luck.

Are you sure the memory is not released? I don't think there's a guarantee that finalizers run promptly when a variable is GC'd. - Daniel Wagner
Fairly sure, yes. I did similar tests with large arrays, monitoring them with ekg. Nothing was released. - remdezx
Why should a be garbage collected after being rebound to ()? How could GHCi know (from inside a kind of IO monad) that it won't be needed? - David Unric
If the end of scope can be determined, it works as you might expect, without the pointless 'rebinding'. GHCi> let testNull = do { a <- newForeignPtr nullFunPtr nullPtr; return () } GHCi> performGC results in an immediate SIGSEGV. - David Unric
I thought that when a variable is rebound, the old value is released. Using scopes is a nice idea, but unfortunately my code has many variables like a, and I cannot release just a few of them with scoping like that... - remdezx

Answer


Tracing through the code, we find that the value is stored in the closure_env field of the PersistentLinkerState data type. That field is a ClosureEnv, i.e. a mapping from names to HValues. The relevant function in Linker.hs is

extendLinkEnv :: [(Name,HValue)] -> IO ()
-- Automatically discards shadowed bindings
extendLinkEnv new_bindings =
  modifyPLS_ $ \pls ->
    let new_closure_env = extendClosureEnv (closure_env pls) new_bindings
    in return pls{ closure_env = new_closure_env }

and although the comment says it discards shadowed bindings, it does not, at least not in the way you want it to.

The reason, as AndrewC correctly writes, is that although both variables have the same source-code name, they are different to the compiler (each has a different Unique attached). We can observe this by adding some tracing to the function above:

*GHCiGC> a <- newForeignPtr nullFunPtr nullPtr
extendLinkEnv [a_azp]
*GHCiGC> a <- return ()
extendLinkEnv [a_aF0]
*GHCiGC> performGC
extendLinkEnv [it_aFL]

Removing bindings with the same source name at this point should solve your GC problem, but I don’t know the compiler well enough to say what else would break. I suggest you open a ticket; hopefully someone will know.
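The shadowing problem and the suggested fix can be sketched with a toy model. Everything in it is hypothetical: Name, ClosureEnv, extendClosureEnv, extendDroppingShadowed and replay are simplified stand-ins written for illustration, not GHC's real types or functions, and the integers 1, 2 and 3 play the role of the uniques azp, aF0 and aFL from the trace:

```haskell
module Main where

import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for GHC's Name: source-code name plus a unique.
data Name = Name { occName :: String, unique :: Int }
  deriving (Eq, Ord, Show)

-- Hypothetical stand-in for ClosureEnv; values are strings, not HValues.
type ClosureEnv = Map.Map Name String

-- Current behaviour: keys include the unique, so an older binding with the
-- same source name is never overwritten. New bindings win on an exact clash.
extendClosureEnv :: ClosureEnv -> [(Name, String)] -> ClosureEnv
extendClosureEnv env new = Map.union (Map.fromList new) env

-- Hypothetical fix: first drop every binding whose source name is rebound.
extendDroppingShadowed :: ClosureEnv -> [(Name, String)] -> ClosureEnv
extendDroppingShadowed env new =
  let rebound = map (occName . fst) new
      kept = Map.filterWithKey (\k _ -> occName k `notElem` rebound) env
  in extendClosureEnv kept new

-- The traced GHCi session, one environment extension per statement.
session :: [[(Name, String)]]
session =
  [ [(Name "a" 1, "ForeignPtr")] -- a <- newForeignPtr nullFunPtr nullPtr
  , [(Name "a" 2, "()")]         -- a <- return ()
  , [(Name "it" 3, "()")]        -- performGC
  ]

-- Replay the session with a given extension function.
replay :: (ClosureEnv -> [(Name, String)] -> ClosureEnv) -> ClosureEnv
replay extend = foldl extend Map.empty session

main :: IO ()
main = do
  print (Map.keys (replay extendClosureEnv))
  print (Map.keys (replay extendDroppingShadowed))
```

The first line lists both a bindings plus it; the second keeps only the newest a, so in the real linker state the old HValue would no longer be referenced from closure_env.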

Confusion about bindings vs. values

In the comments there seems to be some confusion about bindings and values. Consider this code:

> a <- return something
> b <- return somethingelse
> a <- return (b+b)
> b <- return anewthing

With the current implementation, the heap will consist of:

  • something
  • somethingelse
  • a thunk referencing the (+) operator and somethingelse
  • anewthing.

Furthermore, the interpreter's environment holds references to all four heap values, so nothing can be GC’ed.

What remdezx rightly expected is that GHCi would drop its references to something and somethingelse, since both names have been rebound. This, in turn, would allow the runtime system to garbage collect something (assuming nothing else references it). GHCi would still reference the thunk, which in turn references somethingelse, so somethingelse would not be garbage collected.
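The reachability argument can be checked with a small model: a toy heap in which each object lists what it references, and two root sets, one where every binding ever made stays a root (the current implementation) and one where only the newest binding per source name does (what remdezx expected). The heap layout, the object names and the omission of the static (+) code are simplifications for illustration:

```haskell
module Main where

import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Toy heap: each object lists the objects it references. The static code
-- for (+) is left out, since the GC never reclaims it anyway.
heap :: Map.Map String [String]
heap = Map.fromList
  [ ("something", [])
  , ("somethingelse", [])
  , ("thunk(b+b)", ["somethingelse"])
  , ("anewthing", [])
  ]

-- The four GHCi statements, in order, as (source name, bound object) pairs.
bindings :: [(String, String)]
bindings =
  [ ("a", "something"), ("b", "somethingelse")
  , ("a", "thunk(b+b)"), ("b", "anewthing") ]

-- Current implementation: every binding ever made stays a GC root.
rootsCurrent :: [String]
rootsCurrent = map snd bindings

-- Expected behaviour: only the newest binding per source name is a root
-- (Map.fromList keeps the last value for a duplicated key).
rootsExpected :: [String]
rootsExpected = Map.elems (Map.fromList bindings)

-- All objects transitively reachable from the given roots.
reachable :: [String] -> Set.Set String
reachable = go Set.empty
  where
    go seen [] = seen
    go seen (x : xs)
      | x `Set.member` seen = go seen xs
      | otherwise = go (Set.insert x seen) (Map.findWithDefault [] x heap ++ xs)

-- Heap objects that no root reaches, i.e. what a GC could reclaim.
collectable :: [String] -> [String]
collectable roots = filter (`Set.notMember` reachable roots) (Map.keys heap)

main :: IO ()
main = do
  print (collectable rootsCurrent)
  print (collectable rootsExpected)
```

The first print is an empty list, and the second is ["something"]: somethingelse survives because the thunk bound to the new a still reaches it.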

Clearly the question was very implementation specific, and so is this answer :-)