Implementation for the "wrapper" wrapper in Haskell FFI

Question

According to the Haskell Wiki, in the Haskell FFI we can make a C function out of a Haskell function with the "wrapper" wrapper:

foreign import ccall "wrapper" createFunPtr :: (Int -> Int) -> IO (FunPtr (Int -> Int))

If I understand correctly, this means that f <- createFunPtr (+ 42) in a do-block gives a function pointer f that C sees as having type int (*)(int).

In Haskell, a function of type Int -> Int may capture local state (e.g. a lambda expression may refer to variables in outer scopes), whereas in C a function pointer is merely the memory address of a function, and calling it is little more than a raw jump. So there seems to be no place in a FunPtr for the Haskell function's additional data to live.

Lambda expressions in C++ are objects, and invoking their operator() passes an implicit this pointer. But FunPtrs are treated just like ordinary C function pointers, so there is no way to pass extra arguments.

So how does GHC implement this "wrapper" wrapper? (I guessed that it might work by writing instructions directly into the code section in memory to pass the extra arguments, but as I recall, the code section is usually read-only.)
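The constraint described in the question can be made concrete by comparing two C callback styles. An API that accepts only a bare int (*)(int) gives the callback nowhere to keep captured state, so C libraries that want stateful callbacks usually take an extra void * "user data" argument and pass it back on every call. All function names below are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Style 1: the API accepts nothing but a code address. */
typedef int (*plain_cb)(int);

int call_plain(plain_cb f, int x) {
    assert(f != NULL);
    return f(x);
}

/* Style 2: the API also stores an opaque user-data pointer and passes it
   back as an extra argument; this is where captured state can live. */
typedef int (*data_cb)(void *user_data, int x);

int call_with_data(data_cb f, void *user_data, int x) {
    assert(f != NULL);
    return f(user_data, x);
}

/* Fits style 1, but the 42 is fixed at compile time. */
int add_42(int x) {
    return x + 42;
}

/* The moral equivalent of (+ n): shared code, with n supplied via user_data. */
int add_captured(void *user_data, int x) {
    return *(const int *)user_data + x;
}
```

Roughly speaking, handing a C library a StablePtr as its user data corresponds to style 2; the question is how GHC manages to serve APIs of style 1.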

In standard C this would be impossible, but many OSs provide ways to generate code at runtime and obtain a valid function pointer to it. This is roughly how dynamic libraries work. Probably GHC generates machine code with hardcoded pointers to the captured variables. Also, here's an ugly proof of concept. — chi
My guess would be that it creates plain function pointers for stateless functions, and for closures either generates code on the fly (which is possible: you can create a new buffer for code in a running program, which is how JIT compilers work), or precomputes their types and uses some sort of trampoline to jump to a particular closure. — Bartek Banachewicz
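The fixed-pool trampoline idea from the second comment can be sketched in portable C with no runtime code generation at all. Each slot gets its own static function that knows its slot index at compile time, so a bare int (*)(int) is enough to reach the captured data. This is an alternative technique, not what GHC does: the commentary quoted in the answer describes adjustors generated on the fly. The pool size, the closure struct and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*int_fn)(int);

/* A closure: shared code plus a pointer to its captured data. */
typedef struct {
    int (*code)(void *env, int x);
    void *env;
} closure;

enum { POOL_SIZE = 4 };

static closure slots[POOL_SIZE];
static int in_use[POOL_SIZE];

/* One static trampoline per slot. Each knows its slot index at compile
   time, so a bare int (*)(int) is enough to reach the captured data. */
static int tramp0(int x) { return slots[0].code(slots[0].env, x); }
static int tramp1(int x) { return slots[1].code(slots[1].env, x); }
static int tramp2(int x) { return slots[2].code(slots[2].env, x); }
static int tramp3(int x) { return slots[3].code(slots[3].env, x); }

static const int_fn trampolines[POOL_SIZE] = { tramp0, tramp1, tramp2, tramp3 };

/* Hand out a plain function pointer for a closure, or NULL if the pool is full. */
int_fn closure_to_fn_ptr(closure c) {
    assert(c.code != NULL);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            slots[i] = c;
            return trampolines[i];
        }
    }
    return NULL;
}

/* Release a slot; the analogue of freeHaskellFunPtr in this sketch. */
void free_fn_ptr(int_fn f) {
    for (int i = 0; i < POOL_SIZE; i++) {
        if (trampolines[i] == f) {
            assert(in_use[i]);
            in_use[i] = 0;
            return;
        }
    }
}

/* Example closure code: (+ n) with n stored in env. */
int add_env(void *env, int x) {
    return *(const int *)env + x;
}
```

The obvious cost is a hard limit on how many closures can be live at once (four here), and the pool as written is not thread-safe; generating each thunk at runtime removes the limit.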

HTNW · Accepted Answer · 2019-07-22T21:32:34

A quick Google search brings up the GHC commentary:

Occasionally, it is convenient to treat Haskell closures as C function pointers. This is useful, for example, if we want to install Haskell callbacks in an existing C library. This functionality is implemented with the aid of adjustor thunks.

An adjustor thunk is a dynamically allocated code snippet that allows Haskell closures to be viewed as C function pointers.

Stable pointers provide a way for the outside world to get access to, and evaluate, Haskell heap objects, with the RTS providing a small range of ops for doing so. So, assuming we've got a stable pointer in our hand in C, we can jump into the Haskell world and evaluate a callback procedure, say. This works OK in some cases where callbacks are used, but does require the external code to know about stable pointers and how to deal with them. We'd like to hide the Haskell-nature of a callback and have it be invoked just like any other C function pointer.

Enter adjustor thunks. An adjustor thunk is a little piece of code that's generated on-the-fly (one per Haskell closure being exported) that, when entered using some 'universal' calling convention (e.g., the C calling convention on platform X), pushes an implicit stable pointer (to the Haskell callback) before calling another (static) C function stub which takes care of entering the Haskell code via its stable pointer.

An adjustor thunk is allocated on the C heap, and is called from within Haskell just before handing out the function pointer to the Haskell (IO) action. User code should never have to invoke it explicitly.

An adjustor thunk differs from a C function pointer in one respect: when the code is through with it, it has to be freed in order to release Haskell and C resources. Failure to do so will result in memory leaks on both the C and Haskell side.
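Here is a minimal sketch of an adjustor-style thunk, assuming x86-64 with the System V calling convention and a POSIX system that allows mapping a page and then switching it to PROT_EXEC. It also speaks to the question's worry about read-only code: the thunk is written into a freshly mapped page, never into the program's own code section, and the page becomes executable only after it has been filled in. The generated code moves the caller's argument into the second argument register, loads a baked-in context pointer (standing in for the stable pointer) into the first, and jumps to a static stub; on x86-64 this register load plays the role of the "push" mentioned in the commentary. make_adjustor, free_adjustor and add_stub are hypothetical names, and this is not GHC's actual adjustor code.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*int_fn)(int);
typedef int (*stub_fn)(void *ctx, int x);

enum { ADJUSTOR_BYTES = 25 };

/* Static stub: receives the hidden context as an explicit first argument. */
int add_stub(void *ctx, int x) {
    return *(const int *)ctx + x;
}

#if defined(__x86_64__) && !defined(_WIN32)
int_fn make_adjustor(stub_fn stub, void *ctx) {
    unsigned char code[ADJUSTOR_BYTES] = {
        0x48, 0x89, 0xFE,                   /* mov    rsi, rdi   (shift the argument) */
        0x48, 0xBF, 0, 0, 0, 0, 0, 0, 0, 0, /* movabs rdi, <ctx>  (baked-in context)  */
        0x48, 0xB8, 0, 0, 0, 0, 0, 0, 0, 0, /* movabs rax, <stub>                     */
        0xFF, 0xE0                          /* jmp    rax         (tail-call the stub) */
    };
    uint64_t ctx_bits = (uint64_t)(uintptr_t)ctx;
    uint64_t stub_bits = (uint64_t)(uintptr_t)stub;
    memcpy(code + 5, &ctx_bits, sizeof ctx_bits);
    memcpy(code + 15, &stub_bits, sizeof stub_bits);

    /* A fresh read/write page, separate from the program's code section. */
    void *page = mmap(NULL, ADJUSTOR_BYTES, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;
    memcpy(page, code, ADJUSTOR_BYTES);

    /* Only now make it executable (and no longer writable). */
    if (mprotect(page, ADJUSTOR_BYTES, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, ADJUSTOR_BYTES);
        return NULL;
    }
    return (int_fn)page;
}

/* The analogue of freeHaskellFunPtr for this sketch. */
void free_adjustor(int_fn f) {
    if (f != NULL)
        munmap((void *)f, ADJUSTOR_BYTES);
}
#else
/* Other platforms would need their own instruction encoding. */
int_fn make_adjustor(stub_fn stub, void *ctx) { (void)stub; (void)ctx; return NULL; }
void free_adjustor(int_fn f) { (void)f; }
#endif
```

Each call to make_adjustor yields a distinct plain function pointer with its own baked-in context, which a bare int (*)(int) cannot carry by itself. As with a real adjustor, forgetting free_adjustor leaks the mapped page.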

I recall reading somewhere that "wrapper" FFI imports are actually the only place where GHC performs runtime code generation.

I believe the commentary is saying that your createFunPtr is defined, at compile time, as something like the following. (I used -ddump-simpl to get the Core for createFunPtr; this is my attempt at decompiling it back to Haskell.)

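createFunPtr :: (Int -> Int) -> IO (FunPtr (Int -> Int))
createFunPtr fun = do
  stable <- newStablePtr fun                    -- register the closure in the stable pointer table
  pkg_ccall stable :: IO (FunPtr (Int -> Int))  -- compiler-generated; allocates the adjustor thunk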

newStablePtr is part of the StablePtr API, which allows Haskell to export references to Haskell objects to foreign code. The GC is allowed to move the function passed to createFunPtr after the adjustor thunk has been created. Therefore, said adjustor needs a reference to the function that still holds after a GC, and that functionality is provided by stable pointers.
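A toy model shows why a handle-based reference survives a moving GC where a raw address would not. As I understand it, GHC's stable pointer table works along these lines (a StablePtr is essentially an index into a table that the GC keeps up to date), but the names below (sp_new, sp_deref, sp_free, gc_move) and the one-object "collector" are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>

/* A toy heap object that a moving GC may relocate. */
typedef struct { int captured; } object;

enum { TABLE_SIZE = 16 };

/* The table lives outside the movable heap; a handle is just an index. */
static object *table[TABLE_SIZE];

typedef size_t stable_ptr;

/* Register an object; index 0 is reserved to mean "no handle". */
stable_ptr sp_new(object *obj) {
    for (size_t i = 1; i < TABLE_SIZE; i++) {
        if (table[i] == NULL) {
            table[i] = obj;
            return i;
        }
    }
    return 0;
}

object *sp_deref(stable_ptr sp) {
    assert(sp != 0 && sp < TABLE_SIZE && table[sp] != NULL);
    return table[sp];
}

/* Drop the entry; until this happens the object counts as reachable. */
void sp_free(stable_ptr sp) {
    assert(sp != 0 && sp < TABLE_SIZE);
    table[sp] = NULL;
}

/* What a moving collector does: copy the object, then update every root,
   including the table entries, to point at the new copy. */
void gc_move(object *from, object *to) {
    *to = *from;
    for (size_t i = 1; i < TABLE_SIZE; i++) {
        if (table[i] == from)
            table[i] = to;
    }
}
```

An adjustor that bakes in the handle rather than the object's address keeps working after gc_move. The table entry also acts as a root, which is why an adjustor that is never freed keeps its closure alive on the Haskell heap.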

pkg_ccall (which is actually rather magic) allocates space for the adjustor thunk on the C heap. This space must later be freed with freeHaskellFunPtr; otherwise memory is leaked on both the C heap (which holds the adjustor) and the Haskell heap (which holds the function closure, which cannot be GC'd until the stable pointer is released). The adjustor's contents depend on the platform and on whether GHC was configured (at build time) to use libffi for adjustors. The actual assembly code can be found in the comments in the relevant RTS file, but the gist is generally:

HsInt adjustor(HsInt arg) {
  // STABLE stands for the stable pointer, "baked" into each adjustor's
  // machine code as a constant (e.g. a "push <constant>" instruction)
  return zdmainzdTzdTzucreateAddPtr(STABLE, arg);
}

zdmainzdTzdTzucreateAddPtr is the stub that dereferences the given stable pointer and calls the Haskell function it points to. It's static, baked into the binary, and roughly equivalent to the following. (If you pass GHC -v and -keep-tmp-files, you should be able to find the ghc_<some_num>.c file that contains the real definition, which needs to do some bookkeeping.)

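// Simplified: the real stub also obtains a Capability with rts_lock(),
// passes it to the rts_* calls, and releases it with rts_unlock().
HsInt zdmainzdTzdTzucreateAddPtr(StgStablePtr ptr, HsInt a) {
  HaskellObj fun, arg, app, ret;
  fun = deRefStablePtr(ptr);   // the closure passed to createFunPtr
  arg = rts_mkInt(a);          // box the C integer as a Haskell Int
  app = rts_apply(fun, arg);   // build the (unevaluated) application
  eval(app, &ret);             // stands in for the RTS evaluation entry point
  return rts_getInt(ret);      // unbox the result for C
}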
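The steps in that stub (dereference, box the argument, build the application, evaluate, unbox) can be modeled with a toy object representation in C. Everything here is hypothetical: mk_int, apply, eval and get_int only mimic the roles of the rts_* calls, the "stable pointer" is modeled as a plain pointer, and nothing is ever freed because a real RTS would leave that to its GC.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { INT_BOX, FUN, APP } obj_tag;

typedef struct obj obj;
struct obj {
    obj_tag tag;
    int value;                        /* INT_BOX: the boxed Int */
    obj *(*code)(obj *env, obj *arg); /* FUN: shared code */
    obj *env;                         /* FUN: captured variables */
    obj *fn, *arg;                    /* APP: unevaluated fn applied to arg */
};

static obj *alloc_obj(obj_tag tag) {
    obj *o = calloc(1, sizeof *o);
    assert(o != NULL);
    o->tag = tag;
    return o;
}

obj *mk_int(int i) { obj *o = alloc_obj(INT_BOX); o->value = i; return o; }

int get_int(obj *o) { assert(o->tag == INT_BOX); return o->value; }

/* Like rts_apply: only builds the application; nothing runs yet. */
obj *apply(obj *fn, obj *arg) {
    obj *o = alloc_obj(APP);
    o->fn = fn;
    o->arg = arg;
    return o;
}

/* Reduce an object until it is no longer an application. */
obj *eval(obj *o) {
    while (o->tag == APP) {
        obj *f = eval(o->fn);
        assert(f->tag == FUN);
        o = f->code(f->env, o->arg);
    }
    return o;
}

/* Code for (+ n); n is found in env, not in the code itself. */
static obj *plus_code(obj *env, obj *arg) {
    return mk_int(get_int(env) + get_int(eval(arg)));
}

obj *mk_plus(int n) {
    obj *o = alloc_obj(FUN);
    o->code = plus_code;
    o->env = mk_int(n);
    return o;
}

/* The stub: C int in, C int out; everything in between is heap objects. */
int stub(obj *stable, int a) {
    obj *fun = stable;          /* deRefStablePtr(ptr) */
    obj *arg = mk_int(a);       /* rts_mkInt(a)        */
    obj *app = apply(fun, arg); /* rts_apply(fun, arg) */
    obj *ret = eval(app);       /* eval(app, &ret)     */
    return get_int(ret);        /* rts_getInt(ret)     */
}
```

For example, stub(mk_plus(42), 1) evaluates to 43, mirroring what calling the FunPtr produced by createFunPtr (+ 42) does.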
