How much does it cost for Haskell FFI to go into C and back?

Question

If I want to call more than one C function, each one depending on the result of the previous one, is it better to create a wrapper C function that handles the three calls? Will it cost the same as using Haskell FFI without converting types?

Suppose I have the following Haskell code:

foo :: CInt -> IO CInt
foo x = do
  a <- cfA x
  b <- cfB a
  c <- cfC c
  return c

Each function cf* is a C call.

Is it better, in terms of performance, to create a single C function like cfABC and make only one foreign call in Haskell?

int cfABC(int x) {
   int a, b, c;
   a = cfA(x);
   b = cfB(a);
   c = cfC(b);
   return c;
}

Haskell code:

foo :: CInt -> IO CInt
foo x = do
  c <- cfABC x
  return c

How to measure the performace cost of a C call from Haskell? Not the cost of the C function itself, but the cost of the "context-switching" from Haskell to C and back.

I'm not at all sure, but I found this blog post enlightening. If I interpret it correctly, foreign ccall unsafe (with unsafe being the key), is essentially as cheap as an inline C function call. However, great care has to be taken when using unsafe, and the safe variant (foreign ccall) costs more and involves taking locks. — gspr
@ThiagoNegri: I did some crude (non-Criterion) benchmarks to compare foreign ccall and foreign ccall unsafe. I have a C function that, given double x returns sin(x)*sin(x)*cos(x)/2.0. I compiled it with GCC 4.7.2 and -O2. The benchmark calls it with 100000000 different arguments from 0 to pi/2 and sums the results. With foreign ccall it ran in about 9.6 seconds, compared to 4.6 seconds for foreign ccall unsafe. Calling it from an actual C program gave a running time of 4.4-4.5 seconds. This gives you some idea, at least. The Haskell code was compiled with GHC 7.4.2. — gspr
@gspr: Forget I said anything. My knowledge of the FFI is insufficient. — Colin Woodbury

Daniel Fischer Daniel Fischer · Accepted Answer · 2013-01-25T18:32:58

The answer depends mostly on whether the foreign call is a safe or an unsafe call.

An unsafe C call is basically just a function call, so if there's no (nontrivial) type conversion, there are three function calls if you make three foreign calls, and between one and four when you write a wrapper in C, depending on how many of the component functions can be inlined when compiling the C, since a foreign call into C cannot be inlined by GHC. Such a function call is generally very cheap (it's just a copy of the arguments and a jump to the code), so the difference is small either way, the wrapper should be slightly slower when no C function can be inlined into the wrapper, and slightly faster when all can be inlined [and that was indeed the case in my benchmarking, +1.5ns resp. -3.5ns where the three foreign calls took about 12.7ns for everything just returning the argument]. If the functions do something nontrivial, the difference is negligible (and if they're not doing anything nontrivial, you'd probably better write them in Haskell to let GHC inline the code).

A safe C call involves saving some nontrivial amount of state, locking, possibly spawning a new OS thread, so that takes much longer. Then the small overhead of perhaps calling one function more in C is negligible compared to the cost of the foreign calls [unless passing the arguments requires an unusual amount of copying, many huge structs or so]. In my do-nothing benchmark

{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Criterion.Main
import Foreign.C.Types
import Control.Monad

foreign import ccall safe "funcs.h cfA" c_cfA :: CInt -> IO CInt
foreign import ccall safe "funcs.h cfB" c_cfB :: CInt -> IO CInt
foreign import ccall safe "funcs.h cfC" c_cfC :: CInt -> IO CInt
foreign import ccall safe "funcs.h cfABC" c_cfABC :: CInt -> IO CInt

wrap :: (CInt -> IO CInt) -> Int -> IO Int
wrap foo arg = fmap fromIntegral $ foo (fromIntegral arg)

cfabc = wrap c_cfABC

foo :: Int -> IO Int
foo = wrap (c_cfA >=> c_cfB >=> c_cfC)

main :: IO ()
main = defaultMain
            [ bench "three calls" $ foo 16
            , bench "single call" $ cfabc 16
            ]

where all the C functions just return the argument, the mean for the single wrapped call is a bit above 100ns [105-112], and for the three separate calls around 300ns [290-315].

So a safe c call takes roughly 100ns and usually, it is then faster to wrap them up into a single call. But still, if the called functions do something sufficiently nontrivial, the difference won't matter.

How much does it cost for Haskell FFI to go into C and back?

2 Answers