Relation between Haskell Threads and OS Threads in GHC

Question

I tried to find out exactly how Haskell threads (the ones spawned by forkIO) are mapped to OS threads.

The first source of information which I found,

http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurrent.html#g:11

specifies that all the lightweight threads actually run on one OS thread, and that the GHC runtime spawns a new OS thread to run the other Haskell threads only when a Haskell thread blocks in a safe IO operation, so that the IO call doesn't block the entire program.

The second source of information comes from here,

http://www.haskell.org/ghc/docs/7.0.1/html/users_guide/using-smp.html

which clearly states that Haskell threads are mapped to a predefined number of pre-created OS threads in a balanced way. That means, more or less, that if I have 80 lightweight threads and pass the option +RTS -N8 when running my program, then at least 8 OS threads will be created and each such thread will run 10 lightweight threads. On a machine with 8 CPU cores that would mean roughly 10 Haskell threads per core.

The second source of information seems to be the more accurate one and it is this exact behavior that I wish the GHC runtime would manifest when running a program compiled with the -threaded flag.

Can anyone confirm this? Also, if the second version is the right one, what is the purpose of a bound thread (one spawned using forkOS)? Is it only for handling native code that uses thread-local data?

Mikhail Glushenkov · Accepted Answer · 2012-10-17T22:20:34

Programs compiled without -threaded use a single OS thread to run all Haskell threads. Foreign calls will block all running Haskell threads.

Programs compiled with -threaded can use multiple OS threads to run several Haskell threads in parallel (the number of OS threads that run Haskell code simultaneously can be controlled by the +RTS -N option). Foreign calls that are marked safe will not block other running Haskell threads, so it may be beneficial to use -threaded even with +RTS -N1 if you have several Haskell threads and issue foreign calls that can take a long time. Foreign calls that are marked unsafe are implemented as simple inline function calls in GHC and will block the OS thread they are called from.
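To see the difference for yourself, here is a minimal sketch, assuming a POSIX system where sleep(3) is available. The name c_sleep and the tick loop are made up for illustration; note also that sleep may return early if it is interrupted by a signal, so the timings are only approximate.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_)
import Foreign.C.Types (CUInt (..))

-- POSIX sleep(3), imported as a safe foreign call (assumption: POSIX libc).
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

main :: IO ()
main = do
  done <- newEmptyMVar
  -- This Haskell thread spends about two seconds inside the foreign call.
  _ <- forkIO $ do
    _ <- c_sleep 2
    putMVar done ()
  -- The main thread keeps ticking. Whether the ticks appear during the
  -- sleep or only after it depends on the RTS the program was linked with.
  forM_ [1 .. 5 :: Int] $ \i -> do
    putStrLn ("tick " ++ show i)
    threadDelay 200000 -- 0.2 s
  -- Wait for the forked thread before exiting.
  takeMVar done
```

Build it once with ghc -threaded and once without, then compare when the ticks are printed relative to the two-second sleep.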

Regarding your first source, it describes what happens when a foreign call is issued from the point of view of a single capability. A capability is defined as a virtual CPU for running Haskell code, and in the threaded RTS it corresponds to a collection of OS threads, only one of which is running Haskell code at any time (the other OS threads are used for making foreign calls without blocking Haskell threads). When a Haskell thread makes a safe foreign call, it is put on the list of suspended threads and the capability is given to a different Haskell thread.
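For the 80-threads scenario from the question, a rough sketch like the following can show how many capabilities the program has and on which capability each Haskell thread happened to be running when it asked. The helper capabilityHistogram is a made-up name; getNumCapabilities and threadCapability come from Control.Concurrent. Do not expect an even 10-per-capability split: the RTS may migrate threads between capabilities, and threads as short-lived as these may well all report the same capability.

```haskell
module Main (main) where

import Control.Concurrent (forkIO, getNumCapabilities, myThreadId, threadCapability)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)
import Data.List (group, sort)

-- Fork 'count' Haskell threads, let each one report the capability it is
-- currently running on, and return (capability, number of threads) pairs.
capabilityHistogram :: Int -> IO [(Int, Int)]
capabilityHistogram count = do
  boxes <- forM [1 .. count] $ \_ -> do
    box <- newEmptyMVar
    _ <- forkIO $ do
      (cap, _pinned) <- threadCapability =<< myThreadId
      putMVar box cap
    return box
  caps <- mapM takeMVar boxes
  return [(c, length g) | g@(c : _) <- group (sort caps)]

main :: IO ()
main = do
  n <- getNumCapabilities
  putStrLn ("capabilities: " ++ show n)
  histogram <- capabilityHistogram 80
  mapM_ print histogram
```

Compile with ghc -threaded -rtsopts and run with +RTS -N8 to get eight capabilities; without -N the program reports a single capability.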

A bound Haskell thread has a fixed associated OS thread for making foreign calls. An unbound thread has no associated OS thread: foreign calls from this thread may be made in any OS thread. Bound threads are used for interacting with libraries for which it matters which calls to the library are made from which OS thread, such as OpenGL, which stores its rendering context in OS-thread-local state.
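As a small illustration of the bound/unbound distinction, the sketch below asks a thread created with forkIO and one created with forkOS whether they are bound. The helper boundIn is a made-up name; forkOS, isCurrentThreadBound and rtsSupportsBoundThreads are exported by Control.Concurrent. Since forkOS fails at runtime when the program is not linked with -threaded, the call is guarded.

```haskell
module Main (main) where

import Control.Concurrent
  ( ThreadId, forkIO, forkOS, isCurrentThreadBound, rtsSupportsBoundThreads )
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Start a thread with the given fork function and report whether that
-- thread is bound to an OS thread.
boundIn :: (IO () -> IO ThreadId) -> IO Bool
boundIn fork = do
  box <- newEmptyMVar
  _ <- fork (isCurrentThreadBound >>= putMVar box)
  takeMVar box

main :: IO ()
main = do
  putStrLn ("bound threads supported: " ++ show rtsSupportsBoundThreads)
  viaForkIO <- boundIn forkIO
  putStrLn ("forkIO thread bound: " ++ show viaForkIO)
  if rtsSupportsBoundThreads
    then do
      viaForkOS <- boundIn forkOS
      putStrLn ("forkOS thread bound: " ++ show viaForkOS)
    else putStrLn "forkOS needs a program linked with -threaded"
```

In real code you would make the OpenGL (or similar) calls inside the forkOS thread, so that every call runs on the same OS thread.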

More information can be found in the GHC manual and the following paper:

Extending the Haskell Foreign Function Interface with Concurrency

Simon Marlow, Simon Peyton Jones, and Wolfgang Thaller, Haskell'04
