I tried to find out exactly how Haskell's threads (the ones spawned by forkIO) are mapped to OS threads.
The first source of information I found,
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurrent.html#g:11
says that all the lightweight threads actually run on a single OS thread, and that the GHC runtime spawns a new OS thread only when a Haskell thread blocks in a safe IO operation, so that the other Haskell threads keep running and the IO call doesn't block the entire program.
The second source of information comes from here,
http://www.haskell.org/ghc/docs/7.0.1/html/users_guide/using-smp.html
which clearly states that Haskell threads are mapped onto a predefined number of pre-created OS threads in a balanced way. As I understand it, this means that if I have 80 lightweight threads and pass the option +RTS -N8 when running my program, then at least 8 OS threads will be created and each of them will run roughly 10 lightweight threads. On a machine with 8 CPU cores, that works out to roughly 10 Haskell threads per core.
The second source seems to be the more accurate one, and it is exactly the behavior I would want from the GHC runtime when running a program compiled with the -threaded flag.
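To make the 80-thread scenario concrete, here is a minimal sketch of how I would try to observe the mapping. It assumes the program is built with something like `ghc -threaded -rtsopts Main.hs` and run as `./Main +RTS -N8`; the helper name capabilitiesOf is my own and not part of any library. Each thread only reports the capability it happens to be on at that moment, and very short-lived threads like these may never migrate away from the capability that created them, so the counts need not come out evenly.

```haskell
import Control.Concurrent (forkIO, getNumCapabilities, myThreadId, threadCapability)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)
import Data.List (group, sort)

-- Hypothetical helper: spawn n lightweight threads with forkIO and collect
-- the capability number each one reports when it runs.
capabilitiesOf :: Int -> IO [Int]
capabilitiesOf n = do
  boxes <- forM [1 .. n] $ \_ -> do
    box <- newEmptyMVar
    _ <- forkIO $ do
      -- threadCapability returns (capability number, whether the thread is pinned)
      (cap, _pinned) <- threadCapability =<< myThreadId
      putMVar box cap
    return box
  -- Wait for every thread to report.
  mapM takeMVar boxes

main :: IO ()
main = do
  n <- getNumCapabilities
  putStrLn ("capabilities: " ++ show n)
  caps <- capabilitiesOf 80
  -- Count how many threads reported each capability.
  let histogram = [(c, length g) | g@(c : _) <- group (sort caps)]
  forM_ histogram $ \(c, k) ->
    putStrLn ("capability " ++ show c ++ ": " ++ show k ++ " threads")
```

Giving the threads real work to do (instead of reporting and exiting immediately) would presumably give the scheduler more opportunity to spread them out, but that is a guess on my part.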
Can anyone confirm this? Also, if the second version is the right one, what is the purpose of a bound thread (one spawned using forkOS)? Is it only for calling native code that uses thread-local data?
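For the bound-thread part, this is a small sketch of the difference as far as I can observe it from Haskell itself. It uses isCurrentThreadBound and rtsSupportsBoundThreads from Control.Concurrent; the helper reportBound is a name I made up for illustration. It only shows whether a thread is bound and does not demonstrate any thread-local native state.

```haskell
import Control.Concurrent
  ( ThreadId, forkIO, forkOS, isCurrentThreadBound, rtsSupportsBoundThreads )
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Hypothetical helper: start a thread with the given fork function and
-- return whether that thread sees itself as bound to an OS thread.
reportBound :: (IO () -> IO ThreadId) -> IO Bool
reportBound fork = do
  box <- newEmptyMVar
  _ <- fork (isCurrentThreadBound >>= putMVar box)
  takeMVar box

main :: IO ()
main = do
  putStrLn ("RTS supports bound threads: " ++ show rtsSupportsBoundThreads)
  viaForkIO <- reportBound forkIO
  putStrLn ("thread from forkIO is bound: " ++ show viaForkIO)
  -- forkOS fails at runtime without -threaded, so only call it when supported.
  if rtsSupportsBoundThreads
    then do
      viaForkOS <- reportBound forkOS
      putStrLn ("thread from forkOS is bound: " ++ show viaForkOS)
    else putStrLn "forkOS skipped (program not linked with -threaded)"
```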