Many programming languages like C++ (posix library) and Java provide the ability to play around with user-level threads. However, if all these user-level threads run in a single kernel thread - you only get the illusion of multiprogramming or multiprocessing (if multiple processors available) right ? I mean- still all these threads run in the same kernel thread. Am I right in saying that? And, if yes, then how exactly do we plan to get performance improvement using user-level threads ?
EDIT: I guess performance would not really be possible on a many to one model (user level threads to kernel thread mapping). So in a many to many model, performance improvement is possible only if a kernel level thread forks off. So my question is, even if user level threads have low overhead, I cannot really envisage a performance improvement as great as scheduling kernel level threads.
EDIT2: Essentially this is what I am trying to get verified - "Assume a computer has 4 processors. Now, say my program is the only thing running - and has forked 4 threads each of which do completely independent things. Now, if the mapping is say one to one (user to kernel mapping), I can actually get a perfect 4 times speedup. However if say (for some reason) all 4 user threads map to the same kernel thread space - then there is no speedup because of multiprocessing. This is because, even though I have 4 user level threads - they run in the same kernel space and cannot be split across 4 cores.