The G80 keeps the context of up to 768 threads per SM resident concurrently and interleaves their execution. This is the key difference between CPUs and GPUs: GPUs are deeply multithreaded processors that hide the memory accesses of some threads behind the computation of other threads. The latency of executing a single thread is much higher than on a CPU; the GPU is optimized for thread throughput rather than thread latency. In comparison, CPUs use out-of-order, speculative execution to reduce the execution latency of a single thread. GPUs use several techniques to reduce thread-scheduling overhead. For example, they group threads into a coarser schedulable unit called a warp (or wavefront, in AMD terminology) and execute the threads of a warp on SIMD units. All GPU threads run the same kernel code, which makes them a good fit for the SIMD model. From the programmer's point of view, threads appear to execute in MIMD fashion, and they are grouped into thread blocks to reduce communication overhead.
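To make the grouping concrete, here is a minimal CUDA sketch (the kernel name `add` and the sizes are my own illustrative choices, and it uses unified memory for brevity, which G80-era hardware did not support): every thread executes the same kernel code, the hardware runs threads in warps of 32, and the programmer only specifies the grid of thread blocks.

```
#include <cstdio>

// Every thread executes this same kernel (SIMT): the hardware runs
// threads in fixed-size warps of 32, but the programmer only sees
// a grid of thread blocks.
__global__ void add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique thread index
    if (i < n)                                      // guard: grid may overshoot n
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    float *a, *b, *c;
    // Unified memory keeps the example short; a real program might
    // use explicit cudaMalloc/cudaMemcpy instead.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // 256 threads per block = 8 warps per block; the SM's warp
    // scheduler interleaves warps to hide memory latency.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    add<<<blocks, threadsPerBlock>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```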
Threads on a CPU core are used to fill different execution units through dynamic scheduling. CPU threads are not necessarily of the same type: while one thread is busy in the floating-point unit, another thread may find the integer ALU idle, so the two can execute concurrently. Multiple threads per core are maintained to fill the different execution units effectively and prevent them from sitting idle. However, dynamic scheduling is costly in terms of power and energy consumption, so manufacturers support only a few threads per CPU core.
In answer to the second part of your question: threads on GPUs are scheduled by hardware (the per-SM warp schedulers); the OS, and even the driver, do not affect the scheduling.
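You can inspect the limits that this hardware scheduler works with by querying the device through the CUDA runtime's `cudaGetDeviceProperties`; a small sketch (on a G80-class device, `maxThreadsPerMultiProcessor` reports the 768 mentioned above):

```
#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // properties of device 0

    // The warp size and the per-SM thread limit are fixed by the
    // hardware; the OS and driver play no part in warp scheduling.
    printf("Warp size:             %d\n", prop.warpSize);
    printf("Max threads per SM:    %d\n", prop.maxThreadsPerMultiProcessor);
    printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("Number of SMs:         %d\n", prop.multiProcessorCount);
    return 0;
}
```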