4
votes

I am new to CUDA programming and am reading about the G80 chip, which has 128 SPs (16 SMs, each with 8 SPs), in the book "Programming Massively Parallel Processors - A Hands-on Approach". The book compares Intel CPUs with the G80 chip: Intel CPUs support 2 to 4 threads per core, depending on the machine model, whereas the G80 supports 768 threads per SM, which adds up to about 12,000 threads for the whole chip (16 × 768 = 12,288).

My question is: can the G80 chip really execute 768 threads per SM simultaneously? If not, what does it mean to say that Intel CPUs support 2 to 4 threads per core? After all, we can always have many threads or processes running on an Intel CPU, scheduled by the OS.

3 Answers

5
votes

The G80 keeps the context of 768 threads per SM resident at the same time and interleaves their execution. This is the key difference between CPUs and GPUs. GPUs are deeply multithreaded processors that hide the memory-access latency of some threads behind the computation of other threads. The latency of executing a single thread is much higher than on a CPU, because the GPU is optimized for thread throughput rather than thread latency. In comparison, CPUs use out-of-order, speculative execution to reduce the execution delay of a single thread. GPUs use several techniques to reduce thread-scheduling overhead. For example, they group threads into coarser schedulable units called warps (NVIDIA) or wavefronts (AMD) and execute the threads of a warp on SIMD hardware. GPU threads all run the same code, which makes them a good fit for the SIMD model. From the programmer's point of view, threads execute in MIMD fashion and are grouped into thread blocks to reduce communication overhead.
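To make the latency-hiding argument concrete, here is a rough back-of-the-envelope model in C++. It is only a sketch: the function name `sm_utilization` and any cycle counts fed into it are illustrative assumptions, not G80 specifications. It assumes every warp alternates between a fixed stretch of computation and a fixed memory stall, and that switching between warps costs nothing.

```cpp
#include <algorithm>

// Simplified latency-hiding model (an assumption for illustration only,
// not a description of the real G80 scheduler).
// Each warp computes for `compute_cycles`, then stalls on memory for
// `stall_cycles`. While one warp is stalled, the SM issues from another.
// Returns the fraction of cycles in which the SM has work to issue.
constexpr double sm_utilization(int resident_warps,
                                int compute_cycles,
                                int stall_cycles) {
    const double period = static_cast<double>(compute_cycles + stall_cycles);
    const double busy = static_cast<double>(resident_warps) * compute_cycles;
    return std::min(1.0, busy / period);
}
```

With hypothetical values of 4 compute cycles and 396 stall cycles, a single resident warp keeps the SM busy about 1% of the time and 24 warps about 24%; in this model it would take 100 warps to saturate the SM. The point is only the trend: the more thread contexts are resident, the more memory latency can be covered by other threads' work.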

The threads of a CPU core are used to fill its different execution units through dynamic scheduling. CPU threads are not necessarily of the same type: while one thread keeps the floating-point unit busy, another thread may find the integer ALU idle and use it, so these threads can execute concurrently. Multiple threads per core are maintained to keep the execution units filled and prevent them from sitting idle. However, dynamic scheduling is costly in terms of power and energy consumption, so manufacturers support only a few threads per CPU core.

In answer to the second part of your question: threads on a GPU are scheduled by hardware (the per-SM warp scheduler), and neither the OS nor the driver affects that scheduling.

1
votes

As far as I know, 768 is the maximum number of resident threads per SM. Threads are executed in warps, each consisting of 32 threads, so 768 resident threads correspond to 24 warps. The 768 threads in an SM are therefore not all executed at the same time; they are scheduled in chunks of 32 threads, i.e. one warp at a time.
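The numbers in this thread can be checked with a few lines of C++. This is just the arithmetic, using the figures quoted in the question and answers (16 SMs, 8 SPs per SM, 768 resident threads per SM, 32 threads per warp); the constant and function names are made up for the example.

```cpp
// G80 figures as quoted in this thread; names are illustrative only.
constexpr int kNumSMs = 16;                    // streaming multiprocessors
constexpr int kSPsPerSM = 8;                   // streaming processors per SM
constexpr int kMaxResidentThreadsPerSM = 768;  // thread contexts kept per SM
constexpr int kWarpSize = 32;                  // threads scheduled together

// Total streaming processors on the chip.
constexpr int total_sps() { return kNumSMs * kSPsPerSM; }

// Number of warps whose contexts one SM keeps resident.
constexpr int resident_warps_per_sm() {
    return kMaxResidentThreadsPerSM / kWarpSize;
}

// Total thread contexts resident across the whole chip.
constexpr int total_resident_threads() {
    return kNumSMs * kMaxResidentThreadsPerSM;
}

static_assert(total_sps() == 128, "16 SMs x 8 SPs");
static_assert(resident_warps_per_sm() == 24, "768 / 32");
static_assert(total_resident_threads() == 12288, "the book's ~12,000");
```

So each SM holds 24 warps but has only 8 SPs to execute them on, which is why "supports 768 threads" means resident and interleaved rather than all running in the same cycle.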

1
votes

The analogous technology on CPUs is called "simultaneous multithreading" (SMT), or Hyper-Threading in Intel's marketing speak. It allows usually two, and on some CPUs four, threads per core to be scheduled by the CPU itself in hardware.

This is separate from the operating system, which may schedule a much larger number of threads in software on top of that.
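The distinction between hardware threads and OS-scheduled software threads is visible from ordinary C++. The sketch below uses only the standard library; the helper names `hardware_threads` and `run_software_threads` are made up for the example. `std::thread::hardware_concurrency()` reports the number of hardware thread contexts (it is only a hint and may return 0), while nothing stops a program from creating far more software threads than that.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Number of hardware thread contexts reported by the implementation
// (typically cores times SMT ways). This is a hint and may be 0 if the
// value cannot be determined.
unsigned hardware_threads() {
    return std::thread::hardware_concurrency();
}

// Launches `n` software threads, which may far exceed the hardware
// thread count; the OS time-slices them onto the available hardware
// threads. Returns how many of them ran to completion.
unsigned run_software_threads(unsigned n) {
    std::atomic<unsigned> completed{0};
    std::vector<std::thread> workers;
    workers.reserve(n);
    for (unsigned i = 0; i < n; ++i) {
        workers.emplace_back([&completed] { completed.fetch_add(1); });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return completed.load();
}
```

Calling `run_software_threads(64)` on a machine where `hardware_threads()` reports, say, 8 still completes all 64 threads; they simply take turns. Depending on the toolchain, the program may need to be built with `-pthread`.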