1
vote

The CUDA programming guide states:

The CUDA architecture is built around a scalable array of multithreaded Streaming Multiprocessors (SMs). When a CUDA program on the host CPU invokes a kernel grid, the blocks of the grid are enumerated and distributed to multiprocessors with available execution capacity. The threads of a thread block execute concurrently on one multiprocessor, and multiple thread blocks can execute concurrently on one multiprocessor. As thread blocks terminate, new blocks are launched on the vacated multiprocessors.

Does this mean that if I have a video card with 2 multiprocessors of n CUDA cores each, and I launch a kernel like

MyKernel<<<1,N>>>(sth);

one of the multiprocessors will be idle, since I'm launching a single block of N threads?

1
That is precisely what it means. – talonmies

1 Answer

3
votes

You are correct.

In all current CUDA architectures, a block is only ever scheduled and run on a single multiprocessor. If you run one block on a device with more than one multiprocessor, all but one of those multiprocessors will be idle.
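As a minimal sketch of the point (the kernel body and the `int *` argument are placeholders, not from the question): launching at least as many blocks as the device has SMs lets the hardware scheduler place work on every multiprocessor, while `<<<1, N>>>` confines the whole launch to one.

```cuda
#include <cstdio>

// Hypothetical toy kernel: each thread writes its global index.
__global__ void MyKernel(int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = i;
}

int main()
{
    const int N = 256;
    int *d_out;
    cudaMalloc(&d_out, 2 * N * sizeof(int));

    // <<<1, N>>>: a single block runs on a single SM;
    // on a 2-SM device the other SM has nothing to do.
    MyKernel<<<1, N>>>(d_out);

    // <<<2, N>>>: two blocks, so the scheduler can give
    // one block to each SM and keep both busy.
    MyKernel<<<2, N>>>(d_out);

    cudaDeviceSynchronize();
    cudaFree(d_out);
    return 0;
}
```

In practice you generally want many more blocks than SMs, so that as blocks finish, new ones can be launched on the vacated multiprocessors, as the quoted passage describes.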