The CUDA programming guide states:
The CUDA architecture is built around a scalable array of multithreaded Streaming Multiprocessors (SMs). When a CUDA program on the host CPU invokes a kernel grid, the blocks of the grid are enumerated and distributed to multiprocessors with available execution capacity. The threads of a thread block execute concurrently on one multiprocessor, and multiple thread blocks can execute concurrently on one multiprocessor. As thread blocks terminate, new blocks are launched on the vacated multiprocessors.
Does it mean that if I have a video card with 2 multiprocessors of n CUDA cores each, and I launch a kernel like
MyKernel<<<1,N>>>(sth);
one of the multiprocessors will remain idle, since I'm launching a single block of N threads?
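To make the question concrete, here is a sketch of what I mean (the kernel body, N, and d_data are hypothetical, just for illustration):

```cuda
// Hypothetical kernel: some per-element work on a device array.
__global__ void MyKernel(float *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] = 2.0f * data[i];
}

// Single block of N threads: a block runs entirely on one SM,
// so on a 2-SM card the other SM would have no block to execute.
MyKernel<<<1, N>>>(d_data);

// Two blocks of N/2 threads each: my understanding is the scheduler
// could then place one block on each SM, keeping both busy.
MyKernel<<<2, N / 2>>>(d_data);
```

Is the second launch configuration what I would need to use both multiprocessors?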