In my current understanding, the hardware hierarchy of the CUDA model is GPU card -> Streaming Multiprocessors (SMs) -> cores, and the program hierarchy is kernel -> grid -> block -> warp -> single thread. I want to know the correspondence between the hardware and program hierarchies. That is: Is a kernel in general composed of several grids? Is a grid contained in the GPU card or in an SM? If a grid is contained in the GPU card, can the GPU card contain only one grid or multiple grids? Does a block correspond to an SM? Can an SM contain only one block or multiple blocks? Can a block span several SMs? Can a core execute only one thread or multiple threads? Etc.
1 Answer
A kernel is a function that runs on the GPU.
The grid is all of the threadblocks associated with a kernel launch. A kernel launch creates a single grid. A grid can run on the entire GPU device (all SMs in the GPU). A grid is composed of threadblocks.
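As a minimal sketch of the launch-creates-a-grid idea (the kernel name, sizes, and data here are illustrative, not from the answer), the `<<<blocks, threads>>>` launch configuration defines the grid of threadblocks:

```cuda
#include <cstdio>

// Each thread in the grid handles one array element.
__global__ void scale(float *x, float a, int n)
{
    // Global thread index, built from block and thread coordinates.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= a;
}

int main()
{
    const int n = 1 << 20;
    float *x;
    cudaMallocManaged(&x, n * sizeof(float));
    for (int i = 0; i < n; i++)
        x[i] = 1.0f;

    // 256 threads per block; enough blocks to cover all n elements.
    int threadsPerBlock = 256;
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;

    // One kernel launch = one grid, here of blocksPerGrid threadblocks.
    scale<<<blocksPerGrid, threadsPerBlock>>>(x, 2.0f, n);
    cudaDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);
    cudaFree(x);
    return 0;
}
```

The grid as a whole can occupy the entire device; the runtime distributes its threadblocks across whatever SMs are available.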
Threadblocks are groups of threads. Threads are grouped into warps (32 threads) for execution purposes, so we can also say threadblocks are groups of warps.
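The grouping of a block's threads into warps of 32 can be sketched with a little index arithmetic (the kernel and its launch sizes are illustrative assumptions):

```cuda
#include <cstdio>

// Threads in a block are grouped into warps of warpSize (32) threads,
// in order of their thread index within the block.
__global__ void whoami()
{
    int warpInBlock = threadIdx.x / warpSize;  // which warp this thread belongs to
    int lane        = threadIdx.x % warpSize;  // position within that warp
    if (lane == 0)
        printf("block %d: warp %d starts at thread %d\n",
               blockIdx.x, warpInBlock, threadIdx.x);
}

int main()
{
    // 2 blocks of 64 threads each = 2 warps per block.
    whoami<<<2, 64>>>();
    cudaDeviceSynchronize();
    return 0;
}
```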
Threadblocks (and the warps they contain) execute on an SM. Once a threadblock begins executing on a particular SM, it stays on that SM and will not migrate to another SM.
SMs are composed of cores. Each core executes one thread. The core execution engine may be able to handle multiple instructions at a time, so it can actually handle more than one thread, but not from the same warp. This part gets complicated and it is not essential to a good beginner understanding of how a GPU works, so it is convenient and useful to think of a core as handling only one thread at any given instant (instruction cycle).
An SM can handle multiple blocks simultaneously.
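You can see the hardware side of this hierarchy by querying the device. The sketch below uses the CUDA runtime's `cudaGetDeviceProperties`; comparing the per-SM and per-block thread limits shows why an SM can hold several resident blocks at once:

```cuda
#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // query device 0

    printf("SMs on device:        %d\n", prop.multiProcessorCount);
    printf("max threads per SM:   %d\n", prop.maxThreadsPerMultiProcessor);
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);

    // Since an SM can hold more resident threads than any single block
    // may contain, an SM is generally occupied by multiple blocks at once.
    return 0;
}
```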
Please don't post questions that contain many questions. Questions on SO should show some research effort.
Good research effort for questions like these would be to take some of the basic webinars from the NVIDIA webinar page, which only requires a couple hours of study.
Try these two first:
GPU Computing using CUDA C – An Introduction (2010): An introduction to the basics of GPU computing using CUDA C. Concepts will be illustrated with walkthroughs of code samples. No prior GPU computing experience required.
GPU Computing using CUDA C – Advanced 1 (2010): First-level optimization techniques such as global memory optimization and processor utilization. Concepts will be illustrated using real code examples.