I know that NVIDIA gpus with compute capability 2.x or greater can execute u pto 16 kernels concurrently. However, my application spawns 7 "processes" and each of these 7 processes launch CUDA kernels.
My first question is that what would be the expected behavior of these kernels. Will they execute concurrently as well or, since they are launched by different processes, they would execute sequentially.
I am confused because the CUDA C programming guide says:
"A kernel from one CUDA context cannot execute concurrently with a kernel from another CUDA context." This brings me to my second question, what are CUDA "contexts"?
Thanks!