I am looking for a way to partition my Nvidia GPU device, so that I can run two sets of kernels concurrently without them fighting for SMs.
According to the documentation, in OpenCL you can use clCreateSubDevices. Is there any CUDA equivalent?
I personally haven't come across such a feature in CUDA.
To run two kernels concurrently, you can calculate the occupancy of your kernels, launch only a limited number of blocks for each, and use a loop inside the kernels (a grid-stride loop) to cover the work the missing blocks would have done. This will probably cost you a few extra registers per thread.

If you don't want to touch the content of your kernels, you can instead launch each kernel multiple times in its own stream, each launch with a limited grid size. The cost of this second approach is that SMs are likely left under-occupied while a stream transitions between its successive kernel launches.
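Here is a minimal sketch of the first approach. The kernel bodies, array size, and the 8-block cap are illustrative assumptions, not values from the question: each kernel uses a grid-stride loop so it still covers all the data despite the capped grid, and the two kernels go into separate streams so the scheduler may run them concurrently.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernelA(float *x, int n)
{
    // Grid-stride loop: the kernel processes the whole array no matter
    // how few blocks were actually launched.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        x[i] = x[i] * 2.0f;
}

__global__ void kernelB(float *y, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        y[i] = y[i] + 1.0f;
}

int main()
{
    const int n = 1 << 20;  // assumed problem size
    float *dX, *dY;
    cudaMalloc(&dX, n * sizeof(float));
    cudaMalloc(&dY, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Cap each kernel at a small number of blocks (assumed value) so
    // neither kernel occupies every SM; the grid-stride loop makes up
    // the difference. Concurrency is still opportunistic: the hardware
    // scheduler, not the programmer, decides where each block runs.
    const int threads = 256;
    const int cappedBlocks = 8;

    kernelA<<<cappedBlocks, threads, 0, s1>>>(dX, n);
    kernelB<<<cappedBlocks, threads, 0, s2>>>(dY, n);

    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(dX);
    cudaFree(dY);
    return 0;
}
```

Note that even with capped grids, CUDA gives no hard guarantee about which SMs each kernel's blocks land on; this only makes room for concurrency, it does not enforce a partition the way clCreateSubDevices does.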