I am looking for a way to partition my Nvidia GPU device, so that I can run two sets of kernels concurrently without them fighting for SMs.
According to the documentation, in OpenCL you can use clCreateSubDevices. Is there any CUDA equivalent?
I personally haven't come across such a feature in CUDA.
To run two kernels concurrently, you can calculate the occupancy of your kernels, launch only a limited number of blocks for each, and use a loop inside the kernels (a grid-stride loop) to cover the work the missing blocks would have done. This will probably cost you a few extra registers per thread.

If you don't want to touch the content of your kernels, you can instead launch each kernel multiple times in its own stream, each launch with a limited grid size. The cost of this second approach is that SMs are likely left under-occupied while a stream transitions between its successive kernel launches.
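Here is a minimal sketch of the first approach. The kernel bodies, array size, and the 8-block cap are illustrative assumptions, not values from the question: each kernel uses a grid-stride loop so it still covers all the data despite the capped grid, and the two kernels go into separate streams so the scheduler may run them concurrently.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernelA(float *x, int n)
{
    // Grid-stride loop: the kernel processes the whole array no matter
    // how few blocks were actually launched.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        x[i] = x[i] * 2.0f;
}

__global__ void kernelB(float *y, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        y[i] = y[i] + 1.0f;
}

int main()
{
    const int n = 1 << 20;  // assumed problem size
    float *dX, *dY;
    cudaMalloc(&dX, n * sizeof(float));
    cudaMalloc(&dY, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Cap each kernel at a small number of blocks (assumed value) so
    // neither kernel occupies every SM; the grid-stride loop makes up
    // the difference. Concurrency is still opportunistic: the hardware
    // scheduler, not the programmer, decides where each block runs.
    const int threads = 256;
    const int cappedBlocks = 8;

    kernelA<<<cappedBlocks, threads, 0, s1>>>(dX, n);
    kernelB<<<cappedBlocks, threads, 0, s2>>>(dY, n);

    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(dX);
    cudaFree(dY);
    return 0;
}
```

Note that even with capped grids, CUDA gives no hard guarantee about which SMs each kernel's blocks land on; this only makes room for concurrency, it does not enforce a partition the way clCreateSubDevices does.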