I want to run heterogeneous kernels that execute asynchronously on a single GPU. I believe this is possible on the NVIDIA Kepler K20 (or any device of compute capability 3.5+) by launching each of these kernels to a different stream; the runtime system then maps them to different hardware queues based on resource availability. Is this feature accessible in OpenCL? If so, what is the equivalent of a CUDA 'stream' in OpenCL? Do NVIDIA drivers support such execution on their K20 cards through OpenCL? Is there any AMD GPU that has a similar feature (or is one in development)? An answer to any of these questions would help me a lot.
2 votes
Have you tried OpenCL command queues on NVIDIA GPUs to achieve concurrent execution?
– usman
Hmm, I'm curious too. Another thing you can try is using out-of-order queues (set CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE in clCreateCommandQueue). Let us know the results. Thanks.
– isti_spl
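For reference, a rough sketch (not from the thread) of that out-of-order-queue idea in OpenCL 1.x C host code: a single queue created with CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, so the driver may overlap commands that are not ordered by events. Device selection and error handling are stripped to the bare minimum.

    /* Sketch: out-of-order command queue; assumes one GPU device is present. */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
        cl_platform_id platform;
        cl_device_id device;
        cl_int err;

        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);

        /* Ask for out-of-order execution; drivers that do not support it
         * fail here with CL_INVALID_QUEUE_PROPERTIES. */
        cl_command_queue queue = clCreateCommandQueue(
            ctx, device, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &err);
        if (err != CL_SUCCESS) {
            printf("out-of-order queue not available (error %d)\n", err);
            return 1;
        }

        /* ... enqueue independent kernels here; use events (or
         * clEnqueueBarrier) only where ordering is actually required ... */

        clFinish(queue);
        clReleaseCommandQueue(queue);
        clReleaseContext(ctx);
        return 0;
    }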
I have used OpenCL command queues to achieve concurrency. I ended up writing two separate projects, one for OpenCL and another for CUDA. The CUDA-based kernels always performed better than the OpenCL ones (obviously). I did not get a chance to try my OpenCL program on an AMD GPU at the time; I will try it in the future!
– Vishnu
1 Answer
1 vote
In principle, you can use OpenCL command queues to achieve CKE (concurrent kernel execution). You can launch them from different CPU threads. Here are a few links that might help you get started: How do I know if the kernels are executing concurrently? http://devgurus.amd.com/thread/142485
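For illustration only, here is a minimal sketch of the multi-queue approach. The kernels k1/k2, the buffer size, and the single-threaded host are placeholders I made up; whether the two kernels actually overlap on a given card is up to the driver and the kernels' resource usage, which is what the links above help you check.

    /* Sketch: two in-order command queues on the same device, one kernel per
     * queue -- the closest OpenCL analogue of launching to two CUDA streams. */
    #include <stdio.h>
    #include <CL/cl.h>

    /* Two trivial, independent kernels used only for illustration. */
    static const char *src =
        "__kernel void k1(__global float *a) { a[get_global_id(0)] += 1.0f; }\n"
        "__kernel void k2(__global float *b) { b[get_global_id(0)] *= 2.0f; }\n";

    int main(void) {
        cl_platform_id platform;  cl_device_id device;  cl_int err;
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
        clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
        cl_kernel k1 = clCreateKernel(prog, "k1", &err);
        cl_kernel k2 = clCreateKernel(prog, "k2", &err);

        size_t n = 1 << 20;
        cl_mem a = clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(float), NULL, &err);
        cl_mem b = clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(float), NULL, &err);
        clSetKernelArg(k1, 0, sizeof(cl_mem), &a);
        clSetKernelArg(k2, 0, sizeof(cl_mem), &b);

        /* One command queue per independent stream of work. */
        cl_command_queue q1 = clCreateCommandQueue(ctx, device, 0, &err);
        cl_command_queue q2 = clCreateCommandQueue(ctx, device, 0, &err);

        clEnqueueNDRangeKernel(q1, k1, 1, NULL, &n, NULL, 0, NULL, NULL);
        clEnqueueNDRangeKernel(q2, k2, 1, NULL, &n, NULL, 0, NULL, NULL);

        /* Flush both queues before waiting; otherwise the second kernel may
         * not even be submitted until the first queue has drained. */
        clFlush(q1);  clFlush(q2);
        clFinish(q1); clFinish(q2);

        printf("done\n");
        /* object releases omitted for brevity */
        return 0;
    }

Each in-order queue plays roughly the role of a CUDA stream here; if you drive the queues from different CPU threads instead, the same pattern applies, since OpenCL command queues are thread-safe for enqueuing.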
I am not sure how it would work with NVIDIA Kepler GPUs, as we are having strange issues using OpenCL on the K20 GPU.