I have several compute shaders (let's call them `compute1`, `compute2` and so on) that have several input bindings (defined in shader code as `layout (...) readonly buffer`) and several output bindings (defined as `layout (...) writeonly buffer`). I'm binding buffers with data to their descriptor sets and then trying to execute these shaders in parallel.
What I've tried:
- `vkQueueSubmit()` with `VkSubmitInfo.pCommandBuffers` holding several primary command buffers (one per compute shader);
- `vkQueueSubmit()` with `VkSubmitInfo.pCommandBuffers` holding one primary command buffer that was recorded using `vkCmdExecuteCommands()` with `pCommandBuffers` holding several secondary command buffers (one per compute shader);
- Separate `vkQueueSubmit()` + `vkQueueWaitIdle()` calls from different `std::thread` objects (one per compute shader): each command buffer is allocated from a separate `VkCommandPool` and is submitted to its own `VkQueue` with its own `VkFence`, and the main thread waits using `threads[0].join(); threads[1].join();` and so on;
- Separate `vkQueueSubmit()` calls from different detached `std::thread` objects (one per compute shader): each command buffer is allocated from a separate `VkCommandPool` and is submitted to its own `VkQueue` with its own `VkFence`, and the main thread waits using `vkWaitForFences()` with `pFences` holding the fences that were used in `vkQueueSubmit()` and with `waitAll` set to `true`.
What I've got:
In all cases the total time is almost the same (the difference is less than 1%) as when calling `vkQueueSubmit()` + `vkQueueWaitIdle()` for `compute1`, then for `compute2` and so on.
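For reference, the kind of comparison described above can be sketched roughly as follows. This is only a minimal illustration of the timing method, not the actual test application: `Job`, `time_ms`, `run_sequential_ms`, `run_threaded_ms` and `relative_difference_percent` are hypothetical names, and each `Job` is assumed to wrap one "submit and wait" step (for example `vkQueueSubmit()` followed by a wait on its fence).

```cpp
#include <chrono>
#include <cmath>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for "submit one compute shader and wait for it".
// In a real application this would wrap vkQueueSubmit() plus a fence or idle wait.
using Job = std::function<void()>;

// Wall-clock time, in milliseconds, of running fn once.
double time_ms(const std::function<void()>& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Baseline: run each job to completion before starting the next one.
double run_sequential_ms(const std::vector<Job>& jobs) {
    return time_ms([&] {
        for (const Job& job : jobs) job();
    });
}

// Candidate: start every job on its own thread, then join all threads.
double run_threaded_ms(const std::vector<Job>& jobs) {
    return time_ms([&] {
        std::vector<std::thread> threads;
        threads.reserve(jobs.size());
        for (const Job& job : jobs) threads.emplace_back(job);
        for (std::thread& t : threads) t.join();
    });
}

// Difference between two timings as a percentage of the baseline.
double relative_difference_percent(double baseline_ms, double candidate_ms) {
    return std::fabs(baseline_ms - candidate_ms) / baseline_ms * 100.0;
}
```

If the jobs really overlapped on the device, `run_threaded_ms` would be expected to come out noticeably lower than `run_sequential_ms`; a relative difference under 1% suggests the work is effectively serialized.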
I want to bind the same buffers as inputs for several shaders, but judging by the timings, the result is the same when each shader is executed with its own `VkBuffer` + `VkDeviceMemory` objects.
So my question is:
Is it possible to somehow execute several compute shaders simultaneously, or does command buffer parallelism work for graphics shaders only?
Update: The test application was compiled using LunarG Vulkan SDK 1.1.73.0 and is running on Windows 10 with an NVIDIA GeForce GTX 960.