How to efficiently raise a VkDispatchIndirectCommand field to a multiple of subgroupSize

Question

I am playing with compute shaders in Vulkan and have run into a problem I cannot solve to my satisfaction. I have two compute shaders. Among other things, the first one calculates the number of invocations the second one needs and writes that number into a field of a VkDispatchIndirectCommand. It does this indirectly through atomicAdd: every invocation adds an unknown amount to the total. The problem is that VkDispatchIndirectCommand holds the number of workgroups, not invocations, and the number of invocations per workgroup should be at least subgroupSize (e.g. 32 on NVIDIA). My first attempt, correcting the count on the host between the two shader runs, caused an immense performance drop. What would be a better approach, or is there an ideal solution in Vulkan that I just don't know about yet?

Since you provided the compute shaders and pipelines being dispatched, can you explain why it is that the shader building the indirect dispatch commands doesn't know how many invocations there are per work group? — Nicol Bolas
@NicolBolas I know the workgroup size, but not the invocation count or the workgroup count, and atomicAdd only accepts integers, so I cannot add, for example, 1/32. — Daniel

Jesse Hall · Accepted Answer · 2019-01-28T18:11:32

From the use of atomicAdd, it sounds like the number of invocations you want is calculated in a distributed way across all the invocations of the first dispatch. Assuming you can't change that, and really need a post-process to convert from number of invocations to number of workgroups, you can run a very small dispatch (one thread) after the first one which does that conversion before the indirect dispatch. This is essentially what you're doing on the CPU, but done on the GPU in a pipelined way that should have lower latency.
