0
votes

I have two compute shaders and the first one modifies DispatchIndirectCommand buffer, which is later used by the second one.

// This is the code of the first shader

struct DispatchIndirectCommand{
    uint    x;
    uint    y;
    uint    z;
};
restrict layout(std430, set = 0, binding = 5) buffer Indirect{
    DispatchIndirectCommand[1] dispatch_indirect;
};
void main(){
   // some stuff
   dispatch_indirect[0].x = number_of_groups; // I used debugPrintfEXT to make sure that this number is correct
}

I execute them as follows

vkCmdBindPipeline(cmd_buffer, VK_PIPELINE_BIND_POINT_COMPUTE, first_shader);
vkCmdDispatch(cmd_buffer, x, 1, 1);
vkCmdBindPipeline(cmd_buffer, VK_PIPELINE_BIND_POINT_COMPUTE, second_shader);
vkCmdDispatchIndirect(cmd_buffer, indirect_buffer, 0);

The problem is that the changes made by first shader are not reflected by the second one.

// This is the code of the second shader
void main(){
    debugPrintfEXT("%d", gl_GlobalInvocationID.x); //this seems to never be called
}

I initialise the indirect_buffer with VkDispatchIndirectCommand{.x=0,.y=1,.z=1}, and it seems that the second shader always executes with x==0, because the debugPrintfEXT never prints anything. I tried to add a memory barrier like

VkBufferMemoryBarrier barrier;
barrier.sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
barrier.srcQueueFamilyIndex = queue_idx;
barrier.dstQueueFamilyIndex = queue_idx;
barrier.buffer = indirect_buffer;
barrier.offset = 0;
barrier.size = sizeof_indirect_buffer;

However, this does not seem to make any difference. What does seem to work, is when I use

barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_INDIRECT_COMMAND_READ;

When I use such access flags, the all compute shaders work properly. However, I get a validation error

Validation Error: [ VUID-vkCmdPipelineBarrier-dstAccessMask-02816 ] Object 0: handle = 0x5561b60356c8, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x69c8467d | vkCmdPipelineBarrier(): .pMemoryBarriers[1].dstAccessMask bit VK_ACCESS_INDIRECT_COMMAND_READ_BIT is not supported by stage mask (VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT). The Vulkan spec states: The dstAccessMask member of each element of pMemoryBarriers must only include access flags that are supported by one or more of the pipeline stages in dstStageMask, as specified in the table of supported access types (https://vulkan.lunarg.com/doc/view/1.2.182.0/linux/1.2-extensions/vkspec.html#VUID-vkCmdPipelineBarrier-dstAccessMask-02816)

Vulkan's documentation states that

VK_ACCESS_INDIRECT_COMMAND_READ_BIT specifies read access to indirect command data read as part of an indirect build, trace, drawing or dispatching command. Such access occurs in the VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT pipeline stage

It looks really confusing. It clearly does mention "dispatching command", but at the same time it says that the stage must be VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT and not VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT. Is the specification contradictory/imprecise or am I missing something?

1

1 Answers

2
votes

You seem to be employing trial-and-error strategy. Do not use such strategy in low level APIs, and preferrably in computer engineering in general. Sooner or later you will encounter something that will look like it works when you test it, but be invalid code anyway. The spec did tell you the exact thing to do, so you never ever had a legitimate reason to try any other flags or with no barriers at all.

As you discovered, the appropriate access flag for indirect read is VK_ACCESS_INDIRECT_COMMAND_READ_BIT. And as the spec also says, the appropriate stage is VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT.

So the barrier for your case should probably be:

srcStageMask = VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT
srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
dstStageMask = VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT
dstAccessMask = VK_ACCESS_INDIRECT_COMMAND_READ_BIT;

The name of the stage flag is slightly confusing, but the spec is otherwisely very clear it aplies to compute:

VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT specifies the stage of the pipeline where VkDrawIndirect* / VkDispatchIndirect* / VkTraceRaysIndirect* data structures are consumed.

and also:

For the compute pipeline, the following stages occur in this order:

  • VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT
  • VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT

PS: Also relevant GitHub Issue: KhronosGroup/Vulkan-Docs#176