Let me first describe how I handle uniform buffers. I have one buffer in device-local memory and one (for staging) in host-coherent memory, and each is divided into one section per framebuffer. Each frame, before beginning the render pass, I update the host-side buffer, copy it to the device-local one, and wait until the command buffer finishes.

(Assume my GPU is a discrete one with no memory shared between the CPU and GPU.)
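Because each buffer is split into one section per framebuffer, each section's start has to be usable as a uniform buffer descriptor offset (or dynamic offset), and Vulkan requires such offsets to be multiples of `VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment`, which is always a power of two. A minimal sketch of that offset arithmetic, assuming the hypothetical helper names `align_up` and `frame_section_offset`:

```c
#include <assert.h>
#include <stdint.h>

/* Rounds size up to the next multiple of alignment.
 * alignment must be a power of two, as minUniformBufferOffsetAlignment is. */
static uint64_t align_up(uint64_t size, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Byte offset of the section for frame_index in a buffer divided into one
 * section per frame. min_alignment is assumed to come from
 * VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment. */
static uint64_t frame_section_offset(uint64_t uniform_size,
                                     uint64_t min_alignment,
                                     uint32_t frame_index)
{
    return align_up(uniform_size, min_alignment) * frame_index;
}
```

Passing the number of frames as `frame_index` gives the total size to allocate for each of the two buffers.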

Now my questions:

  • Is staging and copying every frame the best way to manage uniform buffers?
  • I know the synchronization I use (waiting for the command buffer to finish) is not OK. What is the best way to do it?
  • If your answer is to use barriers, how exactly should they be set up? (I have not seen any sample like this.)

So far, every sample I have seen uses host-coherent uniform buffers. I would appreciate it if you could point me to sample code for something like this.


1 Answer

I'm surprised you have a device without a HOST_VISIBLE heap that uniform buffers can be located in. Using the CPU to write into the same buffer that the GPU reads from is often the best path, and I thought all modern GPUs supported that.

But if you really need a host->device copy, make sure to start the copies early enough that they are done by the time the graphics pipeline needs them, and do the copy on a transfer-only queue. That lets the copy overlap earlier work, so the graphics pipeline never sits idle waiting for it. To do that:

  1. Write the uniforms for a frame into the host buffer.
  2. Submit a command buffer with the host->device copy commands to the transfer queue, and put a VkSemaphore in VkSubmitInfo::pSignalSemaphores.
  3. Finish recording any remaining work for the frame's rendering command buffers, and submit them to the graphics queue with that semaphore in VkSubmitInfo::pWaitSemaphores. Unfortunately, since some of the uniforms are probably needed in the vertex shader, the matching pWaitDstStageMask entry will need to be VK_PIPELINE_STAGE_VERTEX_SHADER_BIT.

If you're GPU-limited, the transfer for frame N+1 should ideally happen while the graphics pipeline is still working on frame N. A tool such as GPUView or Radeon GPU Profiler can show whether that overlap is actually happening.