2 votes

This question is a continuation of this subject: How to bind thousands of buffers properly

This problem is related to the particle simulation subject. Let's say I need a global structure that includes:

  1. A 3D matrix (32*32*32) of uints (each holding the header index of a hashed linked list).
  2. A counter that tells me the number of particles in my hashed linked list.
  3. A hashed linked list of particles.

The first idea was to use a 3D texture for the first item, an atomic counter buffer for the second, and an SSB for the third.

Each entry in the SSB is a particle plus a uint whose value points to the location of the next particle in the same voxel.

Nothing magical here.

Now, in order to be space independent (not bound to a unique cubic space), I must be able to pass particles from a cube to the others surrounding it. Since I'm in a 3D space, that means 27 cubes ("front") as inputs for the physics computation, but also 27 cubes ("back") as outputs, since I may write a particle from one cube (front) into another (back) covering a different part of space.

This leads to a binding requirement of 54 textures, 54 SSBs and 54 atomic counter buffers. While the first two may not be a problem (my hardware limit is around 90 for both), the atomic counter buffer (ACB) binding limit is 8.

A single ACB containing the particle count of every cube seems hard to maintain (I haven't given it much thought; it may be the solution, but that is not the question here).

CONTEXT:

An SSB can contain anything. So a solution would be to concatenate the three structures (header matrix, counter and linked list) inside one SSB that will be my per-cube super structure.
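As a sketch, the concatenated super structure could be declared in GLSL like this (the block name, binding point, and particle payload are hypothetical; only the three-part layout comes from the question):

```glsl
// Hypothetical layout of the per-cube "super structure" SSB.
struct Particle {
    vec4 position;   // assumed payload
    vec4 velocity;   // assumed payload
    uint next;       // index of the next particle in the same voxel, or 0xFFFFFFFFu
    uint _pad0, _pad1, _pad2; // keep std430 struct alignment explicit
};

layout(std430, binding = 0) buffer CubeData {
    uint headers[32 * 32 * 32]; // header matrix: first particle index per voxel
    uint particleCount;         // counter (replaces the atomic counter buffer)
    Particle particles[];       // the hashed linked list entries
};
```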

Before each pass, I need to know how many particles are inside the SSB to issue a proper glDispatchCompute() call.

QUESTION:

Would it be bad to bind the SSB just to read the uint that contains the number of particles?

If not, is one of the two following methods for reading the count better than the other? Why?

GLuint value;

//1st method
m_pFunctions->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, m_buffer);
m_pFunctions->glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, OFFSET_TO_VALUE, sizeof(GLuint), &value);
m_pFunctions->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, 0);

//2nd method
m_pFunctions->glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 0, m_buffer, OFFSET_TO_VALUE, sizeof(GLuint));
m_pFunctions->glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, sizeof(GLuint), &value);
m_pFunctions->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, 0);

If it is bad, is there a good way to do this, or should I separate the counter from the SSB?

It is theoretically possible; of course, you will likely need the SSBs to be coherent and use an appropriate barrier. That said, it really sounds like what you are actually trying to do could be done using DispatchIndirect. — Andon M. Coleman
I've never worked with compute shaders yet, so the following question may put you off. Even if I only use atomic operations for read-write on the SSBs, do I need to add the coherent type qualifier to the buffer variables I'm writing into? I've also never used memory barriers. If I were to use my SSB as a vertex array for rendering, the barrier to use would be GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT? (Also new to memory barriers, just want to be sure I got the concept.) — agrum
Yes and no... you are thinking about the wrong kind of memory barrier, to be honest. I meant you would need one in your shader to simulate an atomic counter using something else. Really though, if your fundamental problem here is having to split this into multiple dispatches, the best way to do that is to have the compute shader itself set up a buffer with all the info necessary for the next dispatch. The process is known as Indirect Dispatch. — Andon M. Coleman
So instead of having a uint in my SSB for the number of particles, I should have a uvec3 with the second and third components set to 1. For each cube to compute, I bind the SSB corresponding to this cube, produced during the last pass, to both the GL_DISPATCH_INDIRECT_BUFFER and GL_SHADER_STORAGE_BUFFER targets, and when calling glDispatchComputeIndirect(), I provide the offset to where the uvec3 is in the SSB. I don't even have to read the buffer on the client side for the number of particles, which is awesome. But did I get the idea right? (Your help is much appreciated so far.) — agrum
There is that, but compute shaders actually run in parallel. If you are trying to increment a fake atomic counter by writing to an SSB, you need to make sure that no other compute shader invocation is allowed to read the SSB's value before another parallel invocation finishes writing it. Memory barriers (in the shader itself, not the glMemoryBarrier() command) and coherence are the tools necessary for that: you want the other invocations to wait their turn, and you also want them to be aware of changes to the SSB's value made by parallel invocations. — Andon M. Coleman

1 Answer

0 votes

Even though binding the buffer and reading the counter inside is theoretically possible, there is a simpler way, obtained by pushing the "everything-in-the-SSB" concept further.

By reserving space for 3 consecutive uints in the SSB, we can use their values as the dispatch parameters X, Y and Z. X would still be the number of particles, while Y and Z would simply be hard-set to 1.
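Those three uints match the layout OpenGL reads at the byte offset passed to glDispatchComputeIndirect(): three tightly packed 32-bit unsigned integers. A minimal sketch of that layout (the struct name mirrors the one used in the OpenGL documentation):

```cpp
#include <cstddef>
#include <cstdint>

// Layout expected at the byte offset passed to glDispatchComputeIndirect():
// three tightly packed 32-bit unsigned ints.
struct DispatchIndirectCommand {
    std::uint32_t num_groups_x; // the particle-driven value
    std::uint32_t num_groups_y; // hard-set to 1
    std::uint32_t num_groups_z; // hard-set to 1
};

static_assert(sizeof(DispatchIndirectCommand) == 12,
              "three tightly packed uints");
static_assert(offsetof(DispatchIndirectCommand, num_groups_y) == 4,
              "no padding between fields");
static_assert(offsetof(DispatchIndirectCommand, num_groups_z) == 8,
              "no padding between fields");
```

Note that num_groups_x is a work-group count, not an invocation count: with a local size other than 1 in the shader, you would store ceil(particleCount / local_size_x) there rather than the raw particle count.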

Then, instead of glDispatchCompute(), call glDispatchComputeIndirect() after binding the proper SSB to the GL_DISPATCH_INDIRECT_BUFFER target.
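The call sequence could look like this sketch, following the question's conventions (m_buffer and OFFSET_TO_COMMAND are hypothetical names; a current GL context and resolved function pointers are assumed):

```cpp
// Bind the per-cube SSB both for shader storage access and as the
// source of the indirect dispatch parameters.
m_pFunctions->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, m_buffer);
m_pFunctions->glBindBuffer(GL_DISPATCH_INDIRECT_BUFFER, m_buffer);

// Dispatch using the three uints stored at OFFSET_TO_COMMAND in the buffer;
// no client-side read-back of the particle count is needed.
m_pFunctions->glDispatchComputeIndirect((GLintptr)OFFSET_TO_COMMAND);

// Make the compute writes visible before the next pass consumes them.
m_pFunctions->glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT |
                              GL_COMMAND_BARRIER_BIT);
```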

However, when using parts of SSBs as fake atomic counters, the buffers must be prefixed with the type qualifier coherent "to enforce coherency of memory accesses". Memory barriers should also be used to achieve visibility between coherent variables.
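On the shader side, that could look like this sketch (the block name and members are hypothetical; atomicAdd and memoryBarrierBuffer are standard GLSL built-ins):

```glsl
layout(std430, binding = 0) coherent buffer CubeData {
    uint particleCount; // doubles as num_groups_x for the next indirect dispatch
    // ... header matrix and particle array would follow
};

void main() {
    // atomicAdd returns the pre-increment value, giving each
    // invocation a unique slot in the particle array.
    uint slot = atomicAdd(particleCount, 1u);

    // ... write the new particle at 'slot' ...

    // Make the writes visible to other invocations reading
    // this coherent buffer.
    memoryBarrierBuffer();
}
```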