6
votes

I have a question about the new compute shaders. I am currently working on a particle system. I store all my particles in a shader storage buffer so that I can access them in the compute shader. Then I dispatch a one-dimensional grid of work groups.

#define WORK_GROUP_SIZE 128
_shaderManager->useProgram("computeProg");
glDispatchCompute((_numParticles/WORK_GROUP_SIZE), 1, 1);
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

My compute shader:

#version 430
struct particle{
         vec4 currentPos;
         vec4 oldPos;
};

layout(std430, binding=0) buffer particles{
         struct particle p[];
};

layout (local_size_x = 128, local_size_y = 1, local_size_z = 1) in;
void main(){
         uint gid = gl_GlobalInvocationID.x;

         p[gid].currentPos.x += 100;
}

But somehow not all particles are affected. I am doing it the same way it was done in this example, but it doesn't work. http://education.siggraph.org/media/conference/S2012_Materials/ComputeShader_6pp.pdf

Edit:

After calling glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT), I continue like this:

_shaderManager->useProgram("shaderProg"); 
glBindBuffer(GL_ARRAY_BUFFER, shaderStorageBufferID); 
glVertexPointer(4,GL_FLOAT,sizeof(glm::vec4), (void*)0);
glEnableClientState(GL_VERTEX_ARRAY); 
glDrawArrays(GL_POINTS, 0, _numParticles); 
glDisableClientState(GL_VERTEX_ARRAY);

So which bit would be appropriate to use in this case?

2 Answers

8
votes

You have your barriers on backwards. It's a common problem.

The bits you give to the barrier describe how you intend to use the data written, not how the data was written. GL_SHADER_STORAGE_BARRIER_BIT would only be appropriate if you had some process that wrote to a buffer object via image load/store (or a storage buffer/atomic counters), then used a storage buffer to read that buffer object data.

Since you're reading the buffer as a vertex attribute array buffer, you should use the cleverly titled GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT.

1
votes

I resolved the problem. It was simply the number of work groups I dispatched. _numParticles/WORK_GROUP_SIZE is rounded down because both operands are integers. As a result, too few work groups were dispatched whenever the particle count was not a multiple of the work-group size.

With 1000 particles, only 1000/128 = 7 work groups are dispatched. Each work group has a size of 128, so I get 7*128 = 896 invocations and 104 particles never move. Since _numParticles%128 ranges from 0 to 127, I simply dispatch one more work group:

glDispatchCompute((_numParticles/WORK_GROUP_SIZE)+1, 1, 1);

And every particle moves from now on. :)
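One caveat with the +1 approach: when the particle count is already an exact multiple of 128, it dispatches a whole extra work group. A rounding-up division avoids that. The helper below is only a sketch; workGroupCount is a hypothetical name, not part of OpenGL or of the code above.

```cpp
// Hypothetical helper (not from the original post): number of work groups
// needed to cover `count` items when each group handles `groupSize` items.
// Rounds up, and avoids the overflow that (count + groupSize - 1) could hit
// for very large counts.
constexpr unsigned workGroupCount(unsigned count, unsigned groupSize) {
    return count / groupSize + (count % groupSize != 0 ? 1u : 0u);
}

// Possible use with the dispatch from the question:
//   glDispatchCompute(workGroupCount(_numParticles, WORK_GROUP_SIZE), 1, 1);
```

With 1000 particles this gives 8 work groups, i.e. 1024 invocations, so the last 24 invocations have no particle to work on. The shader would probably want a guard such as `if (gid >= numParticles) return;`, assuming the particle count is passed in as a uniform (numParticles is a made-up name here), so that those invocations do not write past the end of the buffer.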