2
votes

I have a compute shader that is dispatched iteratively and uses a 2D texture to temporarily store values. Each invocation accesses its own row of the texture.

The problem is that this texture must be reset to all zeros before each shader dispatch.

Currently I use a loop at the end of the shader code that uses imageStore() to reset all pixels in the respective row back to 0.

for (uint i = 0; i < CONSTANT_SIZE; i++)
{
    imageStore(myTexture, ivec2(i, global_invocation_id), vec4(0, 0, 0, 0));
}

I was wondering if there is a faster way of doing this, a way to set more than one pixel with a single call (preferably an entire row)? I've looked at the GLSL 4.3 specification on image operations but I can't find one that doesn't require a specific pixel location.

If there is a faster way to achieve this on the CPU, I would be open to that as well. I've tried re-uploading the texture with glTexImage2D(), but there was no noticeable performance difference compared to calling imageStore() for each individual pixel.

1
Do you actually mean "shader invocation"? – thokra
I'm probably using the term incorrectly. "At the end of each shader invocation" should have read "at the end of the shader code"; I've edited the question. – kbirk
Can you post your shader code? – thokra

1 Answer

3
votes

The "faster way" would be to clear the texture from OpenGL rather than in your shader. OpenGL 4.4 provides a direct texture-clearing function (glClearTexImage), but even something as simple as a pixel transfer via glTexSubImage2D (after a memory barrier, of course) would probably be faster than what you're doing.

Alternatively, if all you're using this texture for is scratch memory for invocations... why are you using a texture? It would be better to use shared variables for that. Just create an array of arrays of vec4s, where each local invocation accesses one of the inner arrays. Access to shared variables is going to be loads faster than image loads and stores.

Given 32KB of storage for shared variables (the bare minimum allowed), if you have 8 invocations per work group, that gives each one 4KB to work with. At 16 bytes per vec4, that gives each one 256 vec4s to play with. If you move up to 16 invocations, you reduce this to 128 vec4s each.
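As a rough sketch of the shared-variable approach, something like the following should work. The sizes are assumptions chosen to match the 8-invocation figure above, and the Result buffer and the placeholder work loop are hypothetical, there only so the shader produces something observable. Note that shared variables cannot have initializers and start each work group with undefined contents, so each invocation still has to zero its own row; the difference is that it now does so in on-chip memory instead of through imageStore().

```glsl
#version 430

// Assumed sizes: 8 invocations * 256 vec4s * 16 bytes = 32768 bytes,
// which is exactly the 32KB minimum discussed above.
#define INVOCATIONS_PER_GROUP 8
#define SCRATCH_SIZE 256   // plays the role of CONSTANT_SIZE in the question

layout(local_size_x = INVOCATIONS_PER_GROUP) in;

// Hypothetical output buffer, not part of the original question.
layout(std430, binding = 0) buffer Result
{
    vec4 result[];
};

// One row of scratch space per local invocation.
shared vec4 scratch[INVOCATIONS_PER_GROUP][SCRATCH_SIZE];

void main()
{
    uint row = gl_LocalInvocationIndex;

    // Shared memory contents are undefined at the start of a work group,
    // so clear this invocation's row before using it.
    for (int i = 0; i < SCRATCH_SIZE; i++)
    {
        scratch[row][i] = vec4(0.0);
    }

    // Placeholder work: replace with whatever the real shader computes.
    for (int i = 0; i < SCRATCH_SIZE; i++)
    {
        scratch[row][i] += vec4(float(i));
    }

    // Write out a single value so the scratch data has a visible effect.
    vec4 sum = vec4(0.0);
    for (int i = 0; i < SCRATCH_SIZE; i++)
    {
        sum += scratch[row][i];
    }
    result[gl_GlobalInvocationID.x] = sum;
}
```

No barrier() is needed here because each invocation only ever touches its own row. Using the full 32KB per work group will likely limit how many groups can run concurrently on a given GPU, so a smaller SCRATCH_SIZE may well perform better if the algorithm allows it.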