2
votes

I am working on a compute shader whose output is written to an SSBO. The consumer of this buffer is CUDA, which expects it to contain unsigned bytes. I currently can't find a way to write one byte per index in an SSBO. With a texture or image, the normalized-float-to-unsigned-byte conversion is handled by OpenGL. For example, I can attach a texture with internal format R8 and store one byte per entry. But nothing like this seems possible with an SSBO. Does this mean that, except for the bool data type, all numerical storage types in an SSBO occupy at least 4 bytes per entry?

Practically speaking I would like to be able to do the following:

Compute shader:

  #version 430 core
  layout (local_size_x = 8,local_size_y = 8 ) in;
  struct SSBOBlock 
  {
    byte mydata;
  };


  layout(std430,binding = BUFFER_OUTPUT) writeonly buffer bBuffer
  {

     SSBOBlock Ouput[];

  } Out;


   void main()
  {
        //..... Compute shader stuff...
        //.......
       Out.Ouput[globalIndex].mydata = val;//where val is normalized float
   }
2
If you have been under the impression that bool in GLSL is 1-byte, you may need to re-write some of your shaders :) - Andon M. Coleman
#version 423 core...wat - genpfault
@AndonM.Coleman I am always glad to discover new gotchas regarding OpenGL :) - Michael IV

2 Answers

5
votes

I found a way to write unsigned byte data into a buffer from a compute shader: a buffer texture does the job. It is basically an image texture with a buffer object as its storage. This way I can specify the image format as R8, which lets me store a byte-sized value at each index of the buffer.

GLuint _tbo_buffer,_tbo_tex;
glGenBuffers(1, &_tbo_buffer);
glBindBuffer(GL_TEXTURE_BUFFER, _tbo_buffer);
glBufferData(GL_TEXTURE_BUFFER, SCREEN_WIDTH * SCREEN_HEIGHT, NULL, GL_DYNAMIC_COPY);
glGenTextures(1, &_tbo_tex);
glBindTexture(GL_TEXTURE_BUFFER, _tbo_tex);
//attach the TBO to the texture:
glTexBuffer(GL_TEXTURE_BUFFER, GL_R8, _tbo_buffer);
glBindImageTexture(0, _tbo_tex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_R8);

Compute shader:

#version 430 core

layout (local_size_x = 8,local_size_y = 8 ) in;
layout(binding=0) uniform sampler2D   TEX_IN;
layout(r8) writeonly uniform imageBuffer mybuffer;

void main(){
   vec2 texSize =  vec2(textureSize(TEX_IN,0));
   vec2 uv      =  vec2(gl_GlobalInvocationID.xy / texSize);
   vec4 tex     =  texture(TEX_IN,uv); 
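   // total number of invocations per axis (assumed definition of nThreads)
   uvec3 nThreads = gl_NumWorkGroups * gl_WorkGroupSize;
   uint globalIndex =  gl_GlobalInvocationID.y * nThreads.x + gl_GlobalInvocationID.x;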
  //store only r:
   imageStore(mybuffer,int(globalIndex),vec4(0.5,0,0,0));

}

Then we can read the buffer byte by byte on the CPU, or map it to a CUDA buffer resource:

 GLubyte* ptr = (GLubyte*)glMapBuffer(GL_TEXTURE_BUFFER, GL_READ_ONLY);
4
votes

The smallest scalar type exposed on GPUs tends to be 32 bits wide. Even the boolean type you mentioned actually occupies 32 bits. The same often holds in languages like C: a boolean needs no more than 1 bit, yet bool is not synonymous with "give me the smallest data type available."

There are, however, intrinsic functions you can use to pack and unpack data types. Here is an example of how to use them:

#version 430 core
layout (local_size_x = 8,local_size_y = 8 ) in;
struct SSBOBlock 
{
  uint mydata;
};


layout(std430,binding = BUFFER_OUTPUT) writeonly buffer bBuffer
{

  SSBOBlock Output[];

} Out;


void main()
{
  //..... Compute shader stuff...
  //.......
  Out.Output[globalIndex].mydata = packUnorm4x8(val);
  // where val is a 4-component unsigned normalized vector to pack into globalIndex
}

Your sample shader attempts to write a single scalar to a "byte" data type. That is not possible, and you will have to modify it to work with indices that each reference a packed group of 4 scalars. In the worst case, this might mean unpacking three values and then re-packing the entire thing just to write one scalar.
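To make that worst case concrete, here is a minimal sketch in plain C (so it can be checked on the CPU) of how a logical byte index maps onto a 32-bit word and a byte lane, and how one lane can be replaced while the other three are preserved. The helper names replace_byte and store_byte are made up for illustration, and lane 0 is assumed to be the least significant byte, which matches the order packUnorm4x8 uses for its first component.

```c
#include <stdint.h>

/* Replace byte `lane` (0..3, lane 0 = least significant byte) of `word`
 * with `value`, leaving the other three bytes untouched. */
static uint32_t replace_byte(uint32_t word, unsigned lane, uint8_t value)
{
    unsigned shift = 8u * lane;
    uint32_t mask = (uint32_t)0xFFu << shift;
    return (word & ~mask) | ((uint32_t)value << shift);
}

/* Store one byte at logical byte index `i` in an array of 32-bit words:
 * word index is i / 4, lane within the word is i % 4. */
static void store_byte(uint32_t *words, uint32_t i, uint8_t value)
{
    words[i / 4u] = replace_byte(words[i / 4u], i % 4u, value);
}
```

In a shader the same arithmetic would need the buffer to be readable (not writeonly), and if several invocations touch different bytes of the same word, the read-modify-write can race. Having one invocation produce all four bytes of a word avoids that.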

This intrinsic function is discussed in the extension specification for GL_ARB_shading_language_packing and is core in GL 4.2 and later.


Even if you were on an implementation that does not support that extension, the text of the extension specification explains exactly what each function does. The equivalent operation for one component of packUnorm4x8 is:

uint fixed_val = uint(round(clamp(float_val, 0.0, 1.0) * 255.0));

Some bit-shifts will be necessary to properly pack each component, but those are trivial.
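For reference, the shifts could look roughly like the following. This is a sketch in plain C rather than GLSL so it can be verified on the CPU. The names unorm8 and pack_unorm4x8_emulated are hypothetical, and rounding is done by adding 0.5 and truncating, which is valid here only because the value has already been clamped to be non-negative.

```c
#include <stdint.h>

/* Convert one float to an 8-bit unsigned normalized value:
 * clamp to [0, 1], scale by 255, round to nearest. */
static uint32_t unorm8(float v)
{
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    /* v * 255 is non-negative, so adding 0.5 and truncating rounds it. */
    return (uint32_t)(v * 255.0f + 0.5f);
}

/* Emulate the packing order of packUnorm4x8: the first component goes
 * into bits 0-7, the second into bits 8-15, and so on. */
static uint32_t pack_unorm4x8_emulated(float x, float y, float z, float w)
{
    return unorm8(x)
         | (unorm8(y) << 8)
         | (unorm8(z) << 16)
         | (unorm8(w) << 24);
}
```

On a little-endian target the first component ends up at the lowest byte address of the word, so a consumer that reads the buffer as an array of unsigned bytes sees the components in x, y, z, w order.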