
I'm debating the pros and cons of passing texture coordinates to a GLSL shader in various ways.

I'm rendering a lot of instanced data. I have one basic model, and for each instance I pass a transformation matrix and a texture/sprite index to my shader. Each instance is then rotated and translated by its transformation matrix, and its sprite is picked out of the texture map by this snippet:

TexCoord0 = vec2(TexCoord.x + (TexIndex % 16), TexCoord.y + (TexIndex / 16)) / 16;

What I don't like about this is that the sprite and texture sizes are hard-coded. I could pass them as uniforms instead, but I'd still have the limitation that the sprite size can't vary from instance to instance (not that I have a planned use case for that). It's also a bit of extra computation on the GPU to work out the sprite's coordinates.
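For reference, here's a minimal sketch of the uniform-based variant. Only the texture-coordinate math comes from my snippet above; the attribute names and locations, uAtlasDim, and ViewProjection are placeholders I'm assuming for illustration, not my actual code.

#version 330 core

layout(location = 0) in vec3 Position;      // base model vertex
layout(location = 1) in vec2 TexCoord;      // base quad UV in [0,1]
layout(location = 2) in mat4 ModelMatrix;   // per-instance transform (occupies 4 attribute slots)
layout(location = 6) in int  TexIndex;      // per-instance sprite index

uniform mat4 ViewProjection;  // assumed combined view-projection matrix
uniform int  uAtlasDim;       // sprites per row/column, e.g. 16 for a 16x16 atlas

out vec2 TexCoord0;

void main()
{
    int col = TexIndex % uAtlasDim;    // column within the atlas
    int row = TexIndex / uAtlasDim;    // row within the atlas (integer division)
    TexCoord0   = (TexCoord + vec2(col, row)) / float(uAtlasDim);
    gl_Position = ViewProjection * ModelMatrix * vec4(Position, 1.0);
}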

Another method I could use would be to specify an entire rect delimiting the position, width, and height of the sprite within the texture map. However, that would mean specifying 4 floats (16 bytes) per instance rather than a single texture-index byte. Multiply that by, say, 200K instances and we're looking at about 3 MB of data (on top of the other per-instance data). I don't know whether that is considered "a lot" these days or not.
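If I went that route, the shader math would collapse to something like this (SpriteRect is just a placeholder name here, packed as (u, v, width, height) already normalized to [0,1] texture space):

// Replaces TexIndex in the sketch above; 4 floats (16 bytes) per instance.
in vec4 SpriteRect;   // (u, v, width, height) in normalized texture coordinates

// ... inside main(), instead of the index math:
TexCoord0 = SpriteRect.xy + TexCoord * SpriteRect.zw;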

Should I be focusing on easing the computation in my GLSL shaders or on minimizing the size of my buffers? I hear that transferring data to the GPU is often the bottleneck, but I'll be recopying the data to the buffer very seldom compared to the number of vertices rendered every frame.


Likewise, I'm considering dropping the model transform matrix and replacing it with a vec3 for translation and a vec2 for rotation (I only need two degrees of rotation), which would knock me down from 16 floats to 5 per instance; I would then rebuild the matrix in the vertex shader, roughly as sketched below. Again, this takes away some flexibility, and I'm not sure of the cost savings.
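Roughly what I have in mind, assuming the vec2 holds two angles in radians (the axis choice and the names are only illustrative, not settled):

layout(location = 0) in vec3 Position;      // base model vertex
layout(location = 2) in vec3 InstancePos;   // per-instance translation (3 floats)
layout(location = 3) in vec2 InstanceRot;   // per-instance angles, e.g. (yaw, pitch), in radians

uniform mat4 ViewProjection;  // assumed combined view-projection matrix

void main()
{
    float cy = cos(InstanceRot.x), sy = sin(InstanceRot.x);  // yaw, around Y
    float cp = cos(InstanceRot.y), sp = sin(InstanceRot.y);  // pitch, around X

    // GLSL matrix constructors are filled column by column.
    mat3 rotY = mat3( cy, 0.0, -sy,
                     0.0, 1.0,  0.0,
                      sy, 0.0,  cy);
    mat3 rotX = mat3(1.0, 0.0,  0.0,
                     0.0,  cp,   sp,
                     0.0, -sp,   cp);

    vec3 worldPos = rotY * rotX * Position + InstancePos;
    gl_Position = ViewProjection * vec4(worldPos, 1.0);
}

That trades 5 floats per instance for two sin/cos pairs and a couple of mat3 multiplies per vertex, which is exactly the computation-versus-bandwidth trade-off I'm asking about.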

There's really no way to answer this question without sitting down and profiling it on the hardware of interest. It's not a bad question; it's just that the answer will vary from implementation to implementation and from hardware to hardware. On high-end machines one method might be faster, while the other might be faster on low-end machines. On AMD's Fusion CPU/GPUs, the buffer bandwidth may be far worse than with a discrete GPU. Alternatively, the DMA overhead for discrete GPUs may make Fusion chips ideal for this. There's no way to know for certain without profiling it. – Nicol Bolas

1 Answer


I tried doing it the other way, specifying a texture rect rather than a byte index, and it actually yielded a huge speed increase (520 FPS to 3600 FPS, or 1.92 ms/frame to 0.27 ms/frame).

It seems that reducing computation is more important, at least on my GPU (Radeon HD 5700 series). Or perhaps it's just the modulus that's expensive; I'm not sure. Either way, I'm quite pleased with the result: more flexibility at a lower cost!