
I was just looking at my animated sprite code and had an idea.
The animation works by altering texture coordinates. There is a buffer object that holds the current frame's texture coordinates; whenever a new frame is requested, the new texture coordinates are uploaded into the buffer with glBufferData().

What if we pre-calculate the texture coordinates of all animation frames, put them all in the BO, and create an index buffer object (IBO) containing just the number of the frame we need to draw:

 GLubyte cur_frame = 0; // frame number: 0, 1, 2, ...

Then, when we need to advance the animation, all we have to update with glBufferData is 1 byte in our IBO (instead of 4 (quad vertex count) * 2 (s, t) * sizeof(GLfloat) = 32 bytes for a quad drawn with GL_TRIANGLE_STRIP), and we don't need to keep any texture coordinates around after the BO is initialized.

Am I missing something? What are the downsides?

Edit: of course your vertex data need not be GLfloat; that is just an example.

Pre-computing = more memory usage and larger buffers. It's not good or bad, it's just different. Profile your app to see whether it's worth investing your time in higher performance here. Only you know if the speed/memory tradeoff is right for you. - Tim
Why would you even need to update anything with this technique? You just pass a different start index to glDrawElements, or use the BaseVertex variant (glDrawElementsBaseVertex). - KillianDS
Yeah, you're right, I just forgot about it. - Aristarhys
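Following KillianDS's suggestion, here is a minimal C sketch of the precomputed layout. It assumes a sprite sheet laid out as a single row of equally wide frames and a hypothetical `sprite_vertex` type: all frames are stored back to back as 4 vertices each, and one shared 6-entry index list serves every frame. The frame is then picked at draw time through the base vertex, so nothing has to be uploaded after initialization.

```c
/* Hypothetical vertex layout: 2D position + texture coordinates. */
typedef struct {
    float x, y;  /* position */
    float s, t;  /* texture coordinates */
} sprite_vertex;

/* One index list shared by every frame: two counter-clockwise triangles. */
static const unsigned short quad_indices[6] = { 0, 1, 2, 2, 1, 3 };

/* Fill `out` (room for 4 * frame_count vertices) with one unit quad per
   frame, assuming the sheet is a single row of equally wide frames. */
void build_frames(sprite_vertex *out, int frame_count)
{
    const float w = 1.0f / (float)frame_count;  /* frame width in texture space */
    for (int f = 0; f < frame_count; ++f) {
        const float s0 = (float)f * w, s1 = (float)(f + 1) * w;
        sprite_vertex *v = out + 4 * f;
        /* Corners: bottom-left, bottom-right, top-left, top-right. */
        v[0] = (sprite_vertex){ 0.0f, 0.0f, s0, 0.0f };
        v[1] = (sprite_vertex){ 1.0f, 0.0f, s1, 0.0f };
        v[2] = (sprite_vertex){ 0.0f, 1.0f, s0, 1.0f };
        v[3] = (sprite_vertex){ 1.0f, 1.0f, s1, 1.0f };
    }
}

/* Base vertex that selects frame `f` in the shared index list. After a
   one-time glBufferData upload of both buffers, drawing frame f is:
     glDrawElementsBaseVertex(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT,
                              (void *)0, frame_base_vertex(f));          */
int frame_base_vertex(int f)
{
    return 4 * f;
}
```

Compared with the question's idea, no IBO update is needed at all: switching frames only changes the `basevertex` argument. The price is the memory for every frame's vertices, as Tim points out.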

2 Answers


As Tim correctly states, this depends on your application. Let's talk numbers: you mention both IBOs and putting the texture coordinates for all frames into one VBO, so let's look at the impact of each.

Suppose a typical vertex looks like this:

struct vertex
{
    float x,y,z; //position
    float tx,ty; //Texture coordinates
};

I added a z-component, but the calculations are similar if you don't use it or if you have more attributes. With five 4-byte floats, each vertex takes 20 bytes.

Let's assume a simple sprite: a quad consisting of 2 triangles. In a very naive mode you just send 2x3 vertices, i.e. 6*20 = 120 bytes, to the GPU.

[Image: triangulated quad]

In comes indexing: you actually have only four vertices, 1,2,3,4, and two triangles, 1,2,3 and 2,3,4. So we send two buffers to the GPU: one containing the 4 vertices (4*20 = 80 bytes) and one containing the list of indices for the triangles ([1,2,3,2,3,4]). Let's say we use 2 bytes per index (16-bit indices can address 65536 vertices, which should be enough), so this comes down to 6*2 = 12 bytes. In total that is 92 bytes: we saved 28 bytes, or about 23%. Also, when rendering, the GPU is likely to process each vertex only once in the vertex shader, which saves some processing power as well.
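The byte counts above can be checked with a small C sketch using the answer's `vertex` struct. It assumes the compiler inserts no padding (true for five floats on common platforms) and 16-bit indices; the two helper functions are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>

/* The answer's vertex: 3 floats of position + 2 of texture coordinates. */
struct vertex {
    float x, y, z;  /* position */
    float tx, ty;   /* texture coordinates */
};

/* Naive: two triangles sent as 6 full vertices. */
size_t naive_quad_bytes(void)
{
    return 6 * sizeof(struct vertex);
}

/* Indexed: 4 unique vertices plus 6 16-bit indices. */
size_t indexed_quad_bytes(void)
{
    return 4 * sizeof(struct vertex) + 6 * sizeof(uint16_t);
}
```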

So, now you want to add the texture coordinates for all animation frames at once. The first thing to note is that a vertex in indexed rendering is defined by all its attributes: you can't split it into an index for positions and an index for texture coordinates. So if you want to add extra texture coordinates, you have to repeat the positions. Each 'frame' you add will therefore add 80 bytes to the VBO and 12 bytes to the IBO. Suppose you have 64 frames; you end up with 64*(80+12) = 5888 bytes. With 1000 sprites this becomes about 6 MB. That does not seem too bad, but note that it scales quite rapidly: each frame adds to the size, and so does each attribute (because they all have to be repeated).
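A sketch of how the precomputed approach grows, assuming (as above) that every frame repeats a full 4-vertex quad together with its own 6 indices. `sprite_bytes` is a hypothetical helper, not part of any API.

```c
#include <stddef.h>

/* Bytes per sprite when every frame stores its own full quad:
   4 vertices of `vertex_bytes` plus 6 indices of `index_bytes`. */
size_t sprite_bytes(size_t frames, size_t vertex_bytes, size_t index_bytes)
{
    return frames * (4 * vertex_bytes + 6 * index_bytes);
}
```

With the numbers above, `sprite_bytes(64, 20, 2)` gives 5888 bytes per sprite and 5,888,000 bytes for 1000 sprites. Adding a single extra float attribute (24-byte vertices) raises the per-sprite figure to 6912 bytes, because every attribute is multiplied by the frame count.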

So, what does it gain you?

  1. You don't have to send data to the GPU dynamically. Note that updating the whole VBO for one sprite would require sending 80 bytes, or 640 bits. If you need to do this for 1000 sprites per frame at 30 frames per second, you get 19,200,000 bps, or 19.2 Mbps (no overhead included). This is quite low (e.g. a PCIe 1.x x16 link can handle 32 Gbps), but it could be worthwhile if you have other bandwidth issues (e.g. due to texturing). Also, if you construct your VBOs carefully (e.g. separate VBOs or a non-interleaved layout), you could reduce this to updating only the texture part, which is only 32 bytes per sprite in the above example, reducing bandwidth even more.
  2. You don't have to spend time computing the next frame's texture coordinates. However, this is usually just a few additions and a few ifs to handle the edges of your texture, so I doubt you will gain much CPU time here.
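Both points can be put into numbers with a rough C sketch: `upload_bits_per_second` reproduces the bandwidth estimate from point 1, and `next_frame`/`current_uv` show the handful of additions and ifs that point 2 refers to. The grid layout of the sprite sheet (frames left to right, then row by row) and all names are assumptions for illustration.

```c
/* Point 1: upload rate for re-sending `bytes_per_sprite` for every sprite
   on every rendered frame, ignoring protocol overhead. */
double upload_bits_per_second(double bytes_per_sprite, double sprites, double fps)
{
    return bytes_per_sprite * 8.0 * sprites * fps;
}

/* Point 2: a sprite-sheet cursor over a `cols` x `rows` grid of equal cells,
   read left to right and then row by row (hypothetical layout). */
typedef struct {
    int col, row;
    int cols, rows;
} sheet_cursor;

typedef struct {
    float s0, t0, s1, t1;  /* texture-space rectangle of one frame */
} uv_rect;

/* Advance one frame, wrapping at the right edge and after the last row. */
void next_frame(sheet_cursor *c)
{
    if (++c->col == c->cols) {
        c->col = 0;
        if (++c->row == c->rows)
            c->row = 0;
    }
}

/* Texture coordinates of the current cell; row 0 covers t in [0, 1/rows]. */
uv_rect current_uv(const sheet_cursor *c)
{
    const float w = 1.0f / (float)c->cols, h = 1.0f / (float)c->rows;
    uv_rect r = { (float)c->col * w, (float)c->row * h,
                  (float)(c->col + 1) * w, (float)(c->row + 1) * h };
    return r;
}
```

Re-sending only the 32-byte texture part instead of the full 80-byte quad would drop the estimate from 19.2 Mbps to 7.68 Mbps for the same 1000 sprites at 30 frames per second.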

Finally, you could also simply split the animation across many textures. I have no idea how well this scales, but in this case you don't even need more complex vertex attributes; you just bind a different texture for each frame of animation.

Edit: another method is to pass the frame number in a uniform and do the calculation in your fragment shader before sampling. Setting a single integer uniform shouldn't be much of an overhead.
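A minimal sketch of the uniform approach, assuming a single-row sprite sheet and hypothetical names (`u_frame`, `u_frame_count`, `a_uv`). It places the mapping in a GLSL vertex shader (kept in a C string) rather than the fragment shader; because the formula is linear in the texture coordinate, interpolating the per-vertex result gives the same values, and it runs 4 times per sprite instead of once per pixel. `frame_s` is a CPU reference of the same formula. From C, the frame would be set each tick with something like `glUniform1i(glGetUniformLocation(program, "u_frame"), frame);`.

```c
/* GLSL 3.30 vertex shader, kept in a C string, that maps a quad's 0..1
   texture coordinates into frame `u_frame` of a single-row sprite sheet.
   Attribute and uniform names are hypothetical. */
static const char *const sprite_vs_src =
    "#version 330 core\n"
    "layout(location = 0) in vec2 a_pos;\n"
    "layout(location = 1) in vec2 a_uv;\n"
    "uniform int u_frame;\n"
    "uniform int u_frame_count;\n"
    "out vec2 v_uv;\n"
    "void main() {\n"
    "    v_uv = vec2((a_uv.x + float(u_frame)) / float(u_frame_count), a_uv.y);\n"
    "    gl_Position = vec4(a_pos, 0.0, 1.0);\n"
    "}\n";

/* CPU reference of the same formula for the s coordinate, for testing. */
float frame_s(float quad_s, int frame, int frame_count)
{
    return (quad_s + (float)frame) / (float)frame_count;
}
```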


For a modern GPU, accessing/unpacking single bytes is not necessarily faster than accessing integer types or even vectors (register sizes, load instructions, etc.). You only save memory and therefore memory bandwidth, and I wouldn't expect that to make much of a difference relative to all the other vertex attribute array accesses.

I think the fastest way to supply a frame index for animated sprites is either a uniform or, if multiple sprites have to be rendered with one draw call, instanced vertex attribute arrays. With the latter, you can provide a single index for fixed-size subsequences of vertices. For example, when drawing sprite quads, you'd have one frame-index fetch per 4 vertices. A third approach, when using instanced rendering, is a buffer texture.
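A sketch of the instanced-attribute approach, assuming OpenGL 3.3 and a hypothetical per-instance record `sprite_instance`: each sprite contributes one frame index, and `glVertexAttribDivisor(loc, 1)` makes that attribute advance once per instance instead of once per vertex. The GL setup appears only in a comment because it needs a live context; `loc` and `sprite_count` are placeholders.

```c
#include <stddef.h>

/* Hypothetical per-instance record: one entry per sprite, not per vertex. */
typedef struct {
    float x, y;          /* sprite position */
    unsigned int frame;  /* animation frame index */
} sprite_instance;

/* Copy the current frame of every sprite into the instance array before
   uploading it (e.g. with glBufferSubData). */
void set_frames(sprite_instance *inst, size_t count, const unsigned int *frames)
{
    for (size_t i = 0; i < count; ++i)
        inst[i].frame = frames[i];
}

/* Sketch of the GL 3.3 setup, assuming the instance buffer is bound and
   `loc` is the shader's frame attribute location:
     glVertexAttribIPointer(loc, 1, GL_UNSIGNED_INT, sizeof(sprite_instance),
                            (void *)offsetof(sprite_instance, frame));
     glEnableVertexAttribArray(loc);
     glVertexAttribDivisor(loc, 1);   (one fetch per instance, i.e. per quad)
     glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, sprite_count);        */
```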

I recommend a global (shared) uniform for time/frame-index calculation, so that you can compute the animation index on the fly within your shader. That way you don't need to update the index buffer at all; it then just represents the relative animation state among sprites.
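The shared-uniform idea comes down to one line of arithmetic per sprite, sketched here in C: each sprite keeps a constant, hypothetical `sprite_offset` (its relative animation state), and only the global frame counter changes, e.g. with a single `glUniform1i` call per tick. The same expression can be written in GLSL 1.30+ with integer `%`; GLSL leaves `%` undefined for negative operands, so keep offsets non-negative there.

```c
/* Frame shown by one sprite: the shared global counter shifted by the
   sprite's own offset, wrapped into [0, frame_count). */
int sprite_frame(int global_frame, int sprite_offset, int frame_count)
{
    int f = (global_frame + sprite_offset) % frame_count;
    return f < 0 ? f + frame_count : f;  /* C's % can return negatives */
}
```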