How to batch sprites in iOS/OpenGL ES 2.0

Question

I have developed my own sprite library on top of OpenGL ES 2.0. Right now, I am not doing any batching of draw calls; instead, each sprite has its own VBO/VAO of four textured vertices, drawn as a triangle strip (The VAO/VBO itself is managed by the Texture atlas, so identical sprites reuse the same VAO/VBO, which is 'reference counted' and hence deleted when no sprite instances reference it).

Before drawing each sprite, I'll bind its texture, upload its uniforms/attributes to the shader (modelview matrix, opacity - Projection matrix stays constant all along), bind its Vertex Array Object (4 textured vertices + four indices), and call glDrawElements(). I do cull off-screen sprites (based on position and bounds), but still it is one draw call per sprite, even if all sprites share the same texture. The vertex positions and texture coordinates for each sprite never change.

I must say that, despite this inefficiency, I have never experienced performance issues, even when drawing many sprites on screen. I do split the sprites into opaque/non-opaque, draw the opaque ones first, and the non-opaque ones after, back to front. I have seen performance suffer only when I overdraw (tax the fill rate).

Nevertheless, the OpenGL instruments in Xcode will complain that I draw too many small meshes and that I should consolidate my geometry into less objects. And in the Unity world everyone talks about limiting the number of draw calls as if they were the plague.

So, how should I go around batching very many sprites, each with a different transform and opacity value (but the same texture), into one draw call? One thing that comes to mind is to modify the vertex data every frame, and stream it: applying the modelview matrix of each sprite to all its vertices, assembling the transformed vertices for all sprites into one mesh, and submitting it to the GPU. This approach does not solve the problem of varying opacity between sprites.

Another idea that comes to mind is to have all the textured vertices of all the sprites assembled into a single mesh (VBO), treated as 'static' (same vertex format I am using now), and a separate array with the stuff that changes per sprite every frame (transform matrix and opacity), and only stream that data each frame, and pull it/apply it on the vertex shader side. That is, have a separate array where the 'attribute' being represented is the modelview matrix/alpha for the corresponding vertices. Still have to figure out the exact implementation in terms of data format/strides etc. In any case, there is the additional complication that arises whenever a new sprite is created/destroyed, the whole mesh has to be modified...

Or perhaps there is an ideal, 'textbook' solution to this problem out there that I haven't figured out? What does cocos2d do?

Maurizio Benedetti Maurizio Benedetti · Accepted Answer · 2014-04-04T12:29:02

When I initially started reading you post I though that each quad used a different texture (since you stated "Before drawing each sprite, I'll bind its texture") but then you said that each sprite has "the same texture".

A possible easy win is to control the way you bind your textures during the draw since each call is a burden for the OpenGL driver. If (and I am not really sure abut this from your post) you use different textures, I suggest to go for a simple texture atlas where all the sprites are inside a single picture (preferably a power of 2 texture with mipmapping) and then you take the piece of the texture you need in the fragment using texture coordinates (this is the reason they exist in the end)

If the position of the sprites change over time (of course it does) at each frame, a possible advantage would be to pack the new vertex coordinates of your sprites at each frame and draw directly from memory (possibly over VAO. VBO could cost more since you need to build it each frame? to be tested in real scenario). This would be a good call pack operation and I am pretty sure it will bust the performances.

Consider that the VAO option could be feasible since we are talking about very small amount of data and the memory bandwidth should not represent a real bottleneck (each quad I guess uses 12 floats for vertex coordinates, 8 for textures and 12 for normals, 128 byte?), it shouldn't be a big problem over VAO.

About opacity, can't you play using an uniform to your fragment shader where you play with alpha? Am I wrong with it? It should work.

I hope this helps.

Ciao, Maurizio

How to batch sprites in iOS/OpenGL ES 2.0

1 Answers