OpenGL state redundancy elimination Tree, render state priorities

Question

I am working on an automatic OpenGL batching method in my game engine, to reduce draw calls and redundant calls.

My batch tree design begins with the most expensive states at the root and adds leaves below for each less expensive state.

Example: Tree root: shaders / programs. Siblings: blend states ... and so on.
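One way to get the "most expensive state first" ordering described above, without building an explicit tree, is to sort the draw queue by a packed key. The following is only a sketch: `DrawCommand`, its three id fields, and the 1024 limit per id are assumptions made for illustration, not part of the engine in the question.

```typescript
// Hypothetical draw command; the ids are small integers handed out by the engine.
interface DrawCommand {
  name: string;
  programId: number;    // treated as the most expensive state in this sketch
  textureId: number;
  blendStateId: number; // treated as the cheapest of the three
}

// Pack the ids into one number with the most expensive state in the highest
// digits. Assumes every id is below 1024 so the key stays an exact integer.
function sortKey(cmd: DrawCommand): number {
  return (cmd.programId * 1024 + cmd.textureId) * 1024 + cmd.blendStateId;
}

// Return a sorted copy so that commands sharing expensive state end up adjacent.
function sortByState(cmds: DrawCommand[]): DrawCommand[] {
  return cmds.slice().sort((a, b) => sortKey(a) - sortKey(b));
}

// Count how often the program differs between consecutive commands.
function countProgramSwitches(cmds: DrawCommand[]): number {
  let switches = 0;
  let current = -1;
  for (const cmd of cmds) {
    if (cmd.programId !== current) {
      switches++;
      current = cmd.programId;
    }
  }
  return switches;
}

const drawQueue: DrawCommand[] = [
  { name: "a", programId: 2, textureId: 1, blendStateId: 0 },
  { name: "b", programId: 1, textureId: 3, blendStateId: 1 },
  { name: "c", programId: 2, textureId: 0, blendStateId: 0 },
  { name: "d", programId: 1, textureId: 3, blendStateId: 0 },
];
const sortedQueue = sortByState(drawQueue);
```

In this example the unsorted queue switches programs 4 times and the sorted one 2 times. Which state belongs in the highest digits is exactly what the question asks, so the field order in `sortKey` should follow whatever priority you settle on.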

So my question is: which calls in this list are most likely the most expensive?

- binding program
- binding textures
- binding buffers
- buffering texture, vertex data
- binding render targets
- glEnable / glDisable
- blend state: equation, color, functions, colorWriteMask
- depth stencil state: depthFunc, stencilOperations, stencilFunction, writeMasks

Also wondering which method will be faster:
- Collect all batchable draw commands into a single vertex buffer and issue only 1 draw call (this method would also force me to apply matrix transforms per vertex on the CPU side)
- Do not batch at all and render many small draw calls, only batch particle systems ...
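To make the first option concrete, here is a minimal sketch of CPU-side batching: each object's vertices are transformed by its own matrix and appended to one shared array. It uses 2D affine transforms to stay short; the `Sprite` and `Affine2D` types and the `buildBatch` function are assumptions for illustration only.

```typescript
// 2D affine transform [a, b, c, d, tx, ty]:
//   x' = a*x + c*y + tx,  y' = b*x + d*y + ty
type Affine2D = [number, number, number, number, number, number];

// Hypothetical batchable object.
interface Sprite {
  positions: number[]; // x0, y0, x1, y1, ... in local space
  transform: Affine2D;
}

// Pre-transform every sprite on the CPU and pack the results into one array.
function buildBatch(sprites: Sprite[]): Float32Array {
  let total = 0;
  for (const s of sprites) total += s.positions.length;

  const out = new Float32Array(total);
  let offset = 0;
  for (const s of sprites) {
    const [a, b, c, d, tx, ty] = s.transform;
    for (let i = 0; i < s.positions.length; i += 2) {
      const x = s.positions[i];
      const y = s.positions[i + 1];
      out[offset++] = a * x + c * y + tx;
      out[offset++] = b * x + d * y + ty;
    }
  }
  return out;
}

const triangle = [0, 0, 1, 0, 0, 1];
const mergedVertices = buildBatch([
  { positions: triangle, transform: [1, 0, 0, 1, 10, 20] }, // translate by (10, 20)
  { positions: triangle, transform: [2, 0, 0, 2, 0, 0] },   // scale by 2
]);
```

In WebGL the merged array would then be uploaded once with `gl.bufferData(gl.ARRAY_BUFFER, mergedVertices, gl.DYNAMIC_DRAW)` and drawn with a single `gl.drawArrays` call. The trade-off is the per-vertex CPU work and the upload every frame, so whether this beats many small draw calls depends on vertex counts and on the driver.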

PS: Render targets will always be changed in a pre or post step, depending on usage.

Progress so far:

Andon M. Coleman: uniform and vertex array bindings are cheapest; FBO and texture bindings are expensive
datenwolf: program changes invalidate the code cache

1: Framebuffer states
2: Program
3: Texture Binding
...
N: Vertex Array binding, Uniform binding

Current execution Tree in WebGL:

Program
Attribute Pointers
Texture
Blend State
Depth State
Stencil Front / Back State
Rasterizer State
Sampler State
Bind Buffer
Draw Arrays

Each step is a sibling hash tree, to avoid checking against the state cache inside the main render queue.
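A reduced sketch of such a hash tree, using nested `Map`s, might look like the following. It keeps only three of the levels listed above (program, texture, blend state), and all names (`QueuedDraw`, `insertDraw`, `flattenTree`) are hypothetical. Instead of calling GL, the traversal records the calls a renderer would issue.

```typescript
// Hypothetical queued draw, keyed by state names.
interface QueuedDraw {
  program: string;
  texture: string;
  blend: string;
  mesh: string;
}

// Three-level tree: program -> texture -> blend state -> meshes to draw.
type BlendLevel = Map<string, string[]>;
type TextureLevel = Map<string, BlendLevel>;
type ProgramLevel = Map<string, TextureLevel>;

// Insert a draw under its state path, creating missing levels on the way down.
function insertDraw(root: ProgramLevel, d: QueuedDraw): void {
  let textures = root.get(d.program);
  if (!textures) {
    textures = new Map<string, BlendLevel>();
    root.set(d.program, textures);
  }
  let blends = textures.get(d.texture);
  if (!blends) {
    blends = new Map<string, string[]>();
    textures.set(d.texture, blends);
  }
  let meshes = blends.get(d.blend);
  if (!meshes) {
    meshes = [];
    blends.set(d.blend, meshes);
  }
  meshes.push(d.mesh);
}

// Walk the tree; each state is set once per subtree, with no cache lookups.
function flattenTree(root: ProgramLevel): string[] {
  const calls: string[] = [];
  root.forEach((textures, program) => {
    calls.push("useProgram " + program);
    textures.forEach((blends, texture) => {
      calls.push("bindTexture " + texture);
      blends.forEach((meshes, blend) => {
        calls.push("setBlend " + blend);
        for (const mesh of meshes) calls.push("draw " + mesh);
      });
    });
  });
  return calls;
}

const batchTree: ProgramLevel = new Map<string, TextureLevel>();
insertDraw(batchTree, { program: "lit", texture: "brick", blend: "opaque", mesh: "wall1" });
insertDraw(batchTree, { program: "unlit", texture: "sky", blend: "opaque", mesh: "dome" });
insertDraw(batchTree, { program: "lit", texture: "brick", blend: "opaque", mesh: "wall2" });
const recordedCalls = flattenTree(batchTree);
```

Here `wall1` and `wall2` share one program, texture, and blend setup even though they were queued apart. Note that a child level still re-sets its state when the parent changes (the blend state is set again under the second program), so a full implementation may want to carry cheap state across subtrees.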

Loading textures / programs / shaders / buffers happens before rendering in an extra queue, for future multithreading and also to be sure that the context is initialized before doing anything with it.

The biggest problem with self-rendering objects is that you cannot control when something happens. For example, if a developer calls these methods before GL is initialized, they will run into bugs without knowing why.

What is your target GL version? Your list is missing render target (FBO) state changes, which are very expensive (but not necessarily an issue depending on version). — Andon M. Coleman
Thanks for the tip, I wanted to introduce FBO control soon after getting the base architecture working. — Zeto
I'm interested in any version, but mostly OpenGL ES 2 and 3. Just include the versions you know something about :) — Zeto
I can tell you that vertex array and uniform states are the cheapest states you can change on all GL implementations and FBO states and texture binding tend to be the most expensive. I could not really give you a bullet list like in your question though, just the extreme ends of this list. I would also point out that a lot of the expense is not the initial call to glBindXXX (...) for instance, but what happens when you make a draw call after 20 odd states are changed and they all have to be validated at once. — Andon M. Coleman
Definitely programs, because a program change always comes with code cache invalidation. So after changing the program the GPU has to start with a cold execution cache. Sampler states behave much more like uniforms, but are not quite as cheap. It should also be pointed out that the ordering of the expenses depends on the driver and the GPU being used, but also on the program the GPU currently runs. As a rule of thumb, anything that makes the cache cold is a major performance killer. It's hard to overemphasize how important cache coherence and access patterns are for GPU performance. — datenwolf
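Since the comments point out that part of the cost shows up when changed state is validated at the next draw call, one common way to keep that set of changes small is a thin wrapper that drops redundant calls before they reach the driver. The sketch below is an assumption-laden illustration: `MiniGL` is a made-up two-method interface (a real `WebGLRenderingContext` takes object handles and a target argument rather than plain numbers), and the fake backend only records what would have been issued.

```typescript
// Made-up minimal slice of a GL-like API, using plain numbers as handles.
interface MiniGL {
  useProgram(program: number): void;
  bindTexture(unit: number, texture: number): void;
}

// Remembers the last value set for each state and skips repeats.
class StateCache {
  private gl: MiniGL;
  private program = -1;
  private textures = new Map<number, number>();

  constructor(gl: MiniGL) {
    this.gl = gl;
  }

  useProgram(program: number): void {
    if (program === this.program) return; // redundant, skip the driver call
    this.program = program;
    this.gl.useProgram(program);
  }

  bindTexture(unit: number, texture: number): void {
    if (this.textures.get(unit) === texture) return; // redundant
    this.textures.set(unit, texture);
    this.gl.bindTexture(unit, texture);
  }
}

// Fake backend that records the calls that actually get through.
const issuedCalls: string[] = [];
const fakeGL: MiniGL = {
  useProgram: (p) => { issuedCalls.push("useProgram " + p); },
  bindTexture: (u, t) => { issuedCalls.push("bindTexture " + u + " " + t); },
};

const stateCache = new StateCache(fakeGL);
stateCache.useProgram(1);
stateCache.bindTexture(0, 7);
stateCache.useProgram(1);     // skipped
stateCache.bindTexture(0, 7); // skipped
stateCache.bindTexture(0, 8); // different texture, goes through
```

Of the five requests, only three reach the backend. This is the per-call check that the hash tree in the question tries to avoid in the main render queue, so the two approaches are alternatives rather than something to stack blindly.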

derhass · Accepted Answer · 2014-08-26T16:36:08

The relative costs of such operations will of course depend on the usage pattern and your general scenario. But you might find Nvidia's "Beyond Porting" presentation slides a useful guide. Let me reproduce slide 48 in particular here:

Relative Cost of state changes

In decreasing cost...

- Render Target ~60K/s
- Program ~300K/s
- ROP
- Texture Bindings ~1.5M/s
- Vertex Format
- UBO Bindings
- Uniform Updates ~10M/s

This does not directly match all of the bullet points in your list. For example, glEnable/glDisable might affect anything. Also, GL's buffer bindings are not something the GPU sees directly; they are mainly client-side state, depending on the target, of course. A change of blending state would be a ROP state change, and so on.
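As a rough way to read the slide's figures, the rates can be inverted into a per-change time and summed over a frame. This is only a back-of-the-envelope sketch: treating the rates as independent, additive costs is an assumption, the numbers come from one vendor's presentation, and the states without a figure on the slide (ROP, vertex format, UBO bindings) are left out. The names `changesPerSecond` and `estimateMicroseconds` are made up for this example.

```typescript
// Approximate throughput figures quoted from the slide, in changes per second.
const changesPerSecond = {
  renderTarget: 60000,
  program: 300000,
  textureBinding: 1500000,
  uniformUpdate: 10000000,
};
type StateKind = keyof typeof changesPerSecond;

// Estimated time in microseconds for the given per-frame change counts,
// assuming each change costs 1 / rate seconds and costs simply add up.
function estimateMicroseconds(counts: Partial<Record<StateKind, number>>): number {
  let seconds = 0;
  for (const kind of Object.keys(changesPerSecond) as StateKind[]) {
    seconds += (counts[kind] ?? 0) / changesPerSecond[kind];
  }
  return seconds * 1e6;
}

// Hypothetical frame: 3 render target switches, 30 program switches,
// 150 texture binds and 1000 uniform updates.
const frameEstimate = estimateMicroseconds({
  renderTarget: 3,
  program: 30,
  textureBinding: 150,
  uniformUpdate: 1000,
});
```

Under these assumptions the hypothetical frame comes to roughly 350 microseconds, with the 3 render target switches costing about as much as 500 uniform updates. That ratio, more than the absolute numbers, is the argument for putting render targets and programs near the root of the batch tree.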
