double buffering with FBO+RBO and glFinish()

Question

I am using an FBO+RBO, and instead of regular double buffering on the default framebuffer, I draw to the RBO and then blit directly to the GL_FRONT buffer of the default framebuffer (0) in a single-buffered OpenGL context.

This works and I don't get any flickering, but if the scene gets a bit complex I experience a HUGE drop in fps, so extreme that I knew something had to be wrong. I don't mean going from 1/60 s to 1/30 s per frame because of a skipped sync; I mean a sudden 90% fps drop.

I tried a glFlush() after the blit - no difference, then I tried a glFinish() after the blit, and I had a 10x fps boost.

So I switched to regular double buffering on the default framebuffer with swapbuffers(), and the fps got the same boost as with glFinish().

I cannot figure out what is happening. Why does glFinish() make so much of a difference when it should not? And is it OK to use an RBO and blit directly to the front buffer, instead of calling swapbuffers() in a double-buffered context? I know I'm missing vsync, but the compositing manager will sync anyway (in fact I'm not seeing any tearing); it is just as if the monitor is missing 9 out of 10 frames.

And just out of curiosity, does a native swapbuffers() use glFinish() on either Windows or Linux?

"then I tried a glFinish() after the blit, and I had a 10x fps boost" - Rather sounds like a problem with your timing method, something like it isn't synced well with your GPU (which glFinish of course achieves). Some more code would be interesting. — Christian Rau
I don't see how reimplementing Double Buffering would be better anyway, considering all the stuff in drivers like Triple Buffering, Adaptive VSync and so on. — Bartek Banachewicz
@BartekBanachewicz It makes sense if you need the buffer both for on-screen and off-screen rendering, but you'll indeed have quite some work to (re)optimize the on-screen part. — KillianDS
Do you actually SEE a difference in the framerate? You should notice a 90% drop in a more complex scene, especially with movement. Also: FPS is a confusing unit for performance measurements, time-to-render is usually preferred (e.g. X ms per frame). — KillianDS

user815129 · Accepted Answer · 2013-07-12T08:19:34

I believe it is a sync-related issue.

When rendering to the RBO and blitting to the front buffer, there is simply no synchronization whatsoever. On complex scenes the GPU command queue therefore fills quickly, then the driver's CPU-side queue fills as well, until the driver forces a CPU sync inside some OpenGL command. At that point the CPU thread is halted.

What I mean is that, without any form of sync, complex renderings (those for which one or more OpenGL commands have to be queued) will always halt the CPU thread at some point: the CPU keeps issuing commands faster than the GPU can consume them, so the queues eventually fill.

To get smooth (more constant) user interaction, a sync is needed (either a platform-specific swapbuffers() or a glFinish()) to stop the CPU from making things worse by issuing more and more commands (which would only bring the CPU thread to a stop later).
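To make the queueing argument above concrete, here is a rough toy model in C. It is only a sketch and not a measurement of any real driver. The function name simulate_frames, the fixed per-frame costs, and the single bounded in-order queue are all assumptions made for illustration. It compares a CPU that runs ahead until the queue is full against a CPU that waits for each frame to complete, as a glFinish() after the blit would make it do.

```c
#include <stdlib.h>

/* Toy model (hypothetical, for illustration only): the CPU needs cpu_ms to
 * record a frame, the GPU needs gpu_ms to execute it, and at most
 * queue_depth frames may be in flight before the CPU is stalled. */
typedef struct {
    double total_ms;        /* time at which the last frame finished on the GPU */
    double max_latency_ms;  /* worst delay from "input sampled" to "frame finished" */
} SimResult;

static double max2(double a, double b) { return a > b ? a : b; }

SimResult simulate_frames(int frames, double cpu_ms, double gpu_ms,
                          int queue_depth, int finish_each_frame)
{
    SimResult r = {0.0, 0.0};
    if (queue_depth < 1) queue_depth = 1;

    /* Ring buffer with the completion times of the last queue_depth frames. */
    double *done = calloc((size_t)queue_depth, sizeof *done);
    if (!done) return r;

    double cpu_free = 0.0;   /* when the CPU can start working on the next frame */
    double prev_done = 0.0;  /* when the GPU finished the previous frame */

    for (int i = 0; i < frames; ++i) {
        int slot = i % queue_depth;
        double sampled = cpu_free;              /* input is read at frame start */
        double submitted = sampled + cpu_ms;    /* commands recorded */

        /* Queue full: the CPU stalls until frame (i - queue_depth) completes. */
        if (i >= queue_depth)
            submitted = max2(submitted, done[slot]);

        /* The GPU executes frames in order, one at a time. */
        double finished = max2(submitted, prev_done) + gpu_ms;
        done[slot] = finished;
        prev_done = finished;

        /* With a finish per frame the CPU waits for the GPU before moving on. */
        cpu_free = finish_each_frame ? finished : submitted;

        r.max_latency_ms = max2(r.max_latency_ms, finished - sampled);
        r.total_ms = finished;
    }

    free(done);
    return r;
}
```

Calling simulate_frames(100, 1.0, 10.0, 8, 0) and simulate_frames(100, 1.0, 10.0, 8, 1) shows the trade-off in this model: without the per-frame wait the CPU ends up many frames ahead of the GPU, so each frame reflects input that is several GPU frame times old, while the per-frame wait keeps that delay close to a single frame at a small cost in total time. The model does not try to reproduce the 10x fps difference from the question, which depends on driver and compositor behaviour that is not modelled here.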

reference: OpenGL Synchronization
