7
votes

I am using an FBO+RBO. Instead of regular double buffering on the default framebuffer, I draw to the RBO and then blit directly to the GL_FRONT buffer of the default framebuffer (0) in a single-buffered OpenGL context.

It works fine and I don't get any flickering, but if the scene gets a bit complex, I experience a HUGE drop in fps, so large that I knew something had to be wrong. And I don't mean from 60 to 30 fps because of a skipped sync, I mean a sudden 90% fps drop.

I tried a glFlush() after the blit - no difference, then I tried a glFinish() after the blit, and I had a 10x fps boost.

So I switched to regular double buffering on the default framebuffer with swapbuffers(), and the fps got the same boost as with glFinish().

I cannot figure out what is happening. Why does glFinish() make so much of a difference when it should not? And is it OK to use an RBO and blit directly to the front buffer, instead of calling swapbuffers in a double-buffered context? I know I'm missing vsync, but the compositing manager will sync anyway (in fact I'm not seeing any tearing); it is just as if the monitor is missing 9 out of 10 frames.

And just out of curiosity, does a native swapbuffers() use glFinish() on either Windows or Linux?

2
"then I tried a glFinish() after the blit, and I had a 10x fps boost" - Rather sounds like a problem with your timing method, something like it isn't synced well with your GPU (which glFinish of course achieves). Some more code would be interesting. - Christian Rau
I don't see how reimplementing Double Buffering would be better anyway, considering all the stuff in drivers like Triple Buffering, Adaptive VSync and so on. - Bartek Banachewicz
@BartekBanachewicz It makes sense if you need the buffer both for on-screen and off-screen rendering, but you'll indeed have quite some work to (re)optimize the on-screen part. - KillianDS
Do you actually SEE a difference in the framerate? You should notice a 90% drop in a more complex scene, especially with movement. Also: FPS is a confusing unit for performance measurements, time-to-render is usually preferred (e.g. X ms per frame). - KillianDS
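
To make the frame-time point in the last comment concrete, here is a minimal sketch in C. The name `frame_time_ms` is made up for this illustration, and the 60 fps baseline used below is an assumption, since the question does not state one.

```c
/* Convert a frame rate to the time spent on one frame, in milliseconds.
 * Hypothetical helper for illustration; returns 0.0 for a non-positive rate. */
double frame_time_ms(double fps)
{
    if (fps <= 0.0)
        return 0.0;
    return 1000.0 / fps;
}
```

Under that assumption, a 90% drop from 60 fps is 6 fps, so the time per frame goes from about 16.7 ms to about 166.7 ms.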

2 Answers

1
votes

I believe it is a sync-related issue.

When rendering to the RBO and blitting to the front buffer, there is no synchronization whatsoever. On complex scenes the GPU command queue therefore fills quickly, then the driver's CPU-side queue fills as well, until the driver forces a sync inside some OpenGL call. At that point the CPU thread is halted.

What I mean is that, without any form of sync, complex renderings (those for which one or more OpenGL commands have to be queued) will always cause the CPU thread to be halted at some point: the CPU keeps issuing commands faster than the GPU can consume them, so the queues eventually fill.

To get smooth (more constant) user interaction, a sync is needed, either a platform-specific swapbuffers() or a glFinish(), to stop the CPU from making things worse by issuing more and more commands (which would only bring the CPU thread to a stop later).
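
The queue-filling behaviour described above can be sketched with a toy model in plain C. This is not how any real driver works: the function name, the fixed per-frame costs and the idea of a single queue holding `depth` frames are all assumptions made for illustration. With `depth == 1` the CPU waits for the GPU after every frame, which roughly stands in for calling glFinish().

```c
#define MAX_FRAMES 1024

/* Toy model (assumption, not a real driver): the CPU needs cpu_ms to issue
 * one frame, the GPU needs gpu_ms to execute it, and at most `depth` frames
 * may be in flight before the CPU is blocked.
 * Returns the latency of the last frame, measured from the moment the CPU
 * starts issuing it to the moment the GPU finishes it, or -1.0 on bad input. */
double last_frame_latency_ms(double cpu_ms, double gpu_ms, int depth, int frames)
{
    double gpu_done[MAX_FRAMES];
    double cpu_free = 0.0;   /* time at which the CPU finished the previous frame */
    double latency = 0.0;

    if (depth < 1 || frames < 1 || frames > MAX_FRAMES)
        return -1.0;

    for (int i = 0; i < frames; ++i) {
        double start = cpu_free;
        /* Queue full: the CPU stalls until frame i - depth has left the GPU. */
        if (i >= depth && gpu_done[i - depth] > start)
            start = gpu_done[i - depth];

        double submitted = start + cpu_ms;

        /* The GPU executes frames in order, one at a time. */
        double gpu_start = submitted;
        if (i > 0 && gpu_done[i - 1] > gpu_start)
            gpu_start = gpu_done[i - 1];

        gpu_done[i] = gpu_start + gpu_ms;
        cpu_free = submitted;
        latency = gpu_done[i] - start;
    }
    return latency;
}
```

For example, with `cpu_ms = 1` and `gpu_ms = 20`, a depth of 1 settles at 21 ms of latency per frame, while a depth of 8 settles at 160 ms: the CPU still ends up stalled, only later and with far more stale frames queued. The model only illustrates the stall and latency pattern; it does not reproduce the fps numbers from the question.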

reference: OpenGL Synchronization

1
votes

There are separate issues here, that are also a little bit connected.

1) Re-implementing double buffering yourself, while equivalent according to the spec, is not the same thing to the driver. Drivers are highly optimized for the common case. For example, many chips have distinct 2D and 3D units. The swap in swapBuffers is often handled by the 2D unit, whereas blitting a buffer is probably still done by the 3D unit.

2) glFlush (and glFinish) are ignored by many drivers. glFlush is a relic of client-server rendering. glFinish was intended for profiling, but it got abused to work around driver bugs, so drivers now often ignore it to improve the performance of legacy code that used it as a workaround.

3) Just don't use single buffering. There is no performance benefit and you are straying from the "good" path of the driver. Window managers are heavily optimized for double-buffered OpenGL.

4) What you are seeing looks a lot like you are simply leaking resources. Do you allocate buffers without freeing them? A quick and dirty way to check is if any glGen* functions return ever increasing ids.
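
The quick check in point 4 could be sketched like this. `ids_keep_growing` is a hypothetical helper, and it assumes you record one object name per frame yourself (for example the value written by a glGenBuffers call) into an array. It is only a heuristic: the spec does not promise that freed names are reused, so a growing sequence is a hint, not proof of a leak.

```c
#include <stddef.h>

/* Hypothetical helper: `ids` holds object names sampled once per frame from
 * a glGen* call, oldest first. Returns 1 if every sample is larger than the
 * one before it (names are never reused, which hints at a leak), else 0.
 * Fewer than two samples are not enough to tell, so that also returns 0. */
int ids_keep_growing(const unsigned int *ids, size_t count)
{
    if (ids == NULL || count < 2)
        return 0;
    for (size_t i = 1; i < count; ++i) {
        if (ids[i] <= ids[i - 1])
            return 0;
    }
    return 1;
}
```

If the objects are deleted each frame, the recorded names would typically repeat or stay within a small range, and the function returns 0.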