3
votes

I'm working on a 2D game that uses SDL. Since some systems have weak CPUs and strong GPUs, I have a renderer backend that uses OpenGL in addition to the plain SDL/software one.

A simplified version of the renderer interface looks like this:

class Renderer {
public:
    virtual void render_surface(SDL_Surface* surface) = 0;
    virtual void render_text(const std::string& text, const Font& font) = 0;
};

But there's a problem with this: I'm losing a lot of time to the repeated glBindTexture calls I have to make whenever I draw a surface with OpenGL. For now, I have a naive cache keyed on the surface's memory address, but that obviously breaks down for dynamically generated surfaces, e.g. the ones created inside render_text.
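For reference, the current cache is roughly this shape (names are illustrative; upload_surface is a placeholder for the actual GL texture upload):

#include <unordered_map>
#include <SDL.h>
#include <GL/gl.h>

GLuint upload_surface(SDL_Surface* surface); // placeholder: creates and fills the GL texture

std::unordered_map<const SDL_Surface*, GLuint> texture_cache;

GLuint texture_for(SDL_Surface* surface) {
    auto it = texture_cache.find(surface);
    if (it != texture_cache.end())
        return it->second;                 // hit: reuse the previously uploaded texture
    GLuint tex = upload_surface(surface);  // miss: upload and remember by address
    texture_cache.emplace(surface, tex);
    return tex;
}

A freshly generated surface always misses (new address), and a new surface that happens to reuse a freed address would even hit incorrectly.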

The only proper solution I can think of is to completely change the interface and have the caller cache textures sensibly:

class Renderer {
public:
    virtual Texture load_surface(SDL_Surface* surface) = 0;
    virtual Texture load_text(const std::string& text, const Font& font) = 0;
    virtual void render_texture(const Texture& texture) = 0;
};

But this is IMO somewhat ugly to use, and would have to be faked for the software renderer.

Is there anything else I can do about this?


1 Answer

7
votes

This actually sounds like two separate issues (at least your proposed solution touches both). Since it is not entirely clear which one you are trying to solve, here are a couple of pointers on each.


1. Redundant State Changes / Draw Calls

You can always queue up your render commands and sort them by texture / shader / other expensive state before you actually do the drawing (do not worry, "sorting" makes it sound more complicated and expensive than it actually is).

In practice, you create buckets for drawing commands based on which texture they require, whether they are translucent or opaque, etc., and then walk the buckets in a fixed order once you have received every drawing command needed to complete the frame. The only real sorting happens at insertion time, and because the buckets are relatively small it is far cheaper than sorting one big, unordered pile of commands when it comes time to draw.
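One possible shape for such a queue, as a minimal sketch (SpriteCommand and draw_quad are illustrative names, not part of your interface):

#include <map>
#include <vector>
#include <GL/gl.h>

struct SpriteCommand {
    float x, y, w, h;      // destination rectangle
    float u0, v0, u1, v1;  // texture coordinates
};

void draw_quad(const SpriteCommand& cmd); // placeholder: emits one textured quad

class RenderQueue {
public:
    void submit(GLuint texture, const SpriteCommand& cmd) {
        buckets_[texture].push_back(cmd);        // the "sorting" happens here, at insertion
    }

    void flush() {
        for (auto& [texture, cmds] : buckets_) {
            glBindTexture(GL_TEXTURE_2D, texture); // one bind per texture per frame
            for (const SpriteCommand& cmd : cmds)
                draw_quad(cmd);
            cmds.clear();
        }
    }

private:
    std::map<GLuint, std::vector<SpriteCommand>> buckets_;
};

render_surface / render_text would then submit into the queue instead of drawing immediately, and the frame ends with a single flush().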

This is how high-performance game engines have worked at least since the days of Quake. The idea is to minimize texture changes and draw calls as much as possible. In the old days, draw calls themselves were very expensive (requiring vertex array memory to be copied from CPU to GPU, and kernel-mode context switches in some APIs); they are still expensive today, just for different reasons. If you can combine as many order-independent draw operations as possible into a single call, you will often see a dramatic improvement in performance.
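As a rough sketch of what "combining into a single call" means for a 2D sprite batch (fixed-function style to match a plain SDL/OpenGL setup; adapt for shaders/VBOs as needed):

#include <vector>
#include <GL/gl.h>

struct Vertex { float x, y, u, v; };

void flush_batch(GLuint texture, const std::vector<Vertex>& vertices) {
    if (vertices.empty())
        return;
    glBindTexture(GL_TEXTURE_2D, texture);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glVertexPointer(2, GL_FLOAT, sizeof(Vertex), &vertices[0].x);
    glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), &vertices[0].u);
    // One draw call for every quad that shares this texture this frame.
    glDrawArrays(GL_QUADS, 0, static_cast<GLsizei>(vertices.size()));
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}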

In fact, PowerVR does something similar at the hardware level: it collects all of the draw commands, divides the screen into tiles, determines which commands are redundant within each tile (e.g. hidden surfaces), and culls them before it rasterizes anything. That reduces memory and power consumption as long as the draw operations are not order-dependent (e.g. alpha blended).


2. Inefficient / Non-Persistent use of GPU Storage

In the worst case, you can always consider packing your textures into an atlas. This way you do not have to break draw calls apart in order to swap bound textures; you just have to compute your texture coordinates more intelligently.
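The coordinate math is straightforward; a minimal sketch, assuming each packed image remembers its pixel rectangle inside the atlas (AtlasEntry and TexCoords are illustrative names):

struct AtlasEntry {
    int x, y, w, h;   // pixel rectangle inside the atlas
};

struct TexCoords { float u0, v0, u1, v1; };

TexCoords atlas_coords(const AtlasEntry& e, int atlas_w, int atlas_h) {
    TexCoords tc;
    tc.u0 = static_cast<float>(e.x)       / atlas_w;
    tc.v0 = static_cast<float>(e.y)       / atlas_h;
    tc.u1 = static_cast<float>(e.x + e.w) / atlas_w;
    tc.v1 = static_cast<float>(e.y + e.h) / atlas_h;
    return tc;
}

Every sprite in the atlas can then be drawn with the same bound texture, which combines nicely with the batching above.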

On a related note, GUIs tend to draw the same text across many consecutive frames, so you can easily have your software cache rendered strings / formatted paragraphs / etc. as textures. If you are clever, you can extend this to entire GUI windows, re-packing only the portion of the atlas that stores a window's rendering when something in it actually changes.
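For the render_text case specifically, a sketch of such a cache might look like this (the key ignores the font for brevity, and upload_surface is again a placeholder for your GL upload; a real cache would key on text and font together and evict old entries):

#include <string>
#include <unordered_map>
#include <SDL.h>
#include <SDL_ttf.h>
#include <GL/gl.h>

GLuint upload_surface(SDL_Surface* surface); // placeholder: creates and fills the GL texture

std::unordered_map<std::string, GLuint> text_cache;

GLuint texture_for_text(const std::string& text, TTF_Font* font) {
    auto it = text_cache.find(text);
    if (it != text_cache.end())
        return it->second;                     // reuse the texture rendered on an earlier frame

    SDL_Color white = {255, 255, 255, 255};
    SDL_Surface* surface = TTF_RenderUTF8_Blended(font, text.c_str(), white);
    GLuint tex = upload_surface(surface);      // rasterize and upload only on a cache miss
    SDL_FreeSurface(surface);
    text_cache.emplace(text, tex);
    return tex;
}

This keeps your existing render_text interface intact: the OpenGL backend caches internally, and the software backend simply blits the SDL surface as it does now.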