2 votes

I ran into a performance problem when rendering a scene loaded with Assimp in OpenGL. The scene has 367727 triangles, of which 298084 belong to a chess model. I do not think the problem is in the shaders, because:

128x128 window: 44 (43.7657) FPS, 22.849 ms

256x256 window: 42 (40.9563) FPS, 24.4162 ms

512x512 window: 35 (34.8007) FPS, 28.7351 ms

1024x1024 window: 22 (21.084) FPS, 47.4293 ms

But if I don't draw the chess model, then in a 1366x763 window I get: 55 (54.8424) FPS, 18.2341 ms

Also, changing the resolution of the shadow map does not greatly affect the FPS.

There are 2 point light sources in the scene, and each pass that draws this model costs roughly 10 FPS (overall from 55 down to 23). That is, it makes no difference where I draw this model, in the depth map or in the "color texture": the losses are about the same. I load the model with the following flags: aiProcess_Triangulate | aiProcess_CalcTangentSpace | aiProcess_JoinIdenticalVertices
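For reference, this is roughly what the import looks like (a minimal sketch; the path and error handling here are placeholders, not my actual code):

#include <assimp/Importer.hpp>
#include <assimp/scene.h>
#include <assimp/postprocess.h>

void loadScene(const char *path) {
    Assimp::Importer importer;
    const aiScene *scene = importer.ReadFile(
        path,
        aiProcess_Triangulate |          // make sure every face is a triangle
        aiProcess_CalcTangentSpace |     // generate tangents/bitangents for normal mapping
        aiProcess_JoinIdenticalVertices  // deduplicate vertices so indexed drawing pays off
    );
    if (!scene || !scene->mRootNode) {
        // importer.GetErrorString() explains the failure
        return;
    }
    // ... upload meshes to GL buffers ...
}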

And render as follows:

inline void pointer(GLint location, int count, GLuint buffer) {
    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    glVertexAttribPointer(
        location, // attribute location
        count,    // number of components (1, 2, 3 or 4)
        GL_FLOAT, // type
        GL_FALSE, // normalized?
        0,        // stride (0 = tightly packed)
        nullptr   // offset
    );
}

inline void pointerui(GLint location, int count, GLuint buffer) {
    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    glVertexAttribIPointer(
        location,        // attribute location
        count,           // number of components (1, 2, 3 or 4)
        GL_UNSIGNED_INT, // type
        0,               // stride (0 = tightly packed)
        nullptr          // offset
    );
}
...
pointer(cs.inPosition, 3, model.shape->meshes[i].getVerticesBuffer());
pointer(cs.inNormal, 3, model.shape->meshes[i].getNormalsBuffer());
pointer(cs.inTangent, 3, model.shape->meshes[i].getTangentsBuffer());
pointer(cs.inBitangent, 3, model.shape->meshes[i].getBitangentsBuffer());
pointer(cs.inTexCoord, 2, model.shape->meshes[i].getTexCoordsBuffer());

if (model.shape->bonesPerVertex != 0) {
    pointer(cs.inBoneWeights, 4, model.shape->meshes[i].getBoneWeightsBuffer());
    pointerui(cs.inBoneIds, 4, model.shape->meshes[i].getBoneIdsBuffer());
}

modelMatrix = &model.transformation;
updateMatrices();

glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, model.shape->meshes[i].getIndicesBuffer());
glDrawElements(GL_TRIANGLES, model.shape->meshes[i].indices.size(), GL_UNSIGNED_INT, nullptr);

Here is the scene itself: scene

EDIT: vertex_shader.glsl, fragment_shader.glsl

I apologize that the fragment shader is difficult to read; I have not yet finished working on it.

My GPU is an NVIDIA GeForce 920MX.

EDIT: here is a capture from RenderDoc

Did you try using culling? – YesThatIsMyName
@YesThatIsMyName, yes, but it gave an increase of only 3 FPS. – congard
To improve the question, measure time in ms only. Post your shader code, both vertex and fragment. GPU spec wouldn't hurt: if your computer is wooden, the numbers make sense ;-) – Andreas
@Andreas edited – congard
There sure is a lot going on in that shader, and your GPU is low-end anno 2015. The "for 20 do texture lookup" loop is a potential culprit, but can't be sure without measuring. Note: the samplerCubeShadow type is dedicated to shadow mapping, so I'm pretty sure you should be using that instead. Not familiar with it personally, though. – Andreas

1 Answer

1 vote

I know that this is not a direct answer to your question, and based on the details you gave, you probably already know most of these, but I will put it here anyway in case someone else finds your question and is looking for possible solutions.

In cases like this, it is hard to give an exact answer, but here are some general ideas on where to look for a solution.

You know, using shaders is always about creating an illusion of something real. If you strive for accuracy, you would sooner turn your eye towards raytracing and other more realistic ways of rendering images. As long as you work in real-time rendering with shaders, it is all about cheating the eye into seeing the scene as realistic, even if it is not. That is why you should look for cheap tricks that make your scene look more realistic than it really is.


The most effective way to improve your performance is simply to draw less. If you can, try to bake your high-poly meshes down to low-poly ones with normal mapping. This is a very common approach in almost all projects aiming at real-time performance, that is, higher FPS. In general, high detail and accuracy demand lots of processing power, but if you can make some clever compromises that retain a feel of detail, you can improve performance. Drawing about half a million vertices at 50 FPS (that is, some 25 million vertices per second) may simply be too much for your GPU.

The same holds true for your shaders. Are you sure you are using all the lights you have configured in your shader? If your scene has, say, three lights, you can greatly improve performance if your shader handles exactly three lights. Remember that the number of lights is a fragment-specific constant(*): you don't need to think about how many lights your scene has; what matters is how many lights are taken into account per fragment (that is, lights are processed in the fragment shader).

(*) Well, it is probably a per-model constant, because even though all that matters is how many lights are used per fragment, it is hard to send lights per fragment; it is easier to send the set of lights per model when rendering.
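For example, here is a minimal sketch of fixing the light count at shader compile time (the helper name and the GLSL version are my own choices, not from your code): prepend a #define to the shader source, so the fragment loop has a compile-time bound the driver can unroll.

#include <string>

// Hypothetical helper: prepend a compile-time light count to the shader
// source. Assumes the shader body itself does not declare #version.
GLuint compileWithLightCount(GLenum type, const char *body, int lightCount) {
    std::string header = "#version 330 core\n#define NUM_LIGHTS " +
                         std::to_string(lightCount) + "\n";
    const char *parts[2] = { header.c_str(), body };
    GLuint shader = glCreateShader(type);
    glShaderSource(shader, 2, parts, nullptr); // header + body, null-terminated
    glCompileShader(shader);
    return shader; // caller should check GL_COMPILE_STATUS as usual
}

The fragment shader can then loop for (int i = 0; i < NUM_LIGHTS; ++i) instead of branching on a uniform light count.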

And as a general rule, it is almost always good to move computations to the vertex shader and let it precompute values for your fragment shader. For example, you could consider switching your fragment shader to camera space or tangent space, in which case you can do the TBN computations entirely in the vertex shader.
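Here is a minimal sketch of the tangent-space variant, written as a GLSL string in C++ (attribute and uniform names are invented for illustration): the vertex shader builds the TBN and outputs tangent-space light and view vectors, so the fragment shader only has to sample the normal map and do the lighting math.

const char *vsTangentSpace = R"glsl(
#version 330 core
in vec3 inPosition;
in vec3 inNormal;
in vec3 inTangent;
in vec3 inBitangent;
uniform mat4 model;
uniform mat4 view;
uniform mat4 projection;
uniform vec3 lightPos; // world space
uniform vec3 viewPos;  // world space
out vec3 tsLightDir;   // tangent space
out vec3 tsViewDir;    // tangent space

void main() {
    vec3 worldPos = vec3(model * vec4(inPosition, 1.0));
    mat3 normalMat = mat3(transpose(inverse(model)));
    vec3 T = normalize(normalMat * inTangent);
    vec3 B = normalize(normalMat * inBitangent);
    vec3 N = normalize(normalMat * inNormal);
    mat3 TBN = transpose(mat3(T, B, N)); // world -> tangent space
    tsLightDir = TBN * (lightPos - worldPos);
    tsViewDir  = TBN * (viewPos - worldPos);
    gl_Position = projection * view * vec4(worldPos, 1.0);
}
)glsl";

(Computing the normal matrix on the CPU and passing it as a uniform would be cheaper still, since inverse() per vertex is itself not free.)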

There are already some nice ideas in the question's comments that you could consider for better performance.


If you find that a memory bottleneck is limiting your performance, there is an old but still useful article on the issue: https://www.khronos.org/opengl/wiki/Vertex_Specification_Best_Practices

In such cases, the first thing you can do is compress your vertex attributes. For example, you may well convert the TBN vectors to the GL_INT_2_10_10_10_REV type (if available), which squeezes the matrix from nine floats down to the size of three. You will lose precision, but mostly this causes no visible effect. You could even go as far as sending the TBN as a quaternion, which squeezes the nine floats into a single packed 32-bit word, if that precision is enough for you.
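A sketch of what the packed-normal path could look like (the packing helper is hand-written for illustration and assumes unit-length input):

#include <cstdint>

// Pack a unit vector into the 2_10_10_10_REV layout: 10 signed bits per
// component (two's complement), 2 unused high bits.
static uint32_t packSnorm10(float x, float y, float z) {
    auto pack = [](float v) -> uint32_t {
        return (uint32_t)((int32_t)(v * 511.0f)) & 0x3FFu; // keep low 10 bits
    };
    return pack(x) | (pack(y) << 10) | (pack(z) << 20);
}

// At attribute-setup time, one packed word then replaces three floats:
static void packedNormalPointer(GLint location, GLuint buffer) {
    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    glVertexAttribPointer(
        location,
        4,                      // size must be 4 for the packed 2_10_10_10 types
        GL_INT_2_10_10_10_REV,  // packed signed type
        GL_TRUE,                // normalize back to [-1, 1] in the shader
        0, nullptr
    );
}

The shader still declares the attribute as a float vec3/vec4; the unpacking is free on the GPU side.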

Furthermore, you could try interleaved buffers (forming a VBO from an array of structs). Whether this has any effect is uncertain, and even when it does, it is very much GPU-dependent. On some GPUs it may improve the efficiency of the vertex cache.
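A minimal sketch of the interleaved layout (the Vertex struct and the VBO and attribute names are illustrative, not your types):

#include <cstddef> // offsetof

struct Vertex {
    float position[3];
    float normal[3];
    float texCoord[2];
};

// One VBO holds the whole array of structs; the stride is sizeof(Vertex),
// so all attributes of a vertex sit next to each other in memory.
static void setupInterleaved(GLuint interleavedVbo, GLint inPosition,
                             GLint inNormal, GLint inTexCoord) {
    glBindBuffer(GL_ARRAY_BUFFER, interleavedVbo);
    glVertexAttribPointer(inPosition, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                          (void *)offsetof(Vertex, position));
    glVertexAttribPointer(inNormal, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                          (void *)offsetof(Vertex, normal));
    glVertexAttribPointer(inTexCoord, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                          (void *)offsetof(Vertex, texCoord));
}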

But anyway, if you need to go down to details like those, the performance benefit is usually small at best (at least with the fairly simple vertex attribute scheme you are using), and at some point you just have to accept that your GPU cannot process the data you have in the time you want. There are hardware limits to the maximum performance you can get.

I know this does not help you much, but I still hope it gives you some ideas on where to look for possible solutions to your problem.