2
votes

I'm fairly new to OpenGL. I've just started learning about shaders, particularly the vertex and fragment shaders. My understanding is that when things are done through shaders you can gain a pretty significant performance increase, because the shader runs on the GPU.

However, I've tried doing some research into this topic and I seem to be finding mixed opinions on the matter, at least with regard to the vertex shader.

What is the major difference between rendering an object like below and using calls like glMultMatrixd for my transformations:

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);

    glVertexPointer(3, GL_FLOAT, 0, &vertices[0]);
    glNormalPointer(GL_FLOAT, 0, &normals[0]);

    glDrawArrays(GL_TRIANGLES, 0, vertices.size() / 3);

    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_NORMAL_ARRAY);

vs. using a VAO/VBO setup like below, where I pass my transformation matrices to the shader as uniform variables and do the transformation there:

    glBindVertexArray(vaoHandle);
    glBindBuffer(GL_ARRAY_BUFFER, bufferHandle[0]);

    glBufferData(GL_ARRAY_BUFFER, vertices.size() * sizeof(float), vertices.data(), GL_STATIC_DRAW);

    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0);
    glEnableVertexAttribArray(0);

    glBindBuffer(GL_ARRAY_BUFFER, bufferHandle[1]);
    glBufferData(GL_ARRAY_BUFFER, normals.size() * sizeof(float), normals.data(), GL_STATIC_DRAW);

    glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, 0);
    glEnableVertexAttribArray(1);

    .....

    glBindVertexArray(vaoHandle);
    glDrawArrays(GL_TRIANGLES, 0, vertices.size() / 3);

Just a heads up: I don't care about what's wrong with the code above. Again, I just want to know whether there is in fact a performance difference, and why. What's going on under the hood for both of these approaches? Why would one be faster or slower than the other? The same goes for the transformations: why would doing one in a vertex shader with a uniform be faster than using glMultMatrix?


2 Answers

3
votes

What the GPU ends up executing is mostly the same for both cases on any GPU that is at least halfway recent. I don't think anybody has built GPUs that actually have dedicated hardware for the fixed pipeline in quite some time. For desktop GPUs, I believe that transition happened about 10+ years ago (for a few years before that, they were already programmable, but also still had fixed function hardware). For mobile GPUs, the transition to purely programmable GPUs happened later, but also quite some time ago.

If you use the fixed pipeline, the driver generates shader code for you, based on the fixed function state you set. So what you're really comparing are shaders that are compiled from GLSL you pass to the driver, and shaders generated by the driver based on state values.

The shader will obviously run on the GPU in both cases, so there's really no fundamental difference beyond that.
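To make the comparison concrete, here is a minimal sketch of the transformation math that both paths end up performing. The CPU-side helpers mimic what `glMultMatrixd` does to the current matrix (post-multiplication, column-major storage), and the string holds a vertex shader that applies the same kind of matrix through a uniform. The names `Mat4`, `Vec4`, `multiply`, `transform`, `translation`, `kVertexShaderSource`, `modelViewProjection`, `normalMatrix` and `viewNormal` are all made up for this illustration; only the attribute locations 0 and 1 come from the question's code. What a particular driver actually generates for fixed-function state is not shown here and will differ.

```cpp
// Column-major 4x4 matrix: element (row, col) lives at index col * 4 + row.
// This is the layout glMultMatrixd takes, and also what glUniformMatrix4fv
// takes when its transpose argument is GL_FALSE.
struct Mat4 { double m[16]; };
struct Vec4 { double v[4]; };

// Models what glMultMatrixd(M) does to the current matrix C: C = C * M.
constexpr Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            double sum = 0.0;
            for (int k = 0; k < 4; ++k) {
                sum += a.m[k * 4 + row] * b.m[col * 4 + k];
            }
            r.m[col * 4 + row] = sum;
        }
    }
    return r;
}

// Per-vertex work: matrix times homogeneous position.
constexpr Vec4 transform(const Mat4& m, const Vec4& p) {
    Vec4 r{};
    for (int row = 0; row < 4; ++row) {
        double sum = 0.0;
        for (int k = 0; k < 4; ++k) {
            sum += m.m[k * 4 + row] * p.v[k];
        }
        r.v[row] = sum;
    }
    return r;
}

// Hypothetical helper: a translation matrix in column-major order.
constexpr Mat4 translation(double x, double y, double z) {
    return Mat4{{1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  x, y, z, 1}};
}

// Hypothetical vertex shader doing the same per-vertex multiply on the GPU.
// Uniform and output names are assumptions, not taken from the question.
const char* const kVertexShaderSource = R"glsl(
#version 330 core
layout(location = 0) in vec3 position;
layout(location = 1) in vec3 normal;
uniform mat4 modelViewProjection;
uniform mat3 normalMatrix;
out vec3 viewNormal;
void main() {
    viewNormal = normalize(normalMatrix * normal);
    gl_Position = modelViewProjection * vec4(position, 1.0);
}
)glsl";
```

In both cases the matrix is built on the CPU and the per-vertex multiply happens on the GPU; the only difference in this sketch is whether you or the driver wrote the shader that does it.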

Now, you may ask: Which one is more efficient? There's no way to tell in general. Some considerations include:

  • Shaders that were generated by the driver for fixed function state can potentially have an advantage because they were heavily tuned, most likely in shader assembly. This was primarily done for workstation class GPUs, where a lot of software was using legacy fixed function OpenGL for much longer.

  • Shaders you write in GLSL have the advantage that they do exactly what you need, and nothing else. So in that sense, they may be more streamlined for your precise use case. Of course the corresponding shader generated by the driver from fixed function state could also be highly streamlined, but it's outside of your control. And especially if you care about performance on various platforms, I frankly wouldn't trust all GPU vendors to generate highly efficient shader code for me.

Of course writing your own shader code has major advantages beyond that. It allows you to do things that are simply not possible with the fixed pipeline. And even where the fixed pipeline can do the job, using shaders is often easier once you get the hang of writing GLSL code.

1
votes

The major performance difference does not come from using shaders, but from using VBOs.

In the first example, the vertices and normals reside in client-side memory (i.e. the application's memory). Whenever they are drawn, these arrays are copied to the graphics card, which can take a significant amount of time.

In contrast, the second example stores all relevant values in VBOs, which are typically located in graphics memory (GL_STATIC_DRAW is a usage hint; the driver makes the final placement decision). Thus the data is already stored in the optimal location and no copying is required for drawing.
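As a rough back-of-the-envelope sketch of why this matters, the helpers below count how many bytes each approach hands to the driver over a number of draw calls. This is a deliberately simplified model of the claim above, not a measurement: it assumes client-side arrays are re-sent on every draw and that a static VBO is uploaded exactly once, and it ignores any caching or staging a real driver may do. The function names are made up for this illustration.

```cpp
#include <cstddef>

// Bytes in one tightly packed attribute array with 3 floats per vertex
// (matches the positions and normals in the question).
constexpr std::size_t attributeBytes(std::size_t vertexCount) {
    return vertexCount * 3 * sizeof(float);
}

// Simplified model of client-side vertex arrays:
// every draw call transfers every array again.
constexpr std::size_t clientArrayBytes(std::size_t vertexCount,
                                       std::size_t arrayCount,
                                       std::size_t drawCalls) {
    return attributeBytes(vertexCount) * arrayCount * drawCalls;
}

// Simplified model of static VBOs:
// each array is uploaded once, regardless of how many draws follow.
constexpr std::size_t staticVboBytes(std::size_t vertexCount,
                                     std::size_t arrayCount) {
    return attributeBytes(vertexCount) * arrayCount;
}
```

With 4-byte floats, a mesh of 100,000 vertices with positions and normals comes to 2.4 MB. Under this model the VBO path transfers that once, while the client-array path transfers it on each draw, i.e. 144 MB over 60 draws.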