1
votes

I am rendering sprites in 3D space, where each quad is formed from two triangles. I draw them as GL_TRIANGLES (see below). Since 2 vertices are repeated in this formation, vertex shader does two times the same computation.

    5    3, 4
     *---*
     |  /|
     |/  |
     *---* 
  1, 6    2

I wanted to optimize this by using a geometry shader to repeat the two vertices, because the vertex shader is expensive and the scene contains a large number of triangles. After a lot of hackery, I managed to pull it off, but it turned out to be very inefficient: it is actually 45% slower on my machine. I assume this is because primitive assembly is performed twice and a lot of unnecessary data copying happens in the geometry shader. I can't view the assembly code, so I can only guess.

Now to my question: is there a better way of doing this that would actually be faster than doing all the extra vertex shader operations?

2
Are you using indexed rendering? If you use indices and reference the same vertex multiple times, the vertex shader result will often be cached. – Reto Koradi

2 Answers

5
votes

A geometry shader is not needed for that.

What you need is indexed rendering: every vertex is stored in the VBO only once. Then you create an additional buffer object (bound to GL_ELEMENT_ARRAY_BUFFER) that stores the indices of the vertices held in the actual VBO.
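For the quad from the question, the index data could be generated along these lines. This is only a minimal sketch: `build_quad_indices` is a hypothetical helper, and the per-quad vertex order (bottom-left, bottom-right, top-right, top-left) is an assumption chosen to match the diagram in the question. The OpenGL calls appear only in a comment because they need a current context and buffer objects that are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes 6 indices per quad into `indices`, so that
 * the 4 unique vertices of each quad form two triangles.
 * Assumed vertex order per quad: 0 = bottom-left, 1 = bottom-right,
 * 2 = top-right, 3 = top-left. */
void build_quad_indices(uint32_t *indices, size_t quad_count)
{
    assert(indices != NULL || quad_count == 0);
    for (size_t q = 0; q < quad_count; ++q) {
        uint32_t base = (uint32_t)(q * 4);   /* first vertex of this quad */
        uint32_t *out = indices + q * 6;     /* first index of this quad  */
        out[0] = base + 0; out[1] = base + 1; out[2] = base + 2; /* lower-right triangle */
        out[3] = base + 2; out[4] = base + 3; out[5] = base + 0; /* upper-left triangle  */
    }
}

/* With a current OpenGL context and an element buffer `ebo` (both assumed),
 * uploading and drawing `count` indices would look roughly like:
 *
 *   glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
 *   glBufferData(GL_ELEMENT_ARRAY_BUFFER, count * sizeof(uint32_t),
 *                indices, GL_STATIC_DRAW);
 *   glDrawElements(GL_TRIANGLES, (GLsizei)count, GL_UNSIGNED_INT, 0);
 */
```

Each quad then needs 4 vertices in the VBO instead of 6, and the two shared corners are referenced twice through the index buffer.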

[Figure: visualization of indexed rendering; source: in2gpu.com]

Note that your case is not that bad. For example, consider drawing a circle: let's say you draw it using 360 triangles (seems reasonable). In this case, the center vertex would be duplicated for every triangle, which would cause 359 * 4 (number of components + alignment) * 4 (usual value of sizeof(float)) = 5744 bytes of unnecessary data.
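The arithmetic above can be written out as a small function. This is just a sketch: `redundant_shared_vertex_bytes` is a made-up name, and it only counts the extra copies of one shared vertex (such as the circle's center), not the rim vertices.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes spent on redundant copies of one shared vertex when every
 * triangle stores its own copy (non-indexed GL_TRIANGLES).
 * One copy is always needed, so only (triangle_count - 1) are redundant. */
size_t redundant_shared_vertex_bytes(size_t triangle_count,
                                     size_t floats_per_vertex)
{
    assert(triangle_count > 0);
    return (triangle_count - 1) * floats_per_vertex * sizeof(float);
}
```

On a platform where `sizeof(float)` is 4, `redundant_shared_vertex_bytes(360, 4)` gives the 5744 bytes mentioned above.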


UPDATE

Since 2 vertices are repeated in this formation, vertex shader does two times the same computation.

No, it surely does not. All repeated vertices will definitely hit the vertex cache (I guess that is what you meant by "caching"?) and will be reused. This is a very common usage pattern. Remember that sometimes indexed rendering is not a solution: for example, when you have different attributes for the same position. (Yes, you can move the position data to a separate VBO, but it's usually not worth it, so let's leave that aside.) GPUs must therefore handle such situations efficiently, and GPU vendors took care of that.

So do not optimize that. If you are aware of indexed rendering, but you either cannot use it or it does not give any improvement, let the GPU handle rendering the best way possible.

1
votes

Since 2 vertices are repeated in this formation, vertex shader does two times the same computation.

No, on practically all existing implementations (i.e. GPUs) it does not.

The repeated vertices will hit the vertex cache and the existing results of the previous computation on the very same vertex are just reused for the following steps in the pipeline.
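To get a feel for the effect, here is a toy model of such a cache. It is only a sketch under loud assumptions: a FIFO cache keyed by vertex index, with an arbitrary `CACHE_SIZE` of 16. Real hardware differs in size, replacement policy and batching, and `count_shader_runs` is a hypothetical name, not any driver or OpenGL API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 16  /* assumed capacity; real hardware differs */

/* Counts how many vertex shader runs an index stream would need,
 * assuming a FIFO cache holding the last CACHE_SIZE shaded indices. */
size_t count_shader_runs(const uint32_t *indices, size_t count)
{
    uint32_t cache[CACHE_SIZE];
    size_t used = 0;   /* number of valid cache entries */
    size_t next = 0;   /* slot to overwrite on the next miss */
    size_t runs = 0;

    assert(indices != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        int hit = 0;
        for (size_t c = 0; c < used; ++c) {
            if (cache[c] == indices[i]) { hit = 1; break; }
        }
        if (!hit) {
            ++runs;                          /* miss: the shader has to run */
            cache[next] = indices[i];        /* remember the shaded vertex  */
            next = (next + 1) % CACHE_SIZE;  /* FIFO replacement            */
            if (used < CACHE_SIZE) ++used;
        }
    }
    return runs;
}
```

In this model, the index stream 0, 1, 2, 2, 3, 0 for one quad costs 4 shader runs, while six distinct indices cost 6.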

Trying to optimize this is a moot point: GPUs have been optimized for exactly this usage pattern, and performance-wise that system has been squeezed dry.