1 vote

I'm trying to learn how to program GLSL geometry shaders. My test project works like this: I have N VBOs modeling "blades of grass". Without the shader, each blade of grass is basically a line strip with 20 segments. I was able to get this animating more or less smoothly with nearly N = 10k blades, i.e., 200,000 line segments.

The shader takes each line segment and blows it out into a cylinder of the same length, centered on that line segment, so the blades of grass are now tubes with dimensionality. Nothing has changed on the CPU side; I'm just leveraging the GPU to add geometry so I can shade the blades. Each cylinder has 30 sections, so that's 60 triangles per segment, or 1,200 triangles per blade.
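For concreteness, the geometry shader does something along these lines -- a simplified sketch, not my exact code, where SECTIONS, RADIUS, the vIntensity varying, and the hard-coded light direction are stand-ins:

```glsl
#version 120
#extension GL_EXT_geometry_shader4 : enable
// Sketch: turn one line segment (GL_LINES input) into a 30-section tube
// ring, emitted as a triangle strip. The host side must configure
// GL_GEOMETRY_INPUT_TYPE_EXT = GL_LINES,
// GL_GEOMETRY_OUTPUT_TYPE_EXT = GL_TRIANGLE_STRIP, and
// GL_GEOMETRY_VERTICES_OUT_EXT >= 62 via glProgramParameteriEXT.

const int SECTIONS = 30;      // section count from the question
const float RADIUS = 0.01;    // hypothetical tube radius

varying out float vIntensity; // lighting result handed to the fragment shader

void main()
{
    // The vertex shader passed gl_Vertex through untransformed,
    // so these are object-space endpoints of the segment.
    vec3 p0 = gl_PositionIn[0].xyz;
    vec3 p1 = gl_PositionIn[1].xyz;

    // Build an orthonormal frame (u, v) perpendicular to the segment axis.
    vec3 axis = normalize(p1 - p0);
    vec3 tmp  = (abs(axis.x) < 0.9) ? vec3(1.0, 0.0, 0.0) : vec3(0.0, 1.0, 0.0);
    vec3 u = normalize(cross(axis, tmp));
    vec3 v = cross(axis, u);

    // One ring of SECTIONS quads = a strip of 2 * (SECTIONS + 1) vertices.
    for (int i = 0; i <= SECTIONS; ++i) {
        float a = 6.2831853 * float(i) / float(SECTIONS);
        vec3 n = cos(a) * u + sin(a) * v;   // outward normal of the tube wall

        // Toy diffuse term against a made-up light direction; varyings must
        // be re-set before each EmitVertex().
        float intensity = max(dot(n, normalize(vec3(0.5, 1.0, 0.3))), 0.0);

        vIntensity  = intensity;
        gl_Position = gl_ModelViewProjectionMatrix * vec4(p0 + RADIUS * n, 1.0);
        EmitVertex();

        vIntensity  = intensity;
        gl_Position = gl_ModelViewProjectionMatrix * vec4(p1 + RADIUS * n, 1.0);
        EmitVertex();
    }
    EndPrimitive();
}
```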

The thing is, to get it animating smoothly I had to scale back to only 25 blades. That's only 30k triangles, which is actually LESS geometry than I was pushing before, when I wasn't using shaders at all.

This is running on a MacBook Pro with Snow Leopard and an AMD Radeon HD 6750M. No idea if that's a good card or not.

The shader code is pretty simple -- the vertex shader just has gl_Position = gl_Vertex. Lighting is happening in the geometry shader: simple ambient, specular and diffuse components, basically straight out of the tutorials. Fragment shader is similarly simplistic, just multiplies the grass color by the light intensity that was passed over from the geometry shader.
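Roughly, the other two stages look like this (again simplified; uGrassColor is a placeholder uniform name, and vIntensity matches the varying in the sketch above):

```glsl
#version 120
// Vertex shader: pure pass-through; the transform is deferred to the
// geometry shader, which applies gl_ModelViewProjectionMatrix itself.
void main()
{
    gl_Position = gl_Vertex;
}
```

```glsl
#version 120
// Fragment shader: modulate the grass color by the per-vertex intensity
// computed in the geometry shader.
varying float vIntensity;
uniform vec3 uGrassColor;

void main()
{
    gl_FragColor = vec4(uGrassColor * vIntensity, 1.0);
}
```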

By the way, this is an old version of OpenGL, 2.1 -- so it's GLSL 1.20, which means the geometry shader needs the GL_EXT_geometry_shader4 extension. In case that's relevant.

Also, the stack is Processing on top of GLGraphics on top of JOGL on top of Java. I'd be surprised if that was a factor, unless it's somehow emulating the shader code on the CPU -- but I didn't think OpenGL did that kind of thing automatically for you.

Anyway, do these numbers seem reasonable, or am I doing something wrong? Am I unrealistically expecting geo shaders to work miracles?

I'd try this code on a card that is sure to run the GS on hardware as opposed to the driver emulating that functionality. Perhaps some new Fermi-class hardware? I have a strong feeling it will work great :) – Ani
@ananthonline: Are you saying that AMD's HD-class hardware doesn't have hardware geometry shaders? Because that's not true. – Nicol Bolas
Not really sure, hence the "try" bit. Also, it might be bad drivers or something else. The only way to make sure is to try it on some other card+driver combination that works well for someone else. Correct? – Ani

1 Answer

5 votes

No one has ever accused geometry shaders of being fast, especially when they amplify geometry.

Your GS is taking a line and not only doing a 30x amplification of vertex data, but also doing lighting computations on each of those new vertices. That's not going to be terribly fast, in large part due to a lack of parallelism. Each GS invocation has to do 60 lighting computations, rather than having 60 separate vertex shader invocations doing 60 lighting computations in parallel.

You're basically creating a giant bottleneck in your geometry shader.

It would probably be faster to put the lighting stuff in the fragment shader (yes, really).
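A rough sketch of that restructuring, assuming GLSL 1.20 and placeholder names (vNormalEye, vPosEye, uGrassColor): the geometry shader only forwards eye-space data, and the fragment shader does the ambient/diffuse/specular math per pixel.

```glsl
// Geometry shader excerpt: drop the per-vertex lighting and instead
// forward eye-space position and normal to the fragment shader.
varying out vec3 vNormalEye;
varying out vec3 vPosEye;

// Called from the ring-emission loop with an object-space point and normal.
void emitRingVertex(vec3 p, vec3 n)
{
    vNormalEye  = gl_NormalMatrix * n;
    vPosEye     = (gl_ModelViewMatrix * vec4(p, 1.0)).xyz;
    gl_Position = gl_ModelViewProjectionMatrix * vec4(p, 1.0);
    EmitVertex();
}
```

```glsl
#version 120
// Fragment shader: per-pixel ambient + diffuse + specular (Blinn-Phong),
// reading the light position from the fixed-function state.
varying vec3 vNormalEye;
varying vec3 vPosEye;
uniform vec3 uGrassColor;

void main()
{
    vec3 N = normalize(vNormalEye);
    vec3 L = normalize(gl_LightSource[0].position.xyz - vPosEye);
    vec3 V = normalize(-vPosEye);   // camera sits at the origin in eye space
    vec3 H = normalize(L + V);

    float ambient  = 0.2;
    float diffuse  = max(dot(N, L), 0.0);
    float specular = pow(max(dot(N, H), 0.0), 32.0);

    gl_FragColor = vec4(uGrassColor * (ambient + diffuse) + vec3(specular), 1.0);
}
```

This trades the serialized per-invocation lighting loop for work the rasterizer parallelizes naturally, and you get per-pixel rather than per-vertex shading as a bonus.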