1
votes

I've been running some tests for a contract to improve a very old OpenGL application, and I've been surprised to find that on 10 of the 12 computers I tried, calls to glLoadMatrixf and calls to glMultMatrixf have almost identical speeds.

test1:
- init: nothing
- for the scene: call glLoadMatrixf
- for each model: glPushMatrix, glTranslate/glRotate/glScale, glDrawElements, glPopMatrix
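For concreteness, here is a minimal sketch of the test1 loop. The Model struct and its field names are placeholders of mine, not the application's actual data layout:

```c
#include <GL/gl.h>

/* Hypothetical model record -- field names are illustrative only. */
typedef struct {
    GLfloat pos[3], axis[3], angle_deg, scale[3];
    GLfloat local_matrix[16];   /* column-major, precomputed for test2 */
    GLfloat full_matrix[16];    /* view * local, precomputed for test3 */
    GLsizei index_count;
    const GLushort *indices;
} Model;

/* test1: rebuild each model's transform every frame with glTranslate/glRotate/glScale */
void draw_scene_test1(const Model *models, int count, const GLfloat *view_matrix)
{
    glMatrixMode(GL_MODELVIEW);
    glLoadMatrixf(view_matrix);                    /* once per scene */
    for (int i = 0; i < count; ++i) {
        const Model *m = &models[i];
        glPushMatrix();
        glTranslatef(m->pos[0], m->pos[1], m->pos[2]);
        glRotatef(m->angle_deg, m->axis[0], m->axis[1], m->axis[2]);
        glScalef(m->scale[0], m->scale[1], m->scale[2]);
        glDrawElements(GL_TRIANGLES, m->index_count, GL_UNSIGNED_SHORT, m->indices);
        glPopMatrix();
    }
}
```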

test2:
- init: precalculate each model's private mult matrix
- for the scene: call glLoadMatrixf
- for each model: glPushMatrix, glMultMatrixf, glDrawElements, glPopMatrix
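test2 differs only in that the translate/rotate/scale sequence is baked into one matrix per model at init time (reusing the same hypothetical Model struct as above):

```c
/* test2: per-model matrix precomputed at init, concatenated with glMultMatrixf */
void draw_scene_test2(const Model *models, int count, const GLfloat *view_matrix)
{
    glMatrixMode(GL_MODELVIEW);
    glLoadMatrixf(view_matrix);                    /* once per scene */
    for (int i = 0; i < count; ++i) {
        const Model *m = &models[i];
        glPushMatrix();
        glMultMatrixf(m->local_matrix);            /* view * local, done by the driver */
        glDrawElements(GL_TRIANGLES, m->index_count, GL_UNSIGNED_SHORT, m->indices);
        glPopMatrix();
    }
}
```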

test3:
- init: precalculate each model's full matrix
- for the scene: nothing
- for each model: call glLoadMatrixf, then call glDrawElements
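And test3 bypasses the matrix stack entirely by precomputing the full view-times-model matrix per model at init:

```c
/* test3: full matrix (view * local) precomputed per model, loaded directly */
void draw_scene_test3(const Model *models, int count)
{
    glMatrixMode(GL_MODELVIEW);
    for (int i = 0; i < count; ++i) {
        const Model *m = &models[i];
        glLoadMatrixf(m->full_matrix);             /* replaces the stack top outright */
        glDrawElements(GL_TRIANGLES, m->index_count, GL_UNSIGNED_SHORT, m->indices);
    }
}
```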

I'm well aware that glTranslate/glRotate/glScale are never hardware accelerated (the OpenGL FAQ states this very plainly), but I thought glMultMatrixf wasn't either. However, on most computers, test cases 2 and 3 described above, with hundreds of models, both give almost exactly the same performance (the small difference is possibly due to the added push/pop matrix), while test case 1 is significantly slower, as expected.

So my question: I can't seem to find any source on the internet that says whether glMultMatrixf is generally hardware accelerated or not. Does anyone know?

PS: upgrading this old application to a newer OpenGL standard is outside the scope of this contract.

2
It's not worth accelerating because just the upload of the 32 floats to the GPU would take too long. – ratchet freak

2 Answers

0
votes

What you are seeing is that in test2 and test3 the glDrawElements calls are the bottleneck, which masks any difference between glMultMatrixf and glLoadMatrixf; it is the extra matrix manipulation in test1 that stands out.

Doing just a matrix multiplication is actually pretty cheap (64 multiplications and 48 additions for a 4x4 matrix). The biggest cost in test1 will be glRotate, which has to compute the sine and cosine of the angle you want to rotate by before it can build and multiply in the rotation matrix.
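For a sense of scale, this is roughly the arithmetic the driver has to do for a glMultMatrixf call, written out as plain C (column-major storage, as OpenGL uses); glRotatef does the sin/cos on top of an equivalent multiply:

```c
#include <GL/gl.h>

/* Plain C equivalent of a 4x4 matrix concatenation, out = a * b,
 * in OpenGL's column-major layout: 64 multiplies and 48 adds total. */
static void mat4_mul(GLfloat out[16], const GLfloat a[16], const GLfloat b[16])
{
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            out[col * 4 + row] = a[0 * 4 + row] * b[col * 4 + 0]
                               + a[1 * 4 + row] * b[col * 4 + 1]
                               + a[2 * 4 + row] * b[col * 4 + 2]
                               + a[3 * 4 + row] * b[col * 4 + 3];
        }
    }
}
```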

0
votes

Actually, this depends on which hardware you're asking about.

All major OpenGL implementations of the past 15 years use MMX/AltiVec/SSE/AVX matrix optimizations on the CPU end of things (many drivers even list this in the version string). From my perspective, that is hardware acceleration, just not GPU-side.

A sequence of OpenGL matrix commands can actually complete faster than loading a precomputed matrix from memory; I tested this extensively myself about 10 years ago. In my own tests it was not a whole lot faster, and with modern CPUs, and with the usual rendering bottleneck these days being things like fill rate rather than vertex transformation, it is probably irrelevant.
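If you want to check this on your own target machines, a rough micro-benchmark along these lines will show which of the two calls is cheaper for a given driver. It only measures CPU/driver cost, since with the fixed-function pipeline the matrix is not sent to the GPU until a draw call actually uses it; the matrix argument and iteration count are up to you (a pure rotation keeps the repeated multiplies numerically bounded):

```c
#include <GL/gl.h>
#include <stdio.h>
#include <time.h>

/* Rough CPU-side timing of the two matrix paths; call with a current GL context.
 * Pass a pure rotation matrix so repeated glMultMatrixf stays well-behaved. */
static void time_matrix_calls(const GLfloat m[16], int iterations)
{
    glMatrixMode(GL_MODELVIEW);

    glLoadIdentity();
    clock_t t0 = clock();
    for (int i = 0; i < iterations; ++i)
        glMultMatrixf(m);                     /* driver multiplies into the stack top */
    clock_t t1 = clock();

    for (int i = 0; i < iterations; ++i)
        glLoadMatrixf(m);                     /* driver copies over the stack top */
    clock_t t2 = clock();

    printf("glMultMatrixf: %.3f s, glLoadMatrixf: %.3f s\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);
}
```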