I've been running some tests for a contract job to improve a very old OpenGL application, and I was surprised to find that on 10 of the 12 computers I tried, calls to glLoadMatrixf and calls to glMultMatrixf have almost identical speeds.
test 1:
- init: nothing
- for scene: call glLoadMatrixf
- for each model: glPushMatrix, glTranslatef/glRotatef/glScalef, glDrawElements, glPopMatrix
test 2:
- init: precalculate each model's private mult matrix
- for scene: call glLoadMatrixf
- for each model: glPushMatrix, glMultMatrixf, glDrawElements, glPopMatrix
test 3:
- init: precalculate each model's full matrix
- for scene: nothing
- for each model: call glLoadMatrixf, then call glDrawElements
I'm well aware that glTranslate/glRotate/glScale are never hardware accelerated (it's stated very plainly in the OpenGL FAQ), but I thought glMultMatrixf wasn't either. However, on most computers, test cases 2 and 3 above, with hundreds of models, both give almost exactly the same performance (the small difference possibly due to the added push/pop matrix), while test case 1 is significantly slower, as expected.
So my question: I can't seem to find any source on the internet that says whether glMultMatrixf is generally hardware accelerated or not. Does anyone know?
PS: upgrading this old application to a newer OpenGL version is outside the scope of this contract.