Matrix math on CPU or GPU for common 3D operations

Question

Is there any common wisdom about how much matrix math should be done on the CPU vs the GPU for common 3D operations?

A typical 3D shader potentially needs several matrices. A world matrix for computing surface to light calculations. A world inverse transpose matrix for normal calculations. A world view projection matrix for 3d projection. Etc.

There are 2 basic ways to approach this.

Calculate the matrices on the CPU and upload the computed matrices to the GPU

In some CPU language
```
worldViewProjection = world * view * projection
worldInverseTranspose = transpose(inverse(world));

upload world, worldViewProjection, worldInverseProjection to GPU
```
on GPU use world,worldViewProjection, worldInverseProjection where needed.
Pass the various component matrices to the GPU (world, view, projection) and compute the needed matrices on the GPU

In some CPU language
```
upload world, view, projection to GPU
```
On GPU
```
worldViewProjection = world * view * projection
worldInverseTranspose = transpose(inverse(world));
```

I understand that at some level I probably just have to profile on the different machines and GPUs and that drawing a million vertices in 1 draw call might have different needs than drawing 4 vertices in 1 draw call but ... I'm wondering ...

Is there any common wisdom about when to do math on the GPU vs CPU for matrix calculations.

Another way to ask this question is what should my default be #1 or #2 above after which later I can profile for those cases where the default is not the best performance.

It's generally not that much of a bottleneck, because a single matrix (if provided premultiplied) can serve millions of points, but sending prepared stuff to the shaders should be lot more effective. Additionally, if you have literally thousands of instances to render, then the most cost effective method would be Hardware Instancing and a single premultiplied projection*view matrix. It's also good practice to keep all calculations in the same space. For example, provide view X proj, and world separately. Provide light coordinates directly in view X proj X world, etc. — Dimo Markov
When doing on GPU you'll do it for every single vertex, pixel, ... again and again. CPU calculates it once. For me i do all that precalculation on CPU. — kaiser

Swifter Swifter · Accepted Answer · 2016-03-07T23:01:13

When deciding on CPU / GPU compute, the issue is not calculation, but streaming.

GPU calculation is very cheap. As your calculation world * view * projection involves uniforms, it's likely that this will be optimised.

However, if you choose to calculate on the GPU, then world view and projection have to be streamed as individual uniform matrices. This takes more time than streaming a single matrix, and also uses up more uniform components within your shader.

Note that the streaming time for uniforms is minimal when compared to texture data or buffer data. You're unlikely to hit performance limits because of it, and if you do then it's easy to optimise.

Matrix math on CPU or GPU for common 3D operations

1 Answers