7
votes

If I'm doing instanced rendering and need to send one mat4 per instance to the vertex shader, which approach is likely to be faster for large numbers of instances?

  1. Using an instanced mat4 attribute (glVertexAttribDivisor) and sending the mat4s into the VBO each frame (glBufferData)
  2. Using an array of mat4 uniforms in a uniform block, updating the array every frame using a uniform buffer object and accessing the appropriate mat4 using gl_InstanceID as the array index
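For concreteness, here is a rough sketch of what each option involves. It assumes a GL 3.3+ context for the calls shown in comments. The names `ColumnAttrib`, `mat4Column`, `InstanceData`, `model` and `position` are made up for illustration and do not come from the question. The one firm fact it relies on is that a mat4 vertex attribute occupies four consecutive attribute locations, one vec4 column each, so the pointer and the divisor have to be set per column.

```cpp
#include <cstddef>

// Hypothetical description of one vec4 column of an instanced mat4 attribute.
struct ColumnAttrib {
    unsigned location;   // attribute location of this column
    std::size_t offset;  // byte offset of the column inside one matrix
};

constexpr std::size_t kVec4Bytes = 4 * sizeof(float);   // one column
constexpr std::size_t kMat4Bytes = 16 * sizeof(float);  // stride per instance

// Column c of a mat4 attribute whose first location is baseLocation.
constexpr ColumnAttrib mat4Column(unsigned baseLocation, unsigned c) {
    return ColumnAttrib{baseLocation + c, c * kVec4Bytes};
}

// Option 1, with the per-instance VBO bound (needs a GL context, not run here):
//   for (unsigned c = 0; c < 4; ++c) {
//       ColumnAttrib a = mat4Column(baseLocation, c);
//       glEnableVertexAttribArray(a.location);
//       glVertexAttribPointer(a.location, 4, GL_FLOAT, GL_FALSE,
//                             kMat4Bytes, (const void*)a.offset);
//       glVertexAttribDivisor(a.location, 1);  // advance once per instance
//   }

// Option 2: vertex shader that indexes a uniform block with gl_InstanceID.
// The block name and the array size of 254 are illustrative assumptions.
constexpr const char* kUboVertexShader = R"glsl(
#version 330 core
layout(std140) uniform InstanceData {
    mat4 model[254];
};
in vec3 position;
void main() {
    gl_Position = model[gl_InstanceID] * vec4(position, 1.0);
}
)glsl";
```

With option 1 the whole per-instance array can sit in one buffer, while option 2 is bounded by the size of the uniform array, so large instance counts may need to be split across several draw calls.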
The only way to know for sure is to profile. But since this is exactly the use case glVertexAttribDivisor was designed for, I'd expect option 1 to be more efficient. It's neater, too. - user269597
@robinjam - I agree that profiling is the only way to know for sure. But while glVertexAttribDivisor is a perfect fit for a large number of instances whose positions don't change (e.g. a forest of trees), I think it's a closer call when their transforms have to be updated every frame. Setting uniforms might be quicker than updating a buffer. - GuyRT

1 Answer

10
votes

Based on comments/answers from robinjam, GuyRT and Brett Hale, I did some testing. The test rendered 40000 instances of the same mesh (a triangle), updating each instance's model matrix every frame. My GPU is a GeForce GTX 460 SE.

Here are my results:

  • mat4 uniforms (updated via glUniformMatrix4fv) with 254 instances per draw call (limited due to uniform limits) = 160 fps

  • mat4 uniforms in a block (updated via a UBO) with 254 instances per draw call (limited due to uniform limits) = 260 fps

  • mat4 attributes (updated via a VBO) with 40000 instances per draw call = 287 fps
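To put these numbers in perspective, the small sketch below works out how many draw calls each batch size implies for 40000 instances and converts fps into frame time. The helper names `drawCalls` and `frameMs` are hypothetical. Whether the extra draw calls are what actually causes the gap was not measured here, so treat this as arithmetic on the reported figures only.

```cpp
// Number of draw calls needed to render `instances` in batches of `perDraw`
// (ceiling division).
constexpr unsigned drawCalls(unsigned instances, unsigned perDraw) {
    return (instances + perDraw - 1) / perDraw;
}

// Frame time in milliseconds for a given frame rate.
constexpr double frameMs(double fps) {
    return 1000.0 / fps;
}

// Both uniform-based runs: 254 instances per draw call.
constexpr unsigned kUniformDraws = drawCalls(40000, 254);    // 158
// Attribute-based run: all instances in a single draw call.
constexpr unsigned kAttribDraws = drawCalls(40000, 40000);   // 1
```

So the two uniform-based variants issue 158 draw calls (and 158 uniform updates) per frame, against a single draw call for the attribute version. In frame-time terms, 160 fps is 6.25 ms per frame, while 260 fps and 287 fps are roughly 3.85 ms and 3.48 ms.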