How to optimize performance using instancing in a scene full of skinned meshes?

Question

I’m working on a web-based tower defense game built with three.js, and I’m stuck on optimizing its performance.

My game loads two glTF models, one for the enemy and one for the tower, and both contain skinned meshes. When the player creates a tower or the game spawns an enemy, I use THREE.AnimationUtils.clone to clone the loaded model, then add the clone to the scene. For animation, I use THREE.AnimationObjectGroup to animate all the enemies.

In a performance test with 45 towers and 70 enemies in the scene, this results in an average of 370 draw calls per frame, which is a nightmare for the game.

I think instancing might help, because every tower and every enemy shares the same model and state in each frame; only rotation and position differ. But among the instancing examples I studied, none uses instancing with skinned meshes. (There is a discussion here, but its conclusion doesn't mention any method that uses instancing.)

Can this be done with three.js, or is there some other solution for this situation?

Update

After more research, I found a concept that may help me implement instancing with skinned meshes.

Concept

The original post here implements instancing with skinned meshes in Unity. (It's written in Chinese; I have translated the main concept below.)

A loaded skinned mesh has an initial state containing all of its vertices (for clarity, each initial vertex is denoted PLT below). In any frame of the animation, the final position of PLT (denoted PI) is given by a series of matrix multiplications:

PI = (M_rootlocal * ... * M_2_3 * M_1_2 * M_bind_1 * PLT) + (M_rootlocal * ... * M_2_3 * M_1_2 * M_bind_2 * PLT) + (...)

M_bind_1 is the bone-binding matrix of bone 1.
M_m_n is the transformation of bone m relative to its initial state, expressed in the coordinate system of bone n.

To simplify, let M_f_i = M_rootlocal * ... * M_2_3 * M_1_2 * M_bind_i. M_f_i is the bone-binding matrix of bone i after all the multiplications at frame f, so PI = (M_f_1 * PLT) + (M_f_2 * PLT) + (...). Once we know M_f_i, we can calculate the position of every vertex in frame f.
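
A minimal CPU-side sketch of that sum may make it concrete. It is not taken from the linked post. The formula as written omits per-bone weights; standard skinning (and three.js, through its skinIndex and skinWeight attributes) scales each term by the weight of that bone for the vertex, so the sketch includes weights as an assumption. The function names are hypothetical, and matrices are assumed to be stored column-major, 16 numbers per bone, as in Matrix4.elements.

```javascript
// Multiply the column-major 4x4 matrix stored at m[offset .. offset + 15]
// by the point (x, y, z, 1) and return the transformed [x, y, z].
function transformPoint(m, offset, p) {
  const [x, y, z] = p;
  return [
    m[offset + 0] * x + m[offset + 4] * y + m[offset + 8] * z + m[offset + 12],
    m[offset + 1] * x + m[offset + 5] * y + m[offset + 9] * z + m[offset + 13],
    m[offset + 2] * x + m[offset + 6] * y + m[offset + 10] * z + m[offset + 14],
  ];
}

// frameMatrices: the M_f_i of every bone i for one frame f, 16 numbers per bone.
// plt:           initial vertex position [x, y, z].
// skinIndices:   indices of the bones that influence this vertex.
// skinWeights:   weight of each of those bones (assumed to sum to 1).
function skinVertex(frameMatrices, plt, skinIndices, skinWeights) {
  const pi = [0, 0, 0];
  for (let k = 0; k < skinIndices.length; k++) {
    const w = skinWeights[k];
    if (w === 0) continue; // this bone does not influence the vertex
    const t = transformPoint(frameMatrices, skinIndices[k] * 16, plt);
    pi[0] += w * t[0];
    pi[1] += w * t[1];
    pi[2] += w * t[2];
  }
  return pi;
}
```

A vertex shader would run the same loop per vertex, reading the matrices from a texture instead of an array.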

The process above can be done on the GPU by passing M_f_i packed into a texture. (Assuming the skinned mesh has around 10 animations and a small number of bones, the required memory is about 0.75 MB.) Finally, we can pass a different frame number f to each instance to render animated skinned meshes in one draw call.
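
One way the baking step could look in three.js is sketched below, under several assumptions: the clip loops, the texture uses one row per frame and four RGBA float texels per bone matrix, and the mixer, root and skeleton arguments behave like THREE.AnimationMixer, the model's root Object3D and THREE.Skeleton. Only setTime, updateMatrixWorld, Skeleton.update, Skeleton.bones and Skeleton.boneMatrices are used from three.js; bakedLayout and bakeBoneMatrices are hypothetical names.

```javascript
// Size of a float RGBA texture holding every bone matrix of every frame.
// Assumed layout: one row per frame, four texels (one per matrix column) per bone.
function bakedLayout(frameCount, boneCount) {
  const width = boneCount * 4;
  const height = frameCount;
  // 4 floats per texel, 4 bytes per float.
  return { width, height, bytes: width * height * 4 * 4 };
}

// Sample the animation at frameCount evenly spaced times and copy the
// skeleton's bone matrices for each sample into one Float32Array.
function bakeBoneMatrices(mixer, root, skeleton, frameCount, duration) {
  const boneCount = skeleton.bones.length;
  const layout = bakedLayout(frameCount, boneCount);
  const data = new Float32Array(layout.width * layout.height * 4);
  for (let f = 0; f < frameCount; f++) {
    mixer.setTime((f / frameCount) * duration); // pose the bones for frame f
    root.updateMatrixWorld(true);               // refresh bone world matrices
    skeleton.update();                          // refill skeleton.boneMatrices
    const frameMatrices = skeleton.boneMatrices.subarray(0, boneCount * 16);
    data.set(frameMatrices, f * boneCount * 16);
  }
  // The result could then be wrapped as
  // new THREE.DataTexture(data, width, height, THREE.RGBAFormat, THREE.FloatType).
  return { data, width: layout.width, height: layout.height, bytes: layout.bytes };
}
```

With illustrative numbers (not the ones behind the estimate above), 300 frames in total and 20 bones give bakedLayout(300, 20).bytes = 384000, roughly 0.37 MB. Note that three.js also applies the mesh's bindMatrix and bindMatrixInverse in its skinning shader, so a custom shader would have to account for them as well.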

Implementation with three.js

I haven't built example code yet because I don't know whether the concept can work in WebGL (I'm also not familiar with GLSL), but I think it could be implemented with three.js as follows.

1. Follow here to get M_f_i.
2. Use THREE.InstancedBufferGeometry and THREE.RawShaderMaterial.
   - In uniforms, pass the initial geometry, M_f_i and the texture.
   - In vertexShader, compute PI = (M_f_1 * PLT) + (M_f_2 * PLT) + (...).
   - In fragmentShader, process the texture (I have no idea how to do this).
3. Pass f and the other per-instance attributes using THREE.InstancedBufferAttribute.
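
The per-instance data in the last step could be prepared as plain typed arrays, as in the sketch below. It is an untested outline, not a working renderer: buildInstanceData, boneTexel and the unit fields (x, y, z, angle, frame) are hypothetical, and the texel addressing assumes a matrix texture with one row per frame and four texels per bone.

```javascript
// Build one entry per instance for position, rotation about Y, and frame number.
// units: array of { x, y, z, angle, frame } objects describing towers or enemies.
function buildInstanceData(units) {
  const offsets = new Float32Array(units.length * 3);
  const angles = new Float32Array(units.length);
  const frames = new Float32Array(units.length);
  units.forEach((u, i) => {
    offsets.set([u.x, u.y, u.z], i * 3);
    angles[i] = u.angle;
    frames[i] = u.frame;
  });
  // Each array could be wrapped for three.js as, for example,
  // new THREE.InstancedBufferAttribute(offsets, 3) and attached to the
  // InstancedBufferGeometry under an attribute name the shader declares.
  return { offsets, angles, frames };
}

// Texel holding column `column` (0..3) of bone `bone` at frame `frame`,
// for the assumed layout. The vertex shader would do the same arithmetic
// with the instance's frame value and the vertex's skinIndex.
function boneTexel(frame, bone, column) {
  return { x: bone * 4 + column, y: frame };
}
```

Advancing an animation then means rewriting the frames array each tick and flagging that attribute for upload, while the geometry and matrix texture stay unchanged.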

Problem

Where is M_f, and how do I get it from a THREE.AnimationClip in step 1?
How do I index each PLT (vertex in the geometry)?
How do I handle the texture?
How do I handle an Object3D hierarchy (where Object3D.children contains both THREE.Mesh and THREE.SkinnedMesh)?

Can someone tell me whether this idea works in three.js, and how to solve the problems above?

Screenshot of the performance test

Analysis using Chrome dev tool

juagicre · Accepted Answer · 2018-12-27T16:28:46

I remember a Geometry or Mesh merge function that really helped me in the past with cases like this. I recommend searching in that direction.

It has drawbacks, such as losing the individuality of each 3D object, but where possible you should use it for static elements like environment objects. It can also be useful when each of your objects/towers is built from many separate 3D objects, so that they become just one...
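
As a rough illustration of what such a merge does (three.js provides its own merge helpers, so this is only a sketch with a hypothetical function name and plain arrays instead of three.js geometry objects): the vertex arrays are concatenated and each part's indices are shifted by the number of vertices placed before it, so the result can be drawn in one call.

```javascript
// Merge indexed geometries into one.
// geometries: array of { positions: number[] | Float32Array, indices: number[] },
// with positions stored as x, y, z triples already in a common coordinate space.
function mergeIndexed(geometries) {
  let vertexTotal = 0;
  let indexTotal = 0;
  for (const g of geometries) {
    vertexTotal += g.positions.length / 3;
    indexTotal += g.indices.length;
  }
  const positions = new Float32Array(vertexTotal * 3);
  const indices = new Uint32Array(indexTotal);
  let vertexOffset = 0;
  let indexOffset = 0;
  for (const g of geometries) {
    positions.set(g.positions, vertexOffset * 3);
    for (let k = 0; k < g.indices.length; k++) {
      // Shift indices so they still point at this part's own vertices.
      indices[indexOffset + k] = g.indices[k] + vertexOffset;
    }
    vertexOffset += g.positions.length / 3;
    indexOffset += g.indices.length;
  }
  return { positions, indices };
}
```

The merged parts can no longer be moved or animated independently, which is the loss of individuality mentioned above.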

In my experience (this can vary a lot depending on the computer and the size of the 3D viewport), you should never have more than 50 (simple) 3D objects in the camera's visible area, and you should reuse all materials, geometries, meshes... Otherwise you'll end up with very poor performance as soon as something fun is happening in your game.

Hope it helps!