Understanding the skinning part of a GLTF2.0 file for OpenGL engine

Question

I have a simple blender model which consists of three meshes with three bones controlling one mesh each. The animation is just the bones rotating the cubes a bit around the y-axis and back. The center bone is the parent of the two outer bones.

I then export this scene with the GLTF2.0 (text version) export plugin and am now trying to import it into my newly made OpenGL engine (C#, Xamarin, Android).

Since I want to understand the GLTF2.0 format and skeletal animation in OpenGL completely, I am trying to implement the GLTF2.0 reading myself.

I read:

Displaying the meshes was easy, but now I am stuck making the animations work. In my gltf file I see three skins:

"skins" : [
    {
        "inverseBindMatrices" : 21,
        "joints" : [
            4,
            5,
            6
        ],
        "skeleton" : 0
    },
    {
        "inverseBindMatrices" : 22,
        "joints" : [
            4,
            5,
            6
        ],
        "skeleton" : 0
    },
    {
        "inverseBindMatrices" : 23,
        "joints" : [
            4,
            5,
            6
        ],
        "skeleton" : 0
    }
]

Which confuses me, because I have one bone structure for all meshes, not three bones for each mesh. I thought I would collect all bones in class instances (say Bone.cs) with every bone having a list of children bones and a field for its parent bone. Then I would collect animations in instances (class Animation.cs) and every animation instance would have a list of key frames containing rotation, scaling and translation for a given timestamp. When the animation timestamp is then set to say 2.5 seconds, I look up the nearest two key frames for that timestamp and interpolate the rotation, scaling, translation for these key frames.
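
The lookup-and-interpolate step described above could be sketched roughly as follows. This is Python rather than the C# of the engine, and the names `lerp`, `slerp` and `sample` are made up for illustration. It assumes sorted key frame times, linear interpolation for translation and scale, and spherical interpolation for rotation quaternions stored as (x, y, z, w).

```python
import bisect
import math

def lerp(a, b, t):
    """Component-wise linear interpolation between two equal-length sequences."""
    return [x + (y - x) * t for x, y in zip(a, b)]

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (x, y, z, w)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                 # flip one quaternion to take the shorter arc
        q1 = [-c for c in q1]
        dot = -dot
    if dot > 0.9995:              # nearly parallel: normalized lerp is stable
        q = lerp(q0, q1, t)
        n = math.sqrt(sum(c * c for c in q))
        return [c / n for c in q]
    theta = math.acos(dot)
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [a * s0 + b * s1 for a, b in zip(q0, q1)]

def sample(times, values, t, interpolate):
    """Value at time t, given sorted key frame times and one value per key frame."""
    if t <= times[0]:
        return list(values[0])
    if t >= times[-1]:
        return list(values[-1])
    hi = bisect.bisect_right(times, t)   # first key frame strictly after t
    lo = hi - 1
    u = (t - times[lo]) / (times[hi] - times[lo])
    return interpolate(values[lo], values[hi], u)
```

For example, `sample(times, translations, 2.5, lerp)` would give the translation at 2.5 seconds and `sample(times, rotations, 2.5, slerp)` the rotation.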

Actual questions

Why are there three skins? Why is the inverseBindMatrices bound to a skin and not to a joint?
When I have the right rotation, scaling, translation from a key frame (per bone), how do I calculate the matrices for each bone that I need to pass to my vertex shader?
Every bone node in the file has its own rotation, translation, scale values but no matrix. Why is that? Isn't there something missing?
The gltf file refers to a bone (joint) by node id, but the weights/jointId arrays that get passed as attributes to the shader do not match these bone ids: the jointIds array contains e.g. 0,1,2 for the bone ids, but the bones are in nodes 4,5,6 - how do I find the right bone for each jointId passed to the shader?

I hope you can help me. Kind regards!! I can provide more of my code if needed, but I think that if I understand the topic as a whole, I then can do it myself.

Edit

GLTF example model download

Edit #2

Alright, I think I am getting the hang of it...slowly.

For each Mesh which is controlled by the armature, there is one Skin in the file. I think there needs to be an inverse bind matrix for each mesh in order to be able to transform the mesh to bone space (and - if need be - back).
I still do not know how to calculate the final transformations correctly before passing them to the shader.
This point still eludes me.
Since every Skin has a list of three (or max. 4) Joints, these are the Joints of which the final transformations need to be passed to the vertex shader. If you have 8 Joints but the current to-be-drawn Mesh only gets affected by 4 of them, why should you pass all 8 matrices instead of only the 4 you need.

This is all still shrouded in doubt. Maybe this helps someone else.

derhass · Accepted Answer · 2019-05-07T20:10:23

I'll try to address your questions one by one.

Why are there three skins? Why is the inverseBindMatrices bound to a skin and not to a joint?

As you already found out, there is one skin per mesh. The fact that in your specific case you could merge all three meshes into one doesn't change this general principle. However, regarding:

I think there needs to be an inverse bind matrix for each mesh in order to be able to transform the mesh to bone space (and - if need be - back).

There is an inverse bind matrix for each joint of each mesh. The name of the property is inverseBindMatrices in plural for a reason: it references an accessor, which in turn references a bufferView and through it some data in a buffer, holding one 4x4 matrix per entry of the skin's joints array.
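
As a rough sketch of how that data could be read (in Python for brevity, the function name is made up): it assumes the accessor is tightly packed float MAT4 data, and that `byte_offset` is already the sum of the bufferView's and the accessor's byteOffset. glTF stores matrices little-endian and in column-major order.

```python
import struct

def read_inverse_bind_matrices(buffer_bytes, byte_offset, joint_count):
    """Return one tuple of 16 floats (a column-major 4x4 matrix) per joint.

    Assumes tightly packed MAT4 / FLOAT data, i.e. 64 bytes per matrix.
    """
    matrices = []
    for j in range(joint_count):
        start = byte_offset + j * 64          # 16 floats * 4 bytes each
        matrices.append(struct.unpack_from("<16f", buffer_bytes, start))
    return matrices
```

The j-th matrix read this way belongs to the j-th entry of the skin's joints array, so with joints [4, 5, 6] the matrix at index 1 is the inverse bind matrix for node 5.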

Changing the order of your questions here, because this way it will make more sense:

Every bone node in the file has its own rotation, translation, scale values but no matrix. Why is that? Isn't there something missing?

What else do you need? Translation, rotation and scale together fully describe the node's local transformation, so the data is complete. The glTF spec defines that the resulting matrix should be calculated in the order T*R*S.
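
A minimal sketch of building that matrix, in Python with made-up helper names. It assumes the rotation is a unit quaternion in glTF's (x, y, z, w) order and uses row-major nested lists that act on column vectors, so the result has to be transposed or flattened column by column before it is handed to OpenGL as column-major data.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def translation_matrix(t):
    return [[1.0, 0.0, 0.0, t[0]],
            [0.0, 1.0, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def scale_matrix(s):
    return [[s[0], 0.0, 0.0, 0.0],
            [0.0, s[1], 0.0, 0.0],
            [0.0, 0.0, s[2], 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_matrix(q):
    """Rotation matrix from a unit quaternion given as (x, y, z, w)."""
    x, y, z, w = q
    return [[1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w), 0.0],
            [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w), 0.0],
            [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y), 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def local_matrix(t, q, s):
    """M_local = T * R * S: scale is applied first, then rotation, then translation."""
    return mat_mul(translation_matrix(t), mat_mul(rotation_matrix(q), scale_matrix(s)))
```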

When I have the right rotation, scaling, translation from a key frame (per bone), how do I calculate the matrices for each bone that I need to pass to my vertex shader?

For each bone node i, you can calculate a local transformation as M_local(i) = T(i)*R(i)*S(i). You get the global transformations by applying the complete hierarchy, so basically M_global(i) = M_global(parent(i)) * M_local(i), and can then construct the joint matrices as M_joint(i) = inverse(globalTransform) * M_global(i) * inverseBindMatrix(i), where globalTransform is the global transformation of the node the mesh is attached to.
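
Put together, the formulas above could look something like the following Python sketch. All names are hypothetical. It assumes `local` maps a node index to its row-major 4x4 local matrix, `parent` maps a node index to its parent's index (root nodes are simply absent, and the map would be built from the nodes' children arrays), and `inverse_bind` holds one matrix per entry of the skin's joints array, in the same row-major layout.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]

def mat_inverse(m):
    """Invert a 4x4 matrix by Gauss-Jordan elimination with partial pivoting."""
    n = 4
    aug = [list(m[r]) + [1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def global_matrix(node, parent, local, cache):
    """M_global(i) = M_global(parent(i)) * M_local(i), memoized in cache."""
    if node not in cache:
        p = parent.get(node)
        if p is None:
            cache[node] = local[node]
        else:
            cache[node] = mat_mul(global_matrix(p, parent, local, cache), local[node])
    return cache[node]

def joint_matrices(skin_joints, inverse_bind, parent, local, mesh_node):
    """M_joint(i) = inverse(mesh global) * M_global(joint i) * inverseBindMatrix(i)."""
    cache = {}
    mesh_inv = mat_inverse(global_matrix(mesh_node, parent, local, cache))
    return [mat_mul(mesh_inv, mat_mul(global_matrix(j, parent, local, cache), ibm))
            for j, ibm in zip(skin_joints, inverse_bind)]
```

In the bind pose every joint matrix comes out as the identity. Once the animated T, R, S values replace a node's local matrix, the joint matrices describe how far each joint has moved away from the bind pose.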

The gltf file refers to a bone (joint) by node id, but the weights/jointId arrays that get passed as attributes to the shader do not match these bone ids: the jointIds array contains e.g. 0,1,2 for the bone ids, but the bones are in nodes 4,5,6 - how do I find the right bone for each jointId passed to the shader?

The jointIds array contains references to the joints, not the bones (hence the name). The skinning shader does not care about bones at all; all the bones do here is define the hierarchy for the joints, so they influence the actual values of the M_global and hence also the M_joint matrices. A value of i in the jointIds array just references the i-th joint in the joints array of the respective skin (so with joints [4, 5, 6], a jointId of 1 means node 5), hence it needs M_joint(i).
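
To illustrate the indexing, here is a small CPU-side sketch in Python of what a skinning vertex shader would compute. The function names are made up, and `joint_mats` is assumed to be a list of row-major 4x4 joint matrices in the same order as the skin's joints array.

```python
def node_for_joint(skin, joint_id):
    """Map a per-vertex joint index to the node index of the bone behind it."""
    return skin["joints"][joint_id]

def skin_position(position, joint_ids, weights, joint_mats):
    """Blend a position by the weighted joint matrices of one vertex."""
    x, y, z = position
    out = [0.0, 0.0, 0.0]
    for jid, w in zip(joint_ids, weights):
        if w == 0.0:
            continue                      # unused influence slot
        m = joint_mats[jid]               # indexed by joint id, not by node id
        for r in range(3):
            out[r] += w * (m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3])
    return out
```

With `skin = {"joints": [4, 5, 6]}`, `node_for_joint(skin, 1)` gives node 5, while the shader itself only ever needs `joint_mats[1]`.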

Since every Skin has a list of three (or max. 4) Joints, these are the Joints of which the final transformations need to be passed to the vertex shader.

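Why would a skin be limited to 4 joints? A skin can have as many joints as one likes. The limit of four applies per vertex: one JOINTS_0/WEIGHTS_0 attribute pair holds four joint indices and four weights for each vertex.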

If you have 8 Joints but the current to-be-drawn Mesh only gets affected by 4 of them, why should you pass all 8 matrices instead of only the 4 you need.

Why would you define a skin of 8 bones for a mesh which only needs four? The glTF data format does not prevent you from storing irrelevant information, or from storing information in an inefficient way.

The point to note here is that the hierarchy between the joints is still defined by the bone node hierarchy of the skeleton. So you can leave out arbitrary joints in a single skin, but these bone nodes (and a potential animation for them) can still affect the final joint matrices of any joint defined by a bone which is below the "left out" bones in the skeleton's bone hierarchy.
