HLSL 5.0 float1x3 vs float3x1 constant buffer packing rule

Question

I'm currently trying to get my head around the constant buffer packing rules in HLSL 5.0 and D3D11, so I played around a little with fxc.exe:

// Generated by Microsoft (R) HLSL Shader Compiler 6.3.9600.18773
//
//
// Buffer Definitions:
//
// cbuffer testbuffer
// {
//
//   float foo;                         // Offset:    0 Size:     4
//   float3x1 bar;                      // Offset:    4 Size:    12 [unused]
//
// }
//
//
// Resource Bindings:
//
// Name                                 Type  Format         Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// testbuffer                        cbuffer      NA          NA    0        1

So far everything behaves as I expect. The float3x1 is 12 bytes in size, so it can be placed in the first 16-byte slot right after the 4-byte foo (4 + 12 = 16). After changing the float3x1 to a float1x3, the compiler output looks like this:
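The placement rule the question relies on can be sketched in a few lines of Python. This is a simplified model that assumes only the rule that a scalar or vector may not straddle a 16-byte register boundary; the function name `place_vector` is hypothetical and not part of any D3D API.

```python
REGISTER_BYTES = 16  # one constant-buffer register: four 4-byte components

def place_vector(offset: int, size: int) -> int:
    """Return the offset at which a scalar/vector of `size` bytes is placed
    when the previous member ends at `offset` (simplified, hypothetical model)."""
    used = offset % REGISTER_BYTES
    if used + size > REGISTER_BYTES:
        # The member would straddle a register boundary: move to the next register.
        return offset + (REGISTER_BYTES - used)
    return offset

# foo occupies bytes 0..3; a 12-byte member (treated here like a float3) still fits: 4 + 12 == 16
print(place_vector(4, 12))  # 4
# A 12-byte member after 8 bytes of data would not fit (8 + 12 > 16)
print(place_vector(8, 12))  # 16
```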

// Generated by Microsoft (R) HLSL Shader Compiler 6.3.9600.18773
//
//
// Buffer Definitions:
//
// cbuffer testbuffer
// {
//
//   float foo;                         // Offset:    0 Size:     4
//   float1x3 bar;                      // Offset:   16 Size:    36 [unused]
//
// }
//
//
// Resource Bindings:
//
// Name                                 Type  Format         Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// testbuffer                        cbuffer      NA          NA    0        1

So it seems that the HLSL compiler suddenly gives every float in the float1x3 its own 16-byte slot, which is quite wasteful. I googled a lot to understand this behavior but couldn't find anything. I hope some of you can explain this to me, since it really confuses me.

Good question. The main issue seems to be that the compiler assigns 9 floats (36 bytes) to the float1x3 when only 3 are used. 9 is a rather weird number for that matter (btw, it is not as if each entry had its own 16 bytes; that would equal 48 bytes). — Nico Schertler
You are right. Only the first two entries get their own 16 bytes; the last one is still 4 bytes, hence the total size of 36 bytes. — user1655806
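The arithmetic from the reply, written out: two full 16-byte registers plus the 4 bytes actually used in the third. The total happens to equal 9 × 4 bytes, which is why it can look as if the compiler reserved nine floats, while a full register per entry would give 48 bytes.

```latex
\text{size}(\texttt{float1x3}) = \underbrace{16}_{\text{register 1}} + \underbrace{16}_{\text{register 2}} + \underbrace{4}_{\text{register 3, one float}} = 36 = 9 \times 4
\qquad\text{vs.}\qquad 3 \times 16 = 48
```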

GaleRazorwind · Accepted Answer · 2018-11-17T19:25:52

This answer is conjecture based on my understanding of HLSL, which uses column-major matrix packing by default. Each constant-buffer register is made up of four 4-byte components, for a total of 16 bytes per register. With column-major packing, each column of a matrix is stored in its own register.

When you declare a float3x1, you are declaring a matrix with three rows and one column (HLSL matrix types are named floatRxC, rows first). Its single column of three floats fits into one register, so it occupies 12 bytes and can share the first register with foo.

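When you declare a float1x3, you are declaring a matrix with one row and three columns. Because each column gets its own register, the data has to be spread across three registers, so bar starts on the next register boundary (offset 16). The reported size of 36 bytes counts the first two registers in full (16 + 16) plus only the 4 bytes used in the last one.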
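The column-major explanation can be checked against the fxc listings with a small Python model. This is a sketch under stated assumptions, not fxc's actual algorithm: it assumes HLSL's floatRxC naming (R rows, C columns), that column-major packing stores each column in its own register (row-major stores each row), that a single-register member follows the no-straddling rule, and that a multi-register member starts on a register boundary with only the last register trimmed in the reported size. The functions `matrix_footprint` and `layout` and their tuple format are hypothetical.

```python
REG = 16   # bytes per constant-buffer register
FLOAT = 4  # bytes per float

def matrix_footprint(rows, cols, row_major=False):
    """Return (registers used, floats per register) for a floatRxC matrix (assumed model)."""
    # Column-major (HLSL default): one register per column, `rows` floats each.
    # Row-major: one register per row, `cols` floats each.
    return (rows, cols) if row_major else (cols, rows)

def layout(members):
    """members: list of (name, rows, cols, row_major). Returns {name: (offset, size)}."""
    result, offset = {}, 0
    for name, rows, cols, row_major in members:
        n_regs, per_reg = matrix_footprint(rows, cols, row_major)
        # Full registers for all but the last, which counts only the floats it holds.
        size = (n_regs - 1) * REG + per_reg * FLOAT
        used = offset % REG
        if n_regs > 1 or used + per_reg * FLOAT > REG:
            # Multi-register members, or ones that would straddle, start on a new register.
            offset += (REG - used) % REG
        result[name] = (offset, size)
        offset += size
    return result

print(layout([("foo", 1, 1, False), ("bar", 3, 1, False)]))  # float3x1
# {'foo': (0, 4), 'bar': (4, 12)}
print(layout([("foo", 1, 1, False), ("bar", 1, 3, False)]))  # float1x3
# {'foo': (0, 4), 'bar': (16, 36)}
print(layout([("foo", 1, 1, False), ("bar", 1, 3, True)]))   # row_major float1x3
# model prediction: {'foo': (0, 4), 'bar': (4, 12)}
```

With the default column-major packing, the model reproduces both fxc listings from the question. The third call relies on the assumption that a `row_major` float1x3 packs its single row into one register, just like a float3; that result comes from the model, not from compiler output.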

If you need a 1xN matrix, you are better off declaring a vector instead (e.g. float3), which fits within a single register and can be used in any situation where either a 1x3 or a 3x1 matrix could be.
