
I'm struggling with the way a compute shader stores an array of uint. I have the following shader code (a minimal example that reproduces the problem):

cbuffer TCstParams : register(b0)
{
    int    IntValue1;
    uint   UIntArray[10];    // <== PROBLEM IS HERE
    int    IntValue2;
}

RWTexture2D<float4>                Output         : register(u0);

[numthreads(1, 1, 1)]
void CSMain()
{
    if (IntValue1 == 0)
        Output[uint2(0, 0)] = float4(1, 1, 1, 1);
}

Once compiled, I examined the output of the compiler to learn the offset and size of the constant buffer items. The item uint UIntArray[10]; surprisingly has a size of 148 bytes. This is strange given that a uint is 4 bytes, so I expected the array size to be 40 bytes.

Here is the compiler output:

Microsoft (R) Direct3D Shader Compiler 6.3.9600.16384
Copyright (C) 2013 Microsoft. All rights reserved.

//
// Generated by Microsoft (R) HLSL Shader Compiler 6.3.9600.16384
//
//
// Buffer Definitions: 
//
// cbuffer TCstParams
// {
//
//   int IntValue1;                     // Offset:    0 Size:     4
//   uint UIntArray[10];                // Offset:   16 Size:   148 [unused]    // <== PROBLEM IS HERE
//   int IntValue2;                     // Offset:  164 Size:     4 [unused]
//
// }
//
//
// Resource Bindings:
//
// Name                                 Type  Format         Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// Output                                UAV  float4          2d    0        1
// TCstParams                        cbuffer      NA          NA    0        1
//
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Input
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Output
cs_5_0
dcl_globalFlags refactoringAllowed | skipOptimization
dcl_constantbuffer cb0[1], immediateIndexed
dcl_uav_typed_texture2d (float,float,float,float) u0
dcl_temps 2
dcl_thread_group 1, 1, 1

#line 13 "E:\Development\Projects\Test Projects\DirectCompute\TestShader1.hlsl"
if_z cb0[0].x
  mov r0.xyzw, l(0,0,0,0)
  itof r1.xyzw, l(1, 1, 1, 1)
  store_uav_typed u0.xyzw, r0.xyzw, r1.xyzw
endif 
ret 
// Approximately 6 instruction slots used

I checked with various array sizes and the result is very strange: the size per element differs when the number of elements changes!

What am I doing wrong, or what am I missing? Thanks.


1 Answer


Quoting from the Microsoft Docs:

Arrays are not packed in HLSL by default. To avoid forcing the shader to take on ALU overhead for offset computations, every element in an array is stored in a four-component vector.

So uint UIntArray[10]; is actually stored like a uint4 UIntArray[10];, except that the last three padding uints are not included in the size calculation (even though they still count towards the offset calculation). That gives 9 × 16 + 4 = 148 bytes for the array, and IntValue2 then lands at offset 16 + 148 = 164, exactly as the compiler reports.

If you want a tighter packing, you can declare the array as uint4 UInt4Array[4]; and then cast it: static uint UInt1Array[16] = (uint[16])UInt4Array; (I haven't checked whether that exact code compiles, but it should be something similar; note that cbuffer members are referenced directly, without a TCstParams. prefix). The cast itself should not cause any overhead; however, accessing elements of UInt1Array will introduce additional instructions to compute the actual offset.
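Put together with the question's original shader, the idea looks something like this (an untested HLSL sketch of the suggestion above; UInt4Array/UInt1Array are illustrative names):

```hlsl
cbuffer TCstParams : register(b0)
{
    int   IntValue1;
    uint4 UInt4Array[4];   // 16 uints tightly packed: 4 registers = 64 bytes
    int   IntValue2;
}

RWTexture2D<float4> Output : register(u0);

// Reinterpret the four uint4s as 16 scalar uints at global scope.
static uint UInt1Array[16] = (uint[16])UInt4Array;

[numthreads(1, 1, 1)]
void CSMain()
{
    // Indexing UInt1Array[i] must be translated back into a
    // register + component lookup, which costs extra ALU instructions.
    if (UInt1Array[IntValue1] == 0)
        Output[uint2(0, 0)] = float4(1, 1, 1, 1);
}
```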