
I have a pixel shader, written in HLSL, that declares the following constant buffer:

cbuffer RenderParametersData : register(b2) 
{
    float4 LineColor[16];
};

In one of the shader functions, I look up the output color using an index stored in the blue channel of input.Color (it is not really a color, just a convenient place to carry the index into the LineColor array):

output.Color = LineColor[input.Color.b * 255];
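Because the index travels in a UNORM color channel, it reaches the shader as k/255 rather than as the integer k. If the float-to-index conversion truncates (as float-to-int casts do in C++ and HLSL), a product that lands a hair below k would select entry k-1. Adding 0.5 before truncating guards against that. The sketch below models the decode on the CPU in C++. `unorm8_to_index` and `decode_unorm8` are hypothetical helpers, not part of any API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: models how an 8-bit UNORM channel value
// (stored as byte k) is delivered to the shader as k / 255.0f.
inline float decode_unorm8(std::uint8_t k) {
    return static_cast<float>(k) / 255.0f;
}

// Hypothetical helper: turns the decoded channel back into the integer index k.
// Adding 0.5 before truncating rounds to the nearest integer, so a product
// such as 4.9999995f still maps to 5 instead of 4.
inline int unorm8_to_index(float channel) {
    return static_cast<int>(channel * 255.0f + 0.5f);
}
```

Lucius's suggestion in the comments, passing a dedicated integer component, would avoid this conversion altogether.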

This dramatically increases the number of instruction slots in the resulting assembly. If I keep everything else the same but use a constant index instead (output.Color = LineColor[0];), the shader needs 10 arithmetic instructions. With the dynamic index it needs 37. Almost all of the additional instructions look like this:

cmp r2, -r1.x, c0, r0.w
cmp r2, -r1.y, c1, r2
cmp r2, -r1.z, c2, r2
cmp r1, -r1.w, c3, r2

The constant register index continues up to c15, matching the 16 elements of LineColor. Resizing LineColor to 8 elements produced very similar code, but with the constant registers going only up to c7, again matching the number of elements in the array. Switching back to a constant-index lookup brought the count back down to 10.
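To see why the cost grows with the array length, here is a rough CPU-side model in C++ of the pattern in the listing above. Pixel shader model 2.0 cannot index constant registers with a run-time value, so every entry has to be tested. The model assumes the ps_2_0 semantics of `cmp` (dst = src0 >= 0 ? src1 : src2). It also assumes that each component of r1 holds |index - i| for one element i. That register layout is my reading of the listing, not documented FXC behaviour, and `select_by_cmp_chain` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

struct float4 {
    float x, y, z, w;
};

// Models the SM 2.0 `cmp` instruction: dst = (src0 >= 0) ? src1 : src2.
inline float4 cmp(float src0, const float4& src1, const float4& src2) {
    return src0 >= 0.0f ? src1 : src2;
}

// Emulates table[index] without indexed constant access: every entry is
// tested, so the number of cmp operations grows linearly with N.
// `cmp_count` reports how many cmp operations were issued.
template <std::size_t N>
float4 select_by_cmp_chain(const std::array<float4, N>& table, int index,
                           std::size_t& cmp_count) {
    float4 result{0.0f, 0.0f, 0.0f, 0.0f};  // fallback for an out-of-range index
    cmp_count = 0;
    for (std::size_t i = 0; i < N; ++i) {
        // Assumed mask layout: -|index - i| is >= 0 only when index == i.
        float mask = -std::fabs(static_cast<float>(index) - static_cast<float>(i));
        result = cmp(mask, table[i], result);
        ++cmp_count;
    }
    return result;
}
```

With 16 entries the model issues 16 compare-selects, one per register c0 through c15. Shader model 4.0, by contrast, can index a constant buffer directly with a register value, so the lookup there does not scale with the array length.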

So it seems that a dynamic lookup into a constant buffer array carries a significant cost: one comparison instruction per array element, plus some overhead. I am genuinely surprised at how expensive this is. Since my array will soon grow by an order of magnitude, it will push me over the 64 arithmetic instruction limit.

Is this the expected behavior? Am I doing something wrong here, or is this a necessary consequence of dynamic array indexing?

Thanks!

EDIT: Just to add some additional detail, the effect I'm after is to color some quads based on data from the vertex shader and texture coordinates. I would do the work in the vertex shader, but interpolation of the texture coordinates has to occur first.

EDIT2: I've resolved this. I was specifying ps_4_0_level_9_1 as the FXC target, which makes the compiler generate assembly for both shader model 2.0 and shader model 4.0. The extra comparison per element occurs only in the shader model 2.0 assembly. Switching the compiler target to ps_4_0 produces only the shader model 4.0 code, and since I'm not constrained to feature level 9_1, things now work well.

Just out of curiosity: why do you use a float index rather than a dedicated integer component in your input? That would at least remove the cost of scaling the float and converting it back to an integer. Also, instead of a constant buffer, consider using a texture resource containing the colors. - Lucius
@Lucius - If I could get an int into the shader and index with it, I would. Do you know how? For DX9 it looks like my available semantics are either COLOR (all UNORMs) or TEXCOORD (all floats). Is there another option? Regarding your second point, if my current approach doesn't work I'll have to switch to a texture resource. I just thought a cbuffer might work better here and avoid mipmapping problems (like shimmering). - Justin R.
"[..] arbitrary semantics are allowed which have no special meaning." from MSDN. I assume this means you can come up with custom semantics. Since I don't know what effect you are going after, I can't say whether a texture fits here, but I would give it a try and see how it performs. After all, textures were made for looking up colors. - Lucius
Thanks @Lucius, I've started to reimplement this using a texture. On the main point, though, do you know whether this is expected behavior when using an array in a cbuffer? - Justin R.
I never have much data in my cbuffers, so I really can't tell. Does performance improve with an integer index? - Lucius
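If the colors move into a texture as Lucius suggests, one common approach is an N x 1 palette texture read with point (nearest) filtering and no mipmaps. That setup also sidesteps the mipmapping and shimmering concern. The index then has to be mapped to the center of its texel, u = (index + 0.5) / N. The C++ sketch below checks that mapping on the CPU. `index_to_u` and `nearest_texel` are hypothetical helpers, and `nearest_texel` only approximates point sampling with clamp addressing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: texture coordinate of the center of texel `index`
// in a palette texture that is `width` texels wide.
inline float index_to_u(int index, int width) {
    return (static_cast<float>(index) + 0.5f) / static_cast<float>(width);
}

// Rough model of nearest-neighbour (point) sampling along u with clamp
// addressing: pick the texel whose footprint contains u.
inline int nearest_texel(float u, int width) {
    int texel = static_cast<int>(std::floor(u * static_cast<float>(width)));
    return std::max(0, std::min(texel, width - 1));
}
```

Centering on the texel leaves a margin of half a texel on either side, so a small floating-point error in the index does not select a neighbouring color.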

1 Answer


I resolved this by telling the compiler not to generate shader model 2.0 assembly (targeting ps_4_0 instead of ps_4_0_level_9_1). More details are at the end of the question.