1
votes

What are successful strategies to optimize HLSL shader code in terms of computational complexity (meaning: minimizing runtime of the shader)?

I guess one way would be to minimize the number of arithmetic operations that result from compiling the shader.

How could this be done a) manually and b) using automated tools (if any exist)?

Collection of manual techniques (Updated)

  • Avoid branching (But how to do that best?)
  • Whenever possible: precompute outside shader and pass as argument.

Example code:

float2 DisplacementScroll;

// Parameters that limit the water effect
float glowHeight;
float limitTop;
float limitTopWater; 
float limitLeft;
float limitRight;
float limitBottom;

sampler TextureSampler : register(s0); // Original color
sampler DisplacementSampler : register(s1); // Displacement

float fadeoutWidth = 0.05;

// External rumble displacement
int enableRumble;
float displacementX;
float displacementY;

float screenZoom;

float4 main(float4 color : COLOR0, float2 texCoord : TEXCOORD0) : COLOR0
{

// Calculate minimal distance to next border
float dx = min(texCoord.x - limitLeft, limitRight - texCoord.x);
float dy = min(texCoord.y - limitTop, limitBottom - texCoord.y);

///////////////////////////////////////////////////////////////////////////////////////
// RUMBLE                                                        //////////////////////
///////////////////////////////////////////////////////////////////////////////////////

    if (enableRumble!=0)
    {
    // Limit rumble strength by distance to HLSL-active region (think map)
    // The factor of 100 is chosen by hand and controls slope with which dimfactor goes to 1
    float dimfactor = clamp(100.0f * min(dx, dy), 0, 1); // Maximum is 1.0 (do not amplify)

    // Shift texture coordinates by rumble
    texCoord.x += displacementX * dimfactor * screenZoom;
    texCoord.y += displacementY * dimfactor * screenZoom;
    }

//////////////////////////////////////////////////////////////////////////////////////////
// Water refraction (optical distortion) and water like-color tint  //////////////////////
//////////////////////////////////////////////////////////////////////////////////////////

if (dx >= 0)
{
float dyWater = min(texCoord.y - limitTopWater, limitBottom - texCoord.y);

  if (dyWater >= 0)
  {
    // Look up the amount of displacement from texture
    float2 displacement = tex2D(DisplacementSampler, DisplacementScroll + texCoord / 3);

    float finalFactor = min(dx,dyWater) / fadeoutWidth;
    if (finalFactor > 1) finalFactor = 1;

    // Apply displacement by water refraction
    texCoord.x += (displacement.x * 0.2 - 0.15) * finalFactor * 0.15 * screenZoom; // Why these strange numbers ?
    texCoord.y += (displacement.y * 0.2 - 0.15) * finalFactor * 0.15 * screenZoom;

    // Look up the texture color of the original underwater pixel.
    color = tex2D(TextureSampler, texCoord);

    // Additional color transformation (blue shift)
    color.r = color.r - 0.1f;
    color.g = color.g - 0.1f;
    color.b = color.b + 0.3f;

  }
  else if (dyWater > -glowHeight)
  {
   // No water distortion...
   color = tex2D(TextureSampler, texCoord);

   // Scales from 0 (upper glow limit) ... 1 (near water surface)
   float glowFactor = 1 - (dyWater / -glowHeight); 

   // ... but bluish glow
   // Additional color transformation
   color.r = color.r - (glowFactor * 0.1); // 24 = 1/(30f/720f); // Prelim: depends on screen resolution, must fit to value in HLSL Update
   color.g = color.g - (glowFactor * 0.1);
   color.b = color.b + (glowFactor * 0.3);
  }
  else
  {
  // Return original color (no water distortion above and below)
  color = tex2D(TextureSampler, texCoord);
  }
}
else
{
// Return original color (no water distortion left or right)
color = tex2D(TextureSampler, texCoord);
}

   return color;
}

technique Refraction
{
    pass Pass0
    {
        PixelShader = compile ps_2_0 main();
    }
}

2 Answers

1
votes

I'm not very familiar with HLSL internals, but what I've learned from GLSL is: avoid branching wherever you can. The GPU will probably execute both sides of the branch and only then decide which result is valid.

Also have a look at this and this.
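As a minimal sketch of what avoiding branches could look like for the shader in the question: the three tint cases (underwater, glow band, untouched) can be folded into one expression with the step and saturate intrinsics. The helper name WaterTint is hypothetical, and the sketch assumes glowHeight > 0.

```hlsl
// Hypothetical helper (not part of the original shader).
// Returns the color offset to add to the sampled pixel.
float3 WaterTint(float dx, float dyWater, float glowHeight)
{
    // 1 inside the left/right limits, 0 outside (replaces `if (dx >= 0)`).
    float inside = step(0, dx);

    // 0 at and above the upper glow limit, ramps to 1 at the water surface,
    // and stays at 1 below it. Assumes glowHeight > 0.
    float glowFactor = saturate(1 + dyWater / glowHeight);

    // Full blue shift under water, scaled shift in the glow band, zero elsewhere.
    return inside * glowFactor * float3(-0.1, -0.1, 0.3);
}
```

In main this would be used as `color.rgb += WaterTint(dx, dyWater, glowHeight);` after a single tex2D lookup, with dyWater computed unconditionally. Whether this is actually faster than the branched version depends on the target profile and hardware, so compare the compiled output.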

As far as I know there are no automatic tools except the compiler itself. For very low-level optimization you can run fxc with the /Fc option to get the assembly listing. The possible assembly instructions are listed here. One low-level optimization worth mentioning is MAD (multiply-add). This may not be compiled to a single MAD instruction (I'm not sure, just try it out yourself):

a *= b;
a += c;

but this should be optimized to a MAD:

a = (a * b) + c;
1
votes

You can optimize your code by rearranging the math so that less work is done per pixel. For example:

// Shift texture coordinates by rumble
texCoord.x += displacementX * dimfactor * screenZoom;
texCoord.y += displacementY * dimfactor * screenZoom;

Here you multiply three values, but only one of them (dimfactor) is computed per pixel on the GPU. The other two are shader constants, so you can premultiply them in the application and store the product in a global constant:

// Shift texture coordinates by rumble
texCoord.x += dimfactor * pre_zoom_dispx; // displacementX * screenZoom
texCoord.y += dimfactor * pre_zoom_dispy; // displacementY * screenZoom

Another example:

// Apply displacement by water refraction
texCoord.x += (displacement.x * 0.2 - 0.15) * finalFactor * 0.15 * screenZoom; // Why these strange numbers?
texCoord.y += (displacement.y * 0.2 - 0.15) * finalFactor * 0.15 * screenZoom;

Here, 0.15 * screenZoom can be precomputed and passed in as a single global constant.
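A minimal sketch of that idea, combined with operating on the float2 as a whole instead of on x and y separately. The global refractionScale and the helper RefractTexCoord are hypothetical names; the application is assumed to upload refractionScale = 0.15 * screenZoom whenever the zoom changes.

```hlsl
// Hypothetical global, set by the application: 0.15 * screenZoom
float refractionScale;

// Hypothetical helper: applies the water refraction offset to both
// texture coordinate components at once.
float2 RefractTexCoord(float2 texCoord, float2 displacement, float finalFactor)
{
    // One scalar multiply for the combined strength, then vector math
    // for the per-component offset.
    float strength = finalFactor * refractionScale;
    return texCoord + (displacement * 0.2 - 0.15) * strength;
}
```

Inside main, the two texCoord.x / texCoord.y lines would then become `texCoord = RefractTexCoord(texCoord, displacement, finalFactor);`.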

The HLSL compiler in Visual Studio 2012 has an option in the project properties to enable optimizations. But the best optimization you can make is to write the HLSL code as simply as possible and to use the intrinsic functions: http://msdn.microsoft.com/en-us/library/windows/desktop/ff471376(v=vs.85).aspx These functions are comparable to memcpy in C, whose body is typically hand-tuned assembly that exploits hardware features such as 128-bit registers (yes, CPUs have 128-bit registers: http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions). In the same way, the intrinsics map to fast native instructions, in this case on the GPU.
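As a small illustration of preferring intrinsics, the two clamping steps in the question's shader could be written with saturate, which clamps to the range 0..1 and can often be folded into the preceding instruction as a modifier. The helper names below are hypothetical, and the second one assumes fadeoutWidth > 0.

```hlsl
// Hypothetical helper: replaces clamp(100.0f * min(dx, dy), 0, 1).
float RumbleDimFactor(float dx, float dy)
{
    return saturate(100.0f * min(dx, dy));
}

// Hypothetical helper: replaces the division followed by
// `if (finalFactor > 1) finalFactor = 1;`.
// Unlike the original, negative inputs are clamped to 0 as well, which
// means no displacement is applied outside the water region.
float FadeFactor(float dx, float dyWater, float fadeoutWidth)
{
    return saturate(min(dx, dyWater) / fadeoutWidth);
}
```

The division could additionally be turned into a multiplication by passing 1 / fadeoutWidth from the application, in the same spirit as the premultiplied constants above.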