HLSL Branching Performance

    Question

  • I'm aware there are efficiency issues surrounding the use of branching instructions in HLSL, but I'm having trouble finding coding guidelines covering the use of branching. So, I would like to know which of the following three HLSL code snippets is the most efficient:

    // 1:
    //
    result = (a >= b) ? c : d;
    
    // 2:
    //
    if (a >= b)
       result = c;
    else
       result = d;
    
    // 3:
    //
    // step(y, x) returns (x >= y) ? 1 : 0, so step(b, a) is 1 when a >= b
    x = step(b, a);
    result = (x * c) + ((1 - x) * d);

    Assume the underlying hardware supports dynamic flow control. Also, if you happen to know of a resource that provides information on this subject that would be great too.
    Thursday, November 8, 2012 5:54 AM

Answers

  • The first 2 are basically equivalent.

    Most GPU performance questions are very specific to the hardware design, so vendor and Feature Level matter a great deal here. For Feature Level 10.0 and up, there are a number of recent performance presentations on DirectX 11 that can help explain the trade-offs. If you are mostly interested in Feature Level 9.x implications, you should see older presentations on Direct3D 9 and Shader Model 2.0 / 3.0 generation hardware.

    Feature Level 9.1/9.2 should probably use option #3. Feature Level 9.3 and up are probably better served by branching (#1 or #2). The real question is whether all the pixels in a neighborhood tend to take the branch the same way or not. That coherence has the most impact on performance.
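
    As a rough sketch of the options discussed here, HLSL lets you attach a [branch] or [flatten] attribute to an if statement as a hint for which code the compiler should generate. The function names below are hypothetical, and a, b, c, d simply mirror the variables in the question. The compiler may still pick its own strategy (or reject [branch] on targets without dynamic flow control), so treat this as something to profile rather than a guaranteed result.

    ```hlsl
    // Hypothetical helpers for illustration; names are not from the thread.

    // Hint: emit a real dynamic branch so only one side is evaluated.
    // Tends to pay off when neighboring pixels take the same path.
    float4 SelectBranch(float a, float b, float4 c, float4 d)
    {
        float4 result;
        [branch]
        if (a >= b)
            result = c;
        else
            result = d;
        return result;
    }

    // Hint: evaluate both sides and select one value, with no jump.
    float4 SelectFlatten(float a, float b, float4 c, float4 d)
    {
        float4 result;
        [flatten]
        if (a >= b)
            result = c;
        else
            result = d;
        return result;
    }

    // Arithmetic select (option #3): step(b, a) is 1 when a >= b, else 0.
    float4 SelectArithmetic(float a, float b, float4 c, float4 d)
    {
        float x = step(b, a);
        return (x * c) + ((1 - x) * d);
    }
    ```

    With bodies this small, the three versions may well compile to very similar instructions. The difference is more likely to show up when each side of the branch does real work, such as a texture fetch or a long computation.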

    Tuesday, November 13, 2012 6:48 PM