Efficiently range checking a variable in AMP?

  • Question

  • Hello all. I currently have a brute force convolution-type kernel running in AMP. It does something like the following:

    Value1 += DataBuffer(row, column)     * Term0;
    Value1 += DataBuffer(row, column + 1) * Term1;
    Value1 += DataBuffer(row, column + 2) * Term2;
    Value1 += DataBuffer(row, column + 3) * Term3;

    The above sequence is then repeated another 3 or 4 times (generally) to achieve the desired result. I'd now like to add some operations to the above sequence. Specifically, I'd like to check the "DataBuffer" values to see if they exceed a specified threshold and assign a value depending on the result. So the first line above would be modified something like this (all the other lines would be modified in the same manner):

    Value1 += DataBuffer(row, column) * Term0;
    if(DataBuffer(row, column) > SomeThreshold)
    {
        Value2 += Constant1 * Term0;
    }
    else
    {
        Value2 += Constant2 * Term0;
    }

    I'm unsure of the performance penalty for the conditional branch statement since each thread will be invoking it 16 or 20 times. Can someone please suggest an efficient way to do the "if-else" portion, or comment on whether they think it will represent a performance issue? I'd like to get some feedback from the community before going through the considerable effort of adding this functionality to see the hit I'm going to take regarding performance. Thanks in advance everyone.

    -L

    Wednesday, November 12, 2014 8:58 PM

Answers

  • Hello LKeene - interesting question. I presume that row and column vary per lane (thread), so basically each element that you're running the kernel over would have different coordinates. Assuming that is true, we have to consider a few things:

    a) you're branching based on a value that comes straight from main memory - for something like GPUs this is the most expensive type of branching;

    b) it's possible that the values in DataBuffer are not very coherent, so the probability of all elements choosing the same branch is not necessarily high;

    c) that being said, you only have two symmetrical cases, so even in the presence of notable incoherency, in the worst case you take (conceptually) two cycles as opposed to one (one per branch).

    Considering this particular use case, you can re-cast it to be branch free in the following way:

    bool case_0 = DataBuffer(row, column) > SomeThreshold;
    auto multiplier = case_0 * Constant1 + !case_0 * Constant2;
    Value2 += multiplier * Term0;

    Of course, there are multiple ways of expressing the above, but this is just an idea (note that, depending on the type of data you are working with, you might need to cast case_0). Hope this helps!
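
    To make the cast mentioned above concrete, here is a minimal host-side sketch in plain C++ (not an AMP kernel) that compares the branching form from the question with the branch-free form. The function names, the std::vector inputs, and the float element type are assumptions made for illustration; they are not taken from the original kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Branching form, mirroring the if/else from the question.
// data, terms, threshold, c1 and c2 stand in for DataBuffer, Term0..TermN,
// SomeThreshold, Constant1 and Constant2 (hypothetical host-side stand-ins).
float accumulate_branchy(const std::vector<float>& data,
                         const std::vector<float>& terms,
                         float threshold, float c1, float c2)
{
    assert(data.size() == terms.size());
    float value2 = 0.0f;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] > threshold) {
            value2 += c1 * terms[i];
        } else {
            value2 += c2 * terms[i];
        }
    }
    return value2;
}

// Branch-free form: the comparison result is cast to 1.0f or 0.0f and used
// as a mask that selects one of the two constants arithmetically.
float accumulate_branch_free(const std::vector<float>& data,
                             const std::vector<float>& terms,
                             float threshold, float c1, float c2)
{
    assert(data.size() == terms.size());
    float value2 = 0.0f;
    for (std::size_t i = 0; i < data.size(); ++i) {
        const bool case_0 = data[i] > threshold;
        const float mask = static_cast<float>(case_0);  // explicit cast of the bool
        const float multiplier = mask * c1 + (1.0f - mask) * c2;
        value2 += multiplier * terms[i];
    }
    return value2;
}
```

    Both functions should return the same result as long as the two constants are finite; if either constant were infinity or NaN, the multiplication by a zero mask would yield NaN, so the arithmetic form would no longer match the if/else. Whether this form is actually faster than the branch depends on the compiler and hardware, so it is worth measuring both.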

    • Marked as answer by LKeene Monday, November 24, 2014 2:24 AM
    Friday, November 14, 2014 3:10 PM

All replies

  • Thanks for that helpful reply Alex!

    -L

    Friday, November 14, 2014 5:02 PM
  • I'm certainly not a C++ AMP expert, but I think using the clamp() function would do the trick without any branching. See http://msdn.microsoft.com/en-us/library/hh308289.aspx for a description of this function.
    Wednesday, December 3, 2014 10:10 PM