Safe to avoid atomics when performing order-invariant math?

  • Question

  • Hi. I actually have two questions:

    1) I'd like to have a variable for summing elements of an array that's been copied to a tile_static array for use by the tile threads. I do not want to use a location in the tile_static array for this purpose. Can I just declare a single tile_static variable that is visible to all the threads in the tile as in "tile_static int sum;"?

    2) My second question is related to the first. Each tile thread adds values to and/or subtracts values from the "sum" variable. Since the final value of "sum" is invariant to the order in which these operations occur (the additions and subtractions can take place in any order and the final value will be the same), and assuming I use "tidx.barrier.wait()" to guarantee all the threads have completed before accessing the value of "sum", can I safely omit the use of atomics to perform the addition/subtraction?

    Thank you in advance.


    Thursday, November 15, 2012 11:38 PM


  • Hi,

    For "1": Yes, you can have tile_static scalar variables.

    For "2": The addition/subtraction operations need to be atomic for correctness. Each one comprises a load, an arithmetic add/subtract, and a store, and these three steps must be performed as one indivisible operation: if another thread from the same tile executes between the load and the store, you can end up with wrong results. Order-invariance of the math does not help here, because the problem is a lost update rather than a reordering, and the barrier only guarantees that all threads have finished before "sum" is read.

    Consider a situation where a tile_static variable is initialized to “0”, two threads concurrently try to increment it by “1”, and their execution interleaves as follows:

    thread1_local_x = load tile_static var;   // (thread1_local_x = 0)

    thread2_local_x = load tile_static var;   // (thread2_local_x = 0)

    thread1_local_x = thread1_local_x + 1;    // (thread1_local_x = 1)

    thread2_local_x = thread2_local_x + 1;    // (thread2_local_x = 1)

    store thread1_local_x to tile_static var; // (tile_static var = 1)

    store thread2_local_x to tile_static var; // (tile_static var = 1) Incorrect result (the increment by the other thread is lost)
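    The interleaving above can be replayed in ordinary host-side C++ as a rough sketch. This is standard C++ with std::thread and std::atomic, not a C++ AMP kernel, and the function names replay_lost_update and atomic_sum are made up for illustration. The first function walks through the same six steps on a single thread so the lost update is reproduced deterministically; the second shows the same kind of sum done with an atomic read-modify-write.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: replays the six steps from the post on one thread,
// so the lost update happens every time instead of only occasionally.
int replay_lost_update() {
    int var = 0;                 // stands in for the tile_static variable
    int thread1_local_x = var;   // thread 1 loads 0
    int thread2_local_x = var;   // thread 2 loads 0
    thread1_local_x += 1;        // thread 1 computes 1
    thread2_local_x += 1;        // thread 2 computes 1
    var = thread1_local_x;       // thread 1 stores 1
    var = thread2_local_x;       // thread 2 stores 1, overwriting thread 1's update
    return var;
}

// Hypothetical helper: num_threads host threads each add 1 to a shared
// counter adds_per_thread times, using an atomic read-modify-write.
int atomic_sum(int num_threads, int adds_per_thread) {
    assert(num_threads >= 0 && adds_per_thread >= 0);
    std::atomic<int> sum{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&sum, adds_per_thread] {
            for (int i = 0; i < adds_per_thread; ++i) {
                // Load, add, and store happen as one indivisible step.
                sum.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // Joining plays the role of the barrier: read the sum only after
    // every worker has finished.
    for (auto& w : workers) {
        w.join();
    }
    return sum.load();
}
```

    Calling replay_lost_update() returns 1 rather than 2, while atomic_sum(4, 10000) returns 40000. In a C++ AMP kernel the analogous call should be concurrency::atomic_fetch_add(&sum, value) on the tile_static variable, followed by the tile barrier before reading it; that mapping is my assumption about how the sketch carries over, not code from this thread.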


    Amit K Agarwal

    • Proposed as answer by Zhu, Weirong Friday, November 16, 2012 6:04 AM
    • Marked as answer by LKeene Friday, November 16, 2012 6:36 PM
    Friday, November 16, 2012 2:05 AM