How to write a mutex in C++ AMP to avoid a "write after write" hazard?

  • Question

  • Greetings!

    The question is: do I need a mutex or some other construct, or is a potential write-after-write hazard reported by the debugger always warranted? In my situation, I have an output variable for a C++ AMP parallel_for_each, defined as follows:

    some_struct out = some_struct();
    array_view<some_struct, 1> out_view(1, &out);
    
    parallel_for_each(something.extent.tile<1>(), [=](tiled_index<1> t_idx) // This just shows how the tiled_index is constructed, to clarify the subsequent code snippets.
    {

    Inside parallel_for_each I have a tile_static array like this

    tile_static some_struct some_values[50];

    which gets filled in some loops inside parallel_for_each. At the very end, after all the loops and after applying a barrier, I assign the first value from the tile_static some_values to out_view[0] as follows

    t_idx.barrier.wait();
    if(t_idx.local[0] == 0)
    {
        if(some_values[t_idx.local[0]].some_value == out_view[0].some_value)
        {
            out_view[0] = some_values[t_idx.local[0]];
        }
    }
    }); //End of parallel_for_each.
    
    
    

    and here I get a potential write-after-write warning. It would seem the hazard is not real, since after this assignment the parallel_for_each has run its course and a barrier has been applied before it. Or is there something I'm missing? If I try to move the barrier past the if(t_idx.local[0] == 0) check, the compiler warns "C3561: tile barrier operation found in control flow that is not tile-uniform when compiling the call graph for the concurrency::parallel_for_each at".


    The wolves howl -- the caravan moves on


    • Edited by Veikko Eeva Friday, August 10, 2012 10:16 PM
    Friday, August 10, 2012 10:14 PM

Answers

  • Hi Veksi,

    In the code above, multiple “tiles” can potentially race to write to the location “out_view[0]” concurrently, which is the cause of the warning in the debugger. The Visual Studio Race detection for C++ AMP blog post describes the race detection warnings in greater detail.

    Your example code launches the parallel_for_each with several tiles, each one of them having one thread (“tile<1>()”). The t_idx.barrier.wait() call only synchronizes threads belonging to the same tile (in your case it is essentially a no-op since there is only one thread per tile). The code following the barrier can be concurrently executed by different tiles, and since the location “out_view[0]” is not accessed using atomic operations, it constitutes a race (as indicated by the debugger warning).
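
    One way to sidestep the race described above, without any cross-tile synchronization, is to give every tile its own output slot and combine the slots on the host afterwards. The block below is only a CPU analogy in standard C++, not C++ AMP code: each std::thread stands in for one tile, and the names run_tiles and pick_largest, the layout of some_struct, and the "times two" work are hypothetical placeholders. In C++ AMP the slot index would presumably come from t_idx.tile[0] instead of the loop counter.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's some_struct.
struct some_struct {
    int some_value;
};

// CPU analogy: each std::thread plays the role of one tile. Instead of every
// tile writing to one shared out[0], tile i writes only to per_tile[i].
std::vector<some_struct> run_tiles(const std::vector<int>& tile_inputs) {
    std::vector<some_struct> per_tile(tile_inputs.size(), some_struct{0});
    std::vector<std::thread> tiles;
    tiles.reserve(tile_inputs.size());
    for (std::size_t i = 0; i < tile_inputs.size(); ++i) {
        // Each worker touches a distinct element, so no two writes collide.
        tiles.emplace_back([&per_tile, &tile_inputs, i] {
            per_tile[i].some_value = tile_inputs[i] * 2;  // placeholder work
        });
    }
    for (auto& t : tiles) t.join();
    return per_tile;
}

// Host-side pass that combines the per-tile results once all tiles are done.
some_struct pick_largest(const std::vector<some_struct>& per_tile) {
    assert(!per_tile.empty());
    some_struct best = per_tile.front();
    for (const some_struct& s : per_tile) {
        if (s.some_value > best.some_value) best = s;
    }
    return best;
}
```

    Calling pick_largest(run_tiles(inputs)) then yields a single result. The trade-off is an output buffer with one element per tile rather than one element in total.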

    Concurrent accesses to a shared memory location (tile_static or global) need to be properly synchronized to avoid races. This can be done either through C++ AMP atomic operations or by devising a mutex using atomic operations. Note that usually it is advisable to avoid such global synchronization operations – atomics are expensive and have performance implications.
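
    To illustrate the atomic approach, here is a minimal sketch of a compare-exchange retry loop that lets many writers update one shared location safely. It is a CPU analogy written with std::atomic and std::thread, not C++ AMP code: each thread stands in for a tile, the "keep the largest value" rule is an assumption made for the example, and atomic_store_max and max_across_tiles are hypothetical names. C++ AMP exposes comparable primitives such as concurrency::atomic_compare_exchange and concurrency::atomic_fetch_max, but these operate on 32-bit scalar values rather than on a whole struct.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Raise `target` to `candidate` if the candidate is larger. The loop retries
// whenever another writer changed `target` between our read and our write.
void atomic_store_max(std::atomic<int>& target, int candidate) {
    int seen = target.load();
    while (candidate > seen && !target.compare_exchange_weak(seen, candidate)) {
        // On failure, `seen` now holds the current value; the loop re-checks.
    }
}

// Each thread (standing in for a tile) proposes one candidate for the single
// shared result, mirroring many tiles writing to out_view[0].
int max_across_tiles(const std::vector<int>& candidates) {
    assert(!candidates.empty());
    std::atomic<int> result{candidates.front()};
    std::vector<std::thread> tiles;
    tiles.reserve(candidates.size());
    for (int c : candidates) {
        tiles.emplace_back([&result, c] { atomic_store_max(result, c); });
    }
    for (auto& t : tiles) t.join();
    return result.load();
}
```

    Because every update goes through the atomic, the final value does not depend on the order in which the writers run.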

    -Amit


    Amit K Agarwal

    Friday, August 10, 2012 11:33 PM
    Moderator

All replies

  • Thanks, Amit! I think this was rather illuminating.

    The wolves howl -- the caravan moves on

    Saturday, August 11, 2012 7:20 PM