Graphics driver hang/crash when using a captured constant as a loop limit and array index in Release mode

  • Question

  • I have a kernel that randomly selects a fixed number of elements from a source array and stores the largest in a result array (tournament selection, which some readers may be familiar with). With VS11 Beta, the code consistently crashes the graphics driver when built in Release mode (latest NVIDIA laptop drivers, GT540m).

    A minimal example is included below. The *_SIZE constants may be changed but make no difference (after reading this post I thought I should try it); the issue seems to be related to the count variable. It is constant (and 1) here, but the actual code allows any positive value.

    There is no crash if the code is built with Debug settings, if the for loop is removed or changed so that it does not use count, if winner is not modified, or if the index j is not used. The code-level cases are marked in the example, each with a commented-out alternative that does not crash.

    On the plus side, the accelerator_view_removed exception is thrown correctly, and all my other code/tests work fine on Beta. (As an unrelated aside, is queuing_mode_automatic the new name for deferred and/or the default for accelerators?)

    #include <amp.h>
    #include <algorithm>   // for_each
    #include <iostream>
    #include <iterator>    // back_inserter
    #include <vector>
    
    using namespace concurrency;
    using namespace std;
    
    #define SRC_SIZE 100
    #define RESULT_SIZE 100
    
    int main() {
        vector<float> src_init;
        for (int i = 0; i < SRC_SIZE; ++i) src_init.push_back(10.f * (i+1));
        array_view<float, 1> src(SRC_SIZE, src_init);
        //array<float, 1> src(SRC_SIZE, begin(src_init));
    
        array<float, 1> result(RESULT_SIZE);
        
        vector<float> random_init;
        for (int i = 0; i < RESULT_SIZE; ++i) random_init.push_back(0.0f);
        array_view<float, 2> random(RESULT_SIZE, 1, random_init);
        //array<float, 2> random(RESULT_SIZE, 1, begin(random_init));
    
        const int count = 1;
    
        parallel_for_each(result.extent, [count, src, &result, random](index<1> i) restrict(amp) {
            int winner = 0;
    
            for (int j = 0; j < count; ++j)    // crashes
            //int j = 0;                       // does not crash
            //for (int j = 0; j < 1; ++j)      // does not crash
            {
                winner = (int)(src.extent.size() * random(i[0], j));    // crashes
                //winner = (int)(src.extent.size() * random(i[0], 0));  // does not crash
            }
    
            result[i] = src[winner];
        });
    
        vector<float> result_data;
        try {
            copy(result, back_inserter(result_data));
            for_each(begin(result_data), end(result_data), [](float x) { cout << x << ", "; });
            cout << endl;
        } catch(accelerator_view_removed& ex) {
            cout << ex.what() << endl;
        }
        
        cout << "Press enter to quit . . .";
        cin.get();
    }
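
For comparison with whatever the device returns, the expected output of the repro can be worked out on the CPU. The sketch below is not part of the original post: `select_on_cpu` and `expected_repro_output` are hypothetical helper names, and the flat row-major layout for `random` is an assumption chosen to mirror `array_view<float, 2>`. It uses only standard C++ and follows the kernel body step by step.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU-side helper (not from the original post): applies the same
// per-element logic as the kernel body, one result element at a time.
// `random` is assumed to be stored row-major with `count` columns.
std::vector<float> select_on_cpu(const std::vector<float>& src,
                                 const std::vector<float>& random,
                                 std::size_t result_size, int count) {
    std::vector<float> result(result_size);
    for (std::size_t i = 0; i < result_size; ++i) {
        int winner = 0;
        for (int j = 0; j < count; ++j) {
            // Same expression as the kernel: scale a [0, 1) value to an index.
            winner = static_cast<int>(src.size() * random[i * count + j]);
        }
        result[i] = src[winner];
    }
    return result;
}

// Builds the same inputs as the repro: src = 10, 20, ..., 1000, an all-zero
// random table, and count = 1.
std::vector<float> expected_repro_output() {
    const std::size_t src_size = 100, result_size = 100;
    const int count = 1;
    std::vector<float> src;
    for (std::size_t i = 0; i < src_size; ++i) src.push_back(10.f * (i + 1));
    std::vector<float> random(result_size * count, 0.0f);
    return select_on_cpu(src, random, result_size, count);
}
```

With the repro's inputs (an all-zero `random` table and a `count` of 1), `winner` is always 0, so every element of the returned vector is `src[0]`, which is 10. A run that does not hit the driver problem should print the same values.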


    Friday, March 2, 2012 1:20 AM

Answers

  • Hi Zooba

    Yes, that confirms that it is an NVIDIA driver bug. We also tried this on an ATI card here and it worked fine, and also confirmed it works in REF. It also crashed on one of our NVIDIA cards.

    The HLSL we generate behind the scenes has been changed in some scenarios (usually for better performance in the Beta), so I am not surprised that there is a difference between the DP and the Beta.

    If you have a way of reporting this to NVIDIA, please do; we'll do the same.

    Thank you for reporting this, please keep them coming.

    Cheers

    Daniel


    http://www.danielmoth.com/Blog/

    Friday, March 2, 2012 5:48 AM

All replies

  • Here are two snippets of my actual code. The first is the one that crashes (and is similar to the above example) while the second is how I fixed it.

            parallel_for_each(result.accelerator_view, result.extent, [&, _k, _greediness, _size](index<1> i) restrict(amp) {
                int winner = (int)(rand(i[0], 0) * _size);
    
                if (rand(i[0], _k) < _greediness) {
                    for (int j = 1; j < _k; ++j) {
                        int competitor = (int)(rand(i[0], j) * _size);
                        
                        if (src[competitor] > src[winner]) {
                            winner = competitor;
                        }
                    }
                }
    
                result[i] = src[winner];
            });
    

    The fixed version below is less efficient, since the contents of src are not integral values and I don't really want to be copying them this often.

            parallel_for_each(result.accelerator_view, result.extent, [&, _k, _greediness, _size](index<1> i) restrict(amp) {
                int winner = (int)(rand(i[0], 0) * _size);
                result[i] = src[winner];
    
                if (rand(i[0], _k) < _greediness) {
                    for (int j = 1; j < _k; ++j) {
                        int competitor = (int)(rand(i[0], j) * _size);
                        
                        if (src[competitor] > result[i]) {
                            result[i] = src[competitor];
                        }
                    }
                }
            });
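
The two kernels above are meant to produce the same result: the first tracks the winning index and reads `src` once at the end, while the second tracks the winning value directly. The CPU-only sketch below illustrates that equivalence in standard C++. It is not the poster's code: `RandTable`, `select_by_index`, and `select_by_value` are hypothetical names, the parameter `rnd` stands in for the poster's `rand(i, j)` lookup, and it assumes that `_size` equals the length of `src` and that the random values lie in [0, 1).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the poster's rand(i, j) table: one row per result
// element, k + 1 columns, values assumed to lie in [0, 1).
struct RandTable {
    std::size_t cols;
    std::vector<float> data;
    float operator()(std::size_t i, int j) const {
        return data[i * cols + static_cast<std::size_t>(j)];
    }
};

// Mirrors the first snippet: keep the winning index, read src at the end.
float select_by_index(const std::vector<float>& src, const RandTable& rnd,
                      std::size_t i, int k, float greediness) {
    const int size = static_cast<int>(src.size());  // assumed meaning of _size
    int winner = static_cast<int>(rnd(i, 0) * size);
    if (rnd(i, k) < greediness) {
        for (int j = 1; j < k; ++j) {
            int competitor = static_cast<int>(rnd(i, j) * size);
            if (src[competitor] > src[winner]) winner = competitor;
        }
    }
    return src[winner];
}

// Mirrors the second snippet: keep the winning value instead of its index.
float select_by_value(const std::vector<float>& src, const RandTable& rnd,
                      std::size_t i, int k, float greediness) {
    const int size = static_cast<int>(src.size());  // assumed meaning of _size
    float best = src[static_cast<int>(rnd(i, 0) * size)];
    if (rnd(i, k) < greediness) {
        for (int j = 1; j < k; ++j) {
            int competitor = static_cast<int>(rnd(i, j) * size);
            if (src[competitor] > best) best = src[competitor];
        }
    }
    return best;
}
```

Calling both functions with the same `src`, table, row `i`, `k`, and `greediness` should return the same value. The sketch keeps the running value in a local variable, whereas the kernel writes it to `result[i]` on every improvement, which is the extra copying the poster mentions.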
    

    Friday, March 2, 2012 1:23 AM
  • Hi Zooba

    For your unrelated aside question, please read this:

    http://blogs.msdn.com/b/nativeconcurrency/archive/2011/11/23/understanding-accelerator-view-queuing-mode-in-c-amp.aspx

    For the crash, can you try using the direct3d_ref accelerator please? The easiest way to switch to REF is by setting it as default:

    http://blogs.msdn.com/b/nativeconcurrency/archive/2012/02/02/default-accelerator-in-c-amp.aspx

    If it doesn't crash with REF, then this is an NVIDIA driver bug. Also, what is your NVIDIA driver version? Do you have an AMD device to try this on? Is this on Windows 7 or Windows 8?

    Cheers

    Daniel


    http://www.danielmoth.com/Blog/

    Friday, March 2, 2012 4:38 AM
  • Works fine with REF. It also worked fine under Dev Preview - I only changed the restrict qualifier.

    Sorry I left the other information out. I'm on Windows 7 64-bit. The NVIDIA drivers are 295.73 for notebook and the event log clearly blames the drivers ("Display driver nvlddmkm stopped responding and has successfully recovered."). I don't have any other devices that support C++ AMP.

    Friday, March 2, 2012 4:50 AM