Graphics driver hang/crash when using a captured constant as a loop limit and array index in Release mode


  • Friday, March 2, 2012 1:20 AM

    I have a kernel that randomly selects a fixed number of elements from a source array and stores the largest in a result array (tournament selection, which some of you may be familiar with). With VS11 Beta, the code built in Release mode consistently crashes the graphics driver (latest NVIDIA laptop drivers, GT 540M).

    A minimal example is included below. The *_SIZE constants may be changed but make no difference (after reading this post I thought I should try it); the issue seems to be related to the count variable. It is constant (and 1) here, but the actual code allows any positive value.

    There is no crash if the code is built with Debug settings, if the for loop is removed or changed to not use count, if winner is not modified, or if the index j is not used. Each of these cases is marked in the code, with a commented-out alternative that does not crash.

    On the plus side, the accelerator_view_removed exception is thrown correctly, and all my other code and tests work fine on the Beta. (As an unrelated aside, is queuing_mode_automatic the new name for deferred and/or the default for accelerators?)

    #include <amp.h>
    #include <algorithm>   // for_each
    #include <iostream>
    #include <iterator>    // back_inserter
    #include <vector>
    
    using namespace concurrency;
    using namespace std;
    
    #define SRC_SIZE 100
    #define RESULT_SIZE 100
    
    int main() {
        vector<float> src_init;
        for (int i = 0; i < SRC_SIZE; ++i) src_init.push_back(10.f * (i+1));
        array_view<float, 1> src(SRC_SIZE, src_init);
        //array<float, 1> src(SRC_SIZE, begin(src_init));
    
        array<float, 1> result(RESULT_SIZE);
        
        vector<float> random_init;
        for (int i = 0; i < RESULT_SIZE; ++i) random_init.push_back(0.0f);
        array_view<float, 2> random(RESULT_SIZE, 1, random_init);
        //array<float, 2> random(RESULT_SIZE, 1, begin(random_init));
    
        const int count = 1;
    
        parallel_for_each(result.extent, [count, src, &result, random](index<1> i) restrict(amp) {
            int winner = 0;
    
            for (int j = 0; j < count; ++j)    // crashes
            //int j = 0;                       // does not crash
            //for (int j = 0; j < 1; ++j)      // does not crash
            {
                winner = (int)(src.extent.size() * random(i[0], j));    // crashes
                //winner = (int)(src.extent.size() * random(i[0], 0));  // does not crash
            }
    
            result[i] = src[winner];
        });
    
        vector<float> result_data;
        try {
            copy(result, back_inserter(result_data));
            for_each(begin(result_data), end(result_data), [](float x) { cout << x << ", "; });
            cout << endl;
        } catch(accelerator_view_removed& ex) {
            cout << ex.what() << endl;
        }
        
        cout << "Press enter to quit . . .";
        cin.get();
    }
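
    For comparison, the kernel's logic can be replayed on the host to see what a non-crashing run should produce. This is only a rough sketch in plain C++ rather than C++ AMP; the function name `reference_result` and the flat, row-major layout of the random table are assumptions made for illustration and are not part of the original program.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side replay of the kernel body (not part of the repro).
// `random` is assumed to be a flat, row-major table with `count` columns
// per result element, holding values in [0, 1).
std::vector<float> reference_result(const std::vector<float>& src,
                                    const std::vector<float>& random,
                                    std::size_t result_size, int count) {
    const std::size_t cols = static_cast<std::size_t>(count);
    std::vector<float> result(result_size);
    for (std::size_t i = 0; i < result_size; ++i) {
        int winner = 0;
        // Same loop as the kernel: the last random pick wins.
        for (std::size_t j = 0; j < cols; ++j) {
            winner = static_cast<int>(src.size() * random[i * cols + j]);
        }
        result[i] = src[winner];
    }
    return result;
}
```

    With the all-zero random table used in the repro, `reference_result(src, random, 100, 1)` yields `src[0]` for every element, so a run that does not crash should print 10 for each of the 100 results.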


    • Edited by Zooba Friday, March 2, 2012 1:20 AM Fixed link to other post

All replies

  • Friday, March 2, 2012 1:23 AM

    Here are two snippets of my actual code. The first is the one that crashes (and is similar to the above example) while the second is how I fixed it.

            parallel_for_each(result.accelerator_view, result.grid, [&, _k, _greediness, _size](index<1> i) restrict(amp) {
                int winner = (int)(rand(i[0], 0) * _size);
    
                if (rand(i[0], _k) < _greediness) {
                    for (int j = 1; j < _k; ++j) {
                        int competitor = (int)(rand(i[0], j) * _size);
                        
                        if (src[competitor] > src[winner]) {
                            winner = competitor;
                        }
                    }
                }
    
                result[i] = src[winner];
            });
    

    The fixed version below is less efficient, since the contents of src are not integral values and I don't really want to be copying them this often.

            parallel_for_each(result.accelerator_view, result.extent, [&, _k, _greediness, _size](index<1> i) restrict(amp) {
                int winner = (int)(rand(i[0], 0) * _size);
                result[i] = src[winner];
    
                if (rand(i[0], _k) < _greediness) {
                    for (int j = 1; j < _k; ++j) {
                        int competitor = (int)(rand(i[0], j) * _size);
                        
                        if (src[competitor] > result[i]) {
                            result[i] = src[competitor];
                        }
                    }
                }
            });
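
    To check that both kernels above select the same elements, a host-side reference of the selection logic could be used as a baseline. The sketch below is plain C++, not C++ AMP. The name `tournament_select_cpu` is hypothetical, and it assumes that `rand` is a table with `_k + 1` columns per result element (stored here as a flat, row-major vector) holding values in [0, 1); the snippets index it that way, but the thread does not state the layout.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for the selection kernels above.
// Assumed layout: `rnd` is row-major with k + 1 columns per result element,
// and every value lies in [0, 1) so the computed indices stay in range.
std::vector<float> tournament_select_cpu(const std::vector<float>& src,
                                         const std::vector<float>& rnd,
                                         std::size_t result_size,
                                         int k, float greediness) {
    const std::size_t cols = static_cast<std::size_t>(k) + 1;
    std::vector<float> result(result_size);
    for (std::size_t i = 0; i < result_size; ++i) {
        const float* row = &rnd[i * cols];
        // Column 0 picks the initial winner.
        std::size_t winner = static_cast<std::size_t>(row[0] * src.size());
        // Column k decides whether a tournament is played at all.
        if (row[k] < greediness) {
            // Columns 1..k-1 pick the competitors.
            for (int j = 1; j < k; ++j) {
                std::size_t competitor =
                    static_cast<std::size_t>(row[j] * src.size());
                if (src[competitor] > src[winner]) winner = competitor;
            }
        }
        result[i] = src[winner];
    }
    return result;
}
```

    Like the first snippet, this tracks only the winning index and reads `src` once at the end, so its output should match either kernel when both are given the same `src` and random table.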
    

  • Friday, March 2, 2012 4:38 AM
    Owner

    Hi Zooba

    For your unrelated aside question, please read this:

    http://blogs.msdn.com/b/nativeconcurrency/archive/2011/11/23/understanding-accelerator-view-queuing-mode-in-c-amp.aspx

    For the crash, can you try using the direct3d_ref accelerator please? The easiest way to switch to REF is by setting it as default:

    http://blogs.msdn.com/b/nativeconcurrency/archive/2012/02/02/default-accelerator-in-c-amp.aspx

    If it doesn't crash with REF, then this is an NVIDIA driver bug. Also, what is your NVIDIA driver version? Do you have an AMD device to try this on? Is this on Windows 7 or Windows 8?

    Cheers

    Daniel


    http://www.danielmoth.com/Blog/

  • Friday, March 2, 2012 4:50 AM

    Works fine with REF. It also worked fine under Dev Preview - I only changed the restrict qualifier.

    Sorry I left the other information out. I'm on Windows 7 64-bit. The NVIDIA drivers are 295.73 for notebook and the event log clearly blames the drivers ("Display driver nvlddmkm stopped responding and has successfully recovered."). I don't have any other devices that support C++ AMP.

  • Friday, March 2, 2012 5:48 AM
    Owner
    Answered

    Hi Zooba

    Yes, that confirms that it is an NVIDIA driver bug. We also tried this on an ATI card here and it worked fine, and we confirmed that it works in REF. It also crashed on one of our NVIDIA cards.

    The HLSL we generate behind the scenes has changed in some scenarios in the Beta (usually for better performance), so I am not surprised that there is a difference between the Developer Preview (DP) and the Beta.

    If you have a way of reporting this to NVIDIA please do, we'll also do the same...

    Thank you for reporting this, please keep them coming.

    Cheers

    Daniel


    http://www.danielmoth.com/Blog/

    • Proposed as answer by Zhu, Weirong Friday, March 2, 2012 5:53 AM
    • Marked as answer by Zooba Friday, March 2, 2012 5:54 AM