C++ AMP bug with for loop

    Question

  • This code looks fine, but when compiled and run in Release mode on an NVIDIA card it crashes with an unresponsive-driver exception.  The problem is the innocuous-looking for loop within the kernel ("for (int k = i + 1; k < N; ++k) a[j * N + k] = j;").  The value of "j * N + k" should always be in range.  If I unroll the loop manually, it works fine.  If I compile and run the program in Debug mode, it works fine.  If I add a guard before the "a[...] = j" assignment (i.e., "if (j * N + k < N*N) ..."), it works fine.  My guess is that the generated code is wrong.  (Run on Windows 7 with the Developer Preview of MSVC++ 11.)

    BTW, are there any compiler options to look at the generated HLSL?

     

     

    #include <stdio.h>
    #include <stddef.h>
    #include <malloc.h>
    #include <amp.h>
    #include <sys/timeb.h>
    #include <iostream>
    #include <stdlib.h>
    #include <string.h>
    #include <vector>
    
    using namespace concurrency;
    using namespace std;
    
    template <typename Kernel>
    void parallel_for_each(accelerator_view & acc, int ext_size, Kernel kernel)
    {
        auto krn = [=] (index<1> idx) restrict(direct3d)
        {
            kernel(idx[0]);
        };
        concurrency::parallel_for_each(acc, grid<1>(concurrency::extent<1>(ext_size)), krn);
    }
    
    void xxx(accelerator_view & acc)
    {
        int w[] = { -1, -1, -1, -1, -1, -1, -1, -1, -1 };
        int N = 3;
        int * z = (int*)malloc(N*N * sizeof(int));
        array<int, 1> a(N * N, acc);
        copy(w, a);
        for (int i = 0; i < N; i++)
        {
            parallel_for_each(acc, N, [=, &a](int j) restrict(direct3d)
            {
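    #ifndef INLINE
                // Loop form: triggers the unresponsive-driver crash on the NVIDIA card in Release mode.
                for (int k = i + 1; k < N; ++k)
                    a[j * N + k] = j;
    #else
                // Manually unrolled form (matches the loop for i == 0, N == 3): runs fine.
                a[j * N + 1] = j;
                a[j * N + 2] = j;
    #endif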
            });
            acc.flush();
            copy(a, z);
            for (int p = 0; p < N*N; ++p)
                std::cout << " " << z[p];
            std::cout << "\n";
            break;  // the first iteration (i == 0) is enough to reproduce the problem
        }
        free(z);
    }
    
    #define MYSIZE 2000
    char buffer[MYSIZE];
    CHAR* wtoc(const WCHAR* Source)
    {
        for (int j = 0; j < MYSIZE; ++j)
            buffer[j] = 0;
        int i = 0;
        while(Source[i] != '\0')
        {
            buffer[i] = (CHAR)Source[i];
            ++i;
            if (i >= MYSIZE - 1)  // stop early so the terminating 0 stays inside buffer
                break;
        }
        return buffer;
    }
    
    int main()
    {
        int n = 3;
    #define TYPE int
    
        std::vector<concurrency::accelerator> accelerators = concurrency::get_accelerators();
        std::vector<concurrency::accelerator>::iterator it;
        for (it = accelerators.begin(); it != accelerators.end(); ++it)
        {
            std::cout << "has disp " << (*it).get_has_display() << "\n";
            std::cout << "mem " << (*it).get_dedicated_memory() << "\n";
            std::cout << "dev " << wtoc((*it).get_device_path().c_str()) << "\n";
            std::cout << "dev " << wtoc((*it).get_description().c_str()) << "\n";
    
            if (strcmp(wtoc((*it).get_description().c_str()), "Software Adapter") == 0)
                continue;
            if (strcmp(wtoc((*it).get_description().c_str()), "CPU accelerator") == 0)
                continue;
            
            accelerator_view acc = it->create_view();
    
            xxx(acc);
        }
        return 0;
    }
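
As a sanity check on the claim that "j * N + k" stays in range, here is a minimal CPU-only sketch of the same index arithmetic. It is plain C++, not C++ AMP, and `simulate_kernel` is a hypothetical helper name that does not appear in the original post. It runs the kernel body once per value of j, with the guard written as an assertion.

```cpp
#include <cassert>
#include <vector>

// CPU-only stand-in for the kernel in xxx(): same index math, no GPU involved.
// simulate_kernel is a hypothetical name used only for this illustration.
std::vector<int> simulate_kernel(int N, int i)
{
    std::vector<int> a(N * N, -1);            // same initial contents as w[]
    for (int j = 0; j < N; ++j)               // one pass per GPU thread index j
    {
        for (int k = i + 1; k < N; ++k)
        {
            int idx = j * N + k;
            assert(idx >= 0 && idx < N * N);  // the guard from the post, as a check
            a[idx] = j;
        }
    }
    return a;
}
```

Under these assumptions, `simulate_kernel(3, 0)` returns -1 0 0 -1 1 1 -1 2 2 and the assertion never fires, which is consistent with the guard being logically redundant for N = 3.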
    


    UPDATE: The code runs fine on another machine (Windows 8, AMD Llano). Might be a driver problem?

     

     


    • Edited by Ken Domino, February 1, 2012 22:54: See update.
    February 1, 2012 21:57

Answers

  • Thanks Ken

    One of my colleagues also reproduced this on a GTX 480, so the behavior is consistent. We'll add it to our list of bug reports for the IHV (independent hardware vendor), and if you have your own mechanism for reporting these bugs to NVIDIA, please go ahead and do that too.

    Thanks for reporting it to us.

    Cheers

    Daniel


    http://www.danielmoth.com/Blog/
    • Marked as answer by Ken Domino, February 2, 2012 11:48
    February 2, 2012 04:31

All replies

  • Hi Ken

    Can you also try on the REF accelerator (Software Adapter)? That is the correctness accelerator that everything else is measured against.

    If the code doesn't fail on REF (and you already said it doesn't fail on hardware from another vendor), then you are right: that points to a driver bug for the hardware it fails on (and you should make sure you have their latest public driver).

    I tried your code on our latest bits, and it still causes a TDR (timeout detection and recovery) on NVIDIA Quadro 2000M hardware, so I am leaning towards your conclusion.

    Cheers

    Daniel


    http://www.danielmoth.com/Blog/
    February 2, 2012 01:47
  • The code runs fine on the "Software Adapter" accelerator.  I updated the driver (now 286.19 for 64-bit Windows 7), but the problem is still there on the NVIDIA GTX 470. --Ken

    February 2, 2012 03:03