Gaussian blur sample computation silently fails at certain domain sizes on NVIDIA GTX 580 with 3 GB VRAM on Win7 x64

    Question

  • Hi, I can reproduce a curious behavior with the Gaussian Blur sample app from here:

    http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/14/gaussian-blur-using-c-amp.aspx

    on my 3 GB NVIDIA GTX 580 with the latest drivers (296.10) on Win7 x64.

    I was varying the matrix size in the sample (x64 build) and noticed that once the size exceeds a certain value, the computation fails: the output data contains only zeros except for the first few values:

    - amp_result {size = 144000000} std::vector<float,std::allocator<float> >
    [size] 144000000 __int64
    [capacity] 204324850 __int64
    [0] 0.044485215 float
    [1] 0.026021460 float
    [2] 0.0096954955 float
    [3] 0.00000000 float
    [4] 0.00000000 float
    [5] 0.00000000 float
    [6] 0.00000000 float
    ... the rest are zeroes

    It reproduces pretty consistently, and no error messages or logs are generated.

    Any idea what might be happening?

    Thanks,

    Alex.

    Friday, April 20, 2012 09:42
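
To describe a failure like the one in the dump above more precisely than "the rest are zeroes", it can help to measure how long the nonzero prefix of the output actually is. The following is a minimal sketch in standard C++; `nonzero_prefix_length` is a hypothetical helper, not part of the sample, and it assumes the failure shows up as exact `0.0f` values.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the Gaussian Blur sample).
// Returns the index one past the last nonzero element, i.e. the length of
// the prefix after which every remaining value is exactly 0.0f.
std::size_t nonzero_prefix_length(const std::vector<float>& data)
{
    std::size_t n = data.size();
    while (n > 0 && data[n - 1] == 0.0f)
        --n;
    return n;
}
```

Calling `nonzero_prefix_length(amp_result)` after the computation and printing the value next to `amp_result.size()` shows how much of the output was written before the zeros begin.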

All Replies

  • Hi Alex

    Thank you for identifying the issue.

    I haven’t looked into it at all yet, but can you please confirm that you see the same (apparently incorrect) results when building in DEBUG and in RELEASE configurations?

    Also can you try, in addition to your hardware, executing the code on REF:
    http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/11/direct3d-ref-accelerator-in-c-amp.aspx

    Cheers
    Daniel


    http://www.danielmoth.com/Blog/

    Friday, April 20, 2012 13:55
  • Alex,

    Please also try the following (admittedly draconian) experiment: shut down your computer, pull the power cord for 1 minute, and then plug everything back in and restart. Then try re-running the code to see if you get correct results.

    This has worked for me in the past for one of my GTX 580s.

    Please let me know the results.  Thanks!


    ++don;

    Friday, April 20, 2012 16:20
  • Hi Daniel, 

    Both Release and Debug configurations fail. I also tried immediate queuing mode just in case, and it fails the same way.

    The computation using the reference driver succeeds (it took about 3 hours to complete, though, compared to less than 3 seconds on the CPU; I did not expect it to be that slow :) )

    I'll try Don's suggestion later today and will update this thread.

    Thanks,

    Alex.

    Friday, April 20, 2012 19:45
  • Hi Don, 

    Unfortunately, the draconian experiment failed as well.

    I'll try to downgrade the NVIDIA driver and see if that makes any difference.

    Is there any way to enable more verbose debug output from AMP (event log, etc.) to shed some light on what could be going wrong? Maybe a debug version of the AMP driver?

    At this point it is still unclear whether this is an NVIDIA problem or a Microsoft one.

    Thanks

    Alex.

    Saturday, April 21, 2012 02:48
  • Hi Alex

    The original sample reports "Verification Pass" and nothing more. Can you share the exact modifications you made to the sample, including how you are checking the results, so we can make sure we are running the same test?

    Cheers
    Daniel


    http://www.danielmoth.com/Blog/

    Saturday, April 21, 2012 18:10
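
Since the question above is how the results are being checked, here is one way such a check could look. This is only a sketch under stated assumptions: `compare_results` and `mismatch_report` are hypothetical names, the absolute tolerance of `1e-4f` is an arbitrary choice, and none of this is taken from the sample's own verification code. It compares a device result against a CPU reference and records where the two first diverge, which is more informative than a single pass/fail message.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical report type: summary of a device-vs-CPU comparison.
struct mismatch_report
{
    bool        passed;       // true when every element is within tolerance
    std::size_t first_index;  // index of the first mismatch (meaningful only if !passed)
    std::size_t count;        // total number of mismatching elements
};

// Hypothetical helper, not part of the sample: element-by-element comparison
// with an absolute tolerance (the default value is an assumption).
mismatch_report compare_results(const std::vector<float>& expected,
                                const std::vector<float>& actual,
                                float tolerance = 1e-4f)
{
    mismatch_report report{true, 0, 0};
    const std::size_t common = std::min(expected.size(), actual.size());

    for (std::size_t i = 0; i < common; ++i)
    {
        // Written as a negated <= so that NaN is also flagged as a mismatch.
        if (!(std::fabs(expected[i] - actual[i]) <= tolerance))
        {
            if (report.count == 0)
                report.first_index = i;
            ++report.count;
        }
    }

    // Elements present in only one of the two vectors count as mismatches.
    const std::size_t extra = std::max(expected.size(), actual.size()) - common;
    if (extra > 0 && report.count == 0)
        report.first_index = common;
    report.count += extra;

    report.passed = (report.count == 0);
    return report;
}
```

With a CPU-computed reference vector and `amp_result` as inputs, `first_index` and `count` together show whether the device output is wrong everywhere or only past a certain offset.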
  • Hi Daniel, 

    The only modification necessary to reproduce the issue is to pass 12000 to the gaussian_blur constructor on line 123 (and remove the assert from the constructor, since the matrix size does not have to be limited to a power of two).

    I also made a change to pass an accelerator_view into parallel_for_each to test immediate queuing mode. I've put the modified code here: http://dl.dropbox.com/u/1496653/AMP/gaussian_blur_alex.zip However, changing the matrix size alone is enough to reproduce the issue.

    Thanks

    Alex.

    Saturday, April 21, 2012 19:46
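
The numbers in the post above can be checked with a little arithmetic. The sketch below assumes the constructor argument is the side length of a square matrix, which is consistent with the 144000000 elements shown in the debugger dump, and that `float` is 4 bytes. The function names are hypothetical and do not come from the sample.

```cpp
#include <cstdint>

// True when n is a power of two (1, 2, 4, ...); this is the kind of
// condition the removed assert is described as enforcing.
constexpr bool is_power_of_two(std::uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// Number of elements in a square size x size matrix (assumed layout).
constexpr std::uint64_t element_count(std::uint64_t size)
{
    return size * size;
}

// Bytes needed for one full-size float buffer of that matrix.
constexpr std::uint64_t buffer_bytes(std::uint64_t size)
{
    return element_count(size) * sizeof(float);
}

// 12000 is not a power of two, so the original assert has to go.
static_assert(!is_power_of_two(12000), "12000 is not a power of two");
// 12000 * 12000 matches the vector size in the debugger dump.
static_assert(element_count(12000) == 144000000ULL, "12000^2 elements");
```

Under those assumptions, each full-size float buffer for a 12000 x 12000 matrix takes 576,000,000 bytes, roughly 549 MiB.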
  • Hi Alex

    That works fine on my ATI card. More importantly, you confirmed it works fine on REF, which is the de facto correctness target. So it is an NVIDIA bug.

    We'll report it, thanks for bringing it to our attention.

    Cheers
    Daniel


    http://www.danielmoth.com/Blog/

    • Proposed as answer by Zhu, Weirong, Monday, April 23, 2012 20:32
    • Marked as answer by Saspus01, Monday, April 23, 2012 21:29
    Sunday, April 22, 2012 05:52
  • Hi Daniel, 

    Thanks, it does indeed look NVIDIA-related. I've also tried the latest beta drivers (301.24), and the issue is still reproducible.

    It also fails on an NVIDIA NVS 4200M with 1 GB of RAM in my notebook (Lenovo T420, driver 296.35).

    On the other hand, it passes on two machines with AMD graphics:

    AMD FirePro V3800 with 512 MB, Win7 x64 - PASS

    AMD Radeon HD 6470M with 1 GB, Win7 x64 - PASS

    Thanks

    Alex

    Monday, April 23, 2012 19:42