Gaussian sample computation silently fails at a certain domain size on NVIDIA GTX 580 w/ 3 GB VRAM on Win7x64

    Question

  • Hi, I can reproduce a curious behavior with the Gaussian Blur sample app from here:

    http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/14/gaussian-blur-using-c-amp.aspx

    on my 3 GB NVIDIA GTX 580, latest drivers (296.10), on Win7x64.

    I was varying the matrix size in the sample (built as x64) and noticed that after exceeding a certain size the computation fails: the output data contains zeros except for the first few values:

    - amp_result {size = 144000000} std::vector<float,std::allocator<float> >
    [size] 144000000 __int64
    [capacity] 204324850 __int64
    [0] 0.044485215 float
    [1] 0.026021460 float
    [2] 0.0096954955 float
    [3] 0.00000000 float
    [4] 0.00000000 float
    [5] 0.00000000 float
    [6] 0.00000000 float
    ... the rest are zeroes

    It reproduces pretty consistently, and no error messages or logs are generated.
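To put a number on "the rest are zeroes", the result vector can be scanned from the back for the last non-zero value. This is only a minimal sketch in plain C++; the helper name `nonzero_extent` is hypothetical and not part of the sample.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: returns the index one past the last non-zero element,
// i.e. the length of the prefix that still holds computed data.
// Returns 0 if the vector is empty or contains only zeros.
std::size_t nonzero_extent(const std::vector<float>& values) {
    // Walk backwards until an element that is not exactly zero is found.
    auto last_nonzero = std::find_if(values.rbegin(), values.rend(),
                                     [](float v) { return v != 0.0f; });
    // The distance from that position to rend() is the prefix length.
    return static_cast<std::size_t>(std::distance(last_nonzero, values.rend()));
}
```

Calling `nonzero_extent(amp_result)` on the output shown above would report how many leading elements were actually written, which makes it easier to compare runs at different matrix sizes.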

    Any idea what might be happening?

    Thanks,

    Alex.

    Friday, April 20, 2012 9:42 AM


All replies

  • Hi Alex

    Thank you for identifying the issue.

    I haven’t looked into it at all yet, but can you please confirm that you see the same (apparently incorrect) results when building in DEBUG and in RELEASE configurations?

    Also can you try, in addition to your hardware, executing the code on REF:
    http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/11/direct3d-ref-accelerator-in-c-amp.aspx

    Cheers
    Daniel


    http://www.danielmoth.com/Blog/

    Friday, April 20, 2012 1:55 PM
    Owner
  • Alex,

    Please also try the following (admittedly draconian) experiment: shut down your computer, pull the power cord for 1 minute, and then plug everything back in and restart. Then try re-running the code to see if you get correct results.

    This has worked for me in the past for one of my GTX 580s.

    Please let me know the results.  Thanks!


    ++don;

    Friday, April 20, 2012 4:20 PM
    Owner
  • Hi Daniel, 

    Both Release and Debug configurations fail. I also tried immediate queuing mode just in case - fails the same way.

    The computation using the reference driver succeeds (though it took about 3 hours to complete, compared to less than 3 sec on the CPU; I did not expect it to be _that_ slow :) )

    I'll try Don's suggestion later today and will update this thread.

    Thanks,

    Alex.

    Friday, April 20, 2012 7:45 PM
  • Hi Don, 

    Unfortunately, the draconian experiment failed as well.

    I'll try to downgrade the NVIDIA driver and see if that makes any difference.

    Is there any way to enable more verbose debug output from AMP (event log, etc.) to shed some light on what could be going wrong? Maybe a debug version of the AMP driver?

    At this point it is still unclear whether it is an NVIDIA problem or an MSFT one.

    Thanks

    Alex.

    Saturday, April 21, 2012 2:48 AM
  • Hi Alex

    The original sample reports "Verification Pass" and nothing more. Can you share the exact modifications you made to the sample, including how you are checking the results, so we can make sure we are running the same test?
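For comparing notes on how results are checked, one possible check is an element-wise comparison of the accelerator output against a CPU reference with a tolerance. The sketch below is an assumption for illustration, not the sample's actual verification routine; `first_mismatch` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical check: returns the index of the first element where the two
// results differ by more than `tolerance`, or expected.size() if they agree.
std::size_t first_mismatch(const std::vector<float>& expected,
                           const std::vector<float>& actual,
                           float tolerance) {
    // Both vectors must describe the same domain.
    assert(expected.size() == actual.size());
    for (std::size_t i = 0; i < expected.size(); ++i) {
        // The negated comparison also flags NaN values as mismatches.
        if (!(std::fabs(expected[i] - actual[i]) <= tolerance)) {
            return i;
        }
    }
    return expected.size();
}
```

Reporting the first mismatching index, rather than only pass or fail, shows where in the domain the output starts to diverge.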

    Cheers
    Daniel


    http://www.danielmoth.com/Blog/

    Saturday, April 21, 2012 6:10 PM
    Owner
  • Hi Daniel, 

    The only modification necessary to reproduce the issue is to pass 12000 to the gaussian_blur constructor on line 123 (and remove the assert from the constructor, since the matrix size does not have to be limited to a power of two).

    I also made a change to pass an accelerator_view into parallel_for_each to test immediate queuing mode. I've put the modified code here: http://dl.dropbox.com/u/1496653/AMP/gaussian_blur_alex.zip. However, to reproduce the issue, just the change of matrix size is enough.
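For scale, here is a rough calculation of what a size of 12000 means per buffer. It assumes a square matrix of 4-byte floats, which matches the 144000000-element amp_result shown earlier; the helper name `matrix_bytes` is hypothetical, and how many such buffers the sample allocates is not stated here.

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed for one dense n x n matrix of float.
constexpr std::uint64_t matrix_bytes(std::uint64_t n) {
    return n * n * sizeof(float);
}

// 12000 x 12000 elements = 144,000,000 floats.
// With 4-byte floats that is 576,000,000 bytes, roughly 549 MiB per buffer.
static_assert(sizeof(float) == 4, "this estimate assumes 4-byte floats");
static_assert(matrix_bytes(12000) == 576000000ULL,
              "12000 x 12000 floats occupy 576,000,000 bytes");
```

This only quantifies the allocation size; it does not by itself explain the failure.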

    Thanks

    Alex.

    Saturday, April 21, 2012 7:46 PM
  • Hi Alex

    That works fine on my ATI card. More importantly, you confirmed it works fine on REF, which is the de facto correctness target. So it is an NVIDIA bug.

    We'll report it, thanks for bringing it to our attention.

    Cheers
    Daniel


    http://www.danielmoth.com/Blog/

    • Proposed as answer by Zhu, Weirong Monday, April 23, 2012 8:32 PM
    • Marked as answer by Saspus01 Monday, April 23, 2012 9:29 PM
    Sunday, April 22, 2012 5:52 AM
    Owner
  • Hi Daniel, 

    Thanks, it does indeed look NVIDIA-related. I've also tried the latest beta drivers (301.24), and the issue is still reproducible.

    It also fails on an NVIDIA NVS 4200M with 1 GB of RAM in my notebook (Lenovo T420, driver 296.35).

    On the other hand, it passes on two machines with AMD graphics:

    AMD FirePro V3800 w/ 512 MB, Win7x64 - PASS

    AMD Radeon HD 6470M w/ 1 GB, Win7x64 - PASS

    Thanks

    Alex

    Monday, April 23, 2012 7:42 PM