Gaussian sample computation silently fails at certain domain size on nVidia GTX 580 w/ 3Gb VRAM on Win7x64
-
April 20, 2012, 9:42
Hi, I can reproduce a curious behavior with the Gaussian Blur sample app from
http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/14/gaussian-blur-using-c-amp.aspx
on my 3 GB NVIDIA GTX 580 with the latest drivers (296.10) on Win7 x64.
I was varying the matrix size in the sample (x64 build) and noticed that once the size exceeds a certain value the computation fails: the output data contains zeros except for the first few values.
- amp_result {size = 144000000} std::vector<float,std::allocator<float> >
[size] 144000000 __int64
[capacity] 204324850 __int64
[0] 0.044485215 float
[1] 0.026021460 float
[2] 0.0096954955 float
[3] 0.00000000 float
[4] 0.00000000 float
[5] 0.00000000 float
[6] 0.00000000 float
... the rest are zeroes
It is reproducible pretty consistently, and no error messages or logs are generated. Any idea what might be happening?
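For anyone trying to reproduce this, the pattern above (a few non-zero values followed by zeros) can be detected programmatically. The following is only a minimal sketch in plain C++; `ZeroTailReport` and `inspect_zero_tail` are hypothetical names and are not part of the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the Gaussian Blur sample): summarizes the
// "short non-zero prefix, then zeros" pattern seen in amp_result.
struct ZeroTailReport {
    std::size_t nonzero_prefix;  // number of leading non-zero elements
    bool tail_all_zero;          // true if every remaining element is exactly 0.0f
};

ZeroTailReport inspect_zero_tail(const std::vector<float>& data) {
    std::size_t i = 0;
    // Walk past the leading non-zero values.
    while (i < data.size() && data[i] != 0.0f) ++i;
    const std::size_t prefix = i;
    assert(prefix <= data.size());

    // Check whether everything after the prefix is exactly zero.
    bool all_zero = true;
    for (; i < data.size(); ++i) {
        if (data[i] != 0.0f) { all_zero = false; break; }
    }
    return {prefix, all_zero};
}
```

For the values listed above, this would report a non-zero prefix of 3 with an all-zero tail. A correct blur of non-zero input would not be expected to end in a long run of exact zeros, so that combination is a reasonable red flag.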
Thanks,
Alex.
All replies
-
April 20, 2012, 13:55 Owner
Hi Alex
Thank you for identifying the issue.
I haven’t looked into it at all yet, but can you please confirm that you see the same (apparently incorrect) results when building in DEBUG and in RELEASE configurations?
Also can you try, in addition to your hardware, executing the code on REF:
http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/11/direct3d-ref-accelerator-in-c-amp.aspx
Cheers
Daniel
http://www.danielmoth.com/Blog/
-
April 20, 2012, 16:20 Owner
Alex,
Please also try the following (admittedly draconian) experiment: shut down your computer, pull the power cord for 1 minute, and then plug everything back in and restart. Then try re-running the code to see if you get correct results.
This has worked for me in the past for one of my GTX 580s.
Please let me know the results. Thanks!
++don;
-
April 20, 2012, 19:45
Hi Daniel,
Both Release and Debug configurations fail. I also tried immediate queuing mode just in case - fails the same way.
Computation using the reference driver succeeds (although it took about 3 hours to complete, compared to less than 3 sec on the CPU; I did not expect it to be _that_ slow :) )
I'll try Don's suggestion later today and will update this thread.
Thanks,
Alex.
-
April 21, 2012, 2:48
Hi Don,
Unfortunately, the draconian experiment failed as well.
I'll try downgrading the NVIDIA driver to see if that makes any difference.
Is there any way to enable more verbose debug output from AMP (event log, etc.) to shed some light on what could be going wrong? Maybe a debug version of the AMP driver?
At this point it is still unclear whether this is an NVIDIA problem or an MSFT one.
Thanks
Alex.
-
April 21, 2012, 18:10 Owner
Hi Alex
The original sample reports "Verification Pass" and nothing more. Can you share the exact modifications you made to the sample, including how you are checking the results, so we can make sure we are running the same test?
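To illustrate the kind of check meant here: one option is to compare the accelerator output element by element against a CPU-computed reference and report the first index that differs. This is a minimal sketch under that assumption, not the sample's actual verification code; `first_mismatch` and `kNoMismatch` are hypothetical names, and the tolerance is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sentinel returned when the two vectors agree everywhere (hypothetical name).
const std::size_t kNoMismatch = static_cast<std::size_t>(-1);

// Hypothetical helper: returns the index of the first element where `actual`
// differs from `expected` by more than `tolerance`, or kNoMismatch if none.
std::size_t first_mismatch(const std::vector<float>& expected,
                           const std::vector<float>& actual,
                           float tolerance) {
    assert(tolerance >= 0.0f);
    const std::size_t n = std::min(expected.size(), actual.size());
    for (std::size_t i = 0; i < n; ++i) {
        // Written as !(diff <= tol) so that a NaN in either input is also flagged.
        if (!(std::fabs(expected[i] - actual[i]) <= tolerance)) return i;
    }
    // A length difference counts as a mismatch at the end of the shorter vector.
    return expected.size() == actual.size() ? kNoMismatch : n;
}
```

Reporting the first failing index, rather than only pass/fail, makes it easier to tell whether two machines are failing in the same place.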
Cheers
Daniel
http://www.danielmoth.com/Blog/
-
April 21, 2012, 19:46
Hi Daniel,
The only modification necessary to reproduce the issue is to pass 12000 to the gaussian_blur constructor on line 123 (and to remove the assert from the constructor, since the matrix size does not have to be a power of two).
I also made a change to pass an accelerator_view into parallel_for_each to test immediate queuing mode. I've put the modified code here: http://dl.dropbox.com/u/1496653/AMP/gaussian_blur_alex.zip However, changing the matrix size alone is enough to reproduce the issue.
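For scale, here is the arithmetic behind the 12000 case, as a small sketch. It assumes a square matrix of 4-byte floats, which matches the 144000000 elements shown in the debugger output in my first post; the helper names are hypothetical and not taken from the sample.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(float) == 4, "this sketch assumes 4-byte floats");

// Hypothetical helper: bytes needed for one rows x cols matrix of floats.
std::uint64_t matrix_bytes(std::uint64_t rows, std::uint64_t cols) {
    // Guard against 64-bit overflow in rows * cols * sizeof(float).
    assert(rows == 0 || cols <= UINT64_MAX / sizeof(float) / rows);
    return rows * cols * sizeof(float);
}

// Hypothetical helper: true if n is a power of two, the kind of condition
// the removed assert was restricting the size to.
bool is_power_of_two(std::uint64_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}
```

With these, `matrix_bytes(12000, 12000)` gives 576000000 bytes, roughly 549 MiB per buffer, and `is_power_of_two(12000)` is false. This only quantifies the test case; it says nothing about where the failure comes from.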
Thanks
Alex.
-
April 22, 2012, 5:52 Owner
Hi Alex
That works fine on my ATI card. More importantly, you confirmed it works fine on REF, which is the de facto correctness target. So it is an nvidia bug.
We'll report it, thanks for bringing it to our attention.
Cheers
Daniel
http://www.danielmoth.com/Blog/
- Proposed as answer by Zhu, Weirong, April 23, 2012, 20:32
- Marked as answer by Saspus01, April 23, 2012, 21:29
-
April 23, 2012, 19:42
Hi Daniel,
Thanks, it does indeed look NVIDIA-related. I've also tried the latest beta drivers (301.24), and the issue is still reproducible.
It also fails on an NVIDIA NVS 4200M with 1 GB RAM in my notebook (Lenovo T420, driver 296.35).
On the other hand, it passes on two machines with AMD graphics:
AMD FirePro V3800 w/ 512 MB, Win7 x64 - PASS
AMD Radeon HD 6470M w/ 1 GB, Win7 x64 - PASS
Thanks
Alex

