Mysterious bug

  • Question

  • I am having a persistent memory-related problem when using C++ AMP. Below is the piece of code that causes my program to crash. The code is quite simple: I am adding numbers in a particular way. The crash seems to be caused by the statement "entr2d[ix*ny_loc+iy]=sum;". More precisely, I think the crash occurs because the variable "sum" is corrupted. When I set it to 1 (i.e., when I uncomment "// sum=1.0f;"), the code does not crash. The variable "sum" is computed by adding values read from the array "res". I have been trying to find the cause of the crash for the last two days, without success. Is it possible that the bug is somehow inside AMP?

    Any help would be greatly appreciated.

    Thanks,

    Alexander

    float subr(int nx, int ny, int nz, int n1, float dx, float dy, float dz, vector <float> res_vec, float la, float mu) { 
    	float entr=0.f, dist2, temp, dx2, dy2, dz2, mu_loc, la_loc;
    	int ixl, ixr, iyl, iyr, izl, izr, n1_loc, nx_loc, ny_loc, nz_loc;
    	dx2=dx*dx; dy2=dy*dy; dz2=dz*dz; 
    		printf ("nx=%i, ny=%i, nz=%i, n1=%i \n",nx, ny, nz, n1);
    	n1_loc=n1; 
    	nx_loc=nx; 
    	ny_loc=ny; 
    	nz_loc=nz;
    	mu_loc=mu;
    	la_loc=la;
    
    // Define the accelerator
    	accelerator_view myAv = accelerator().create_view(queuing_mode_immediate);
    // Copy data to GPU
    	array_view <const float, 3> res(nx, ny, nz, res_vec);
    
    	vector <float> entr_vec(nx*ny,0.f);
    	array_view <float, 1> entr2d(nx*ny, entr_vec);
    	
    	concurrency::extent <2> ext(nx, ny);
    	parallel_for_each (myAv, ext,  [=] (index<2> idx) restrict(amp) {
    		int ix=idx[0];
    		int iy=idx[1];
    	 
    		int ixl=max(ix-n1_loc,0);
    		int ixr=min(ix+n1_loc+1,nx_loc);
    		int iyl=max(iy-n1_loc,0);
    		int iyr=min(iy+n1_loc+1,ny_loc);
    
    		float sum=0.f;
    		for(int iz = 0; iz < nz_loc; ++iz) {
    			int izl=max(iz-n1_loc,0);
    			int izr=min(iz+n1_loc+1,nz_loc);
    
    			for(int iz1 = izl; iz1 < izr; ++iz1) {
    			for(int iy1 = iyl; iy1 < iyr; ++iy1) {
    			for(int ix1 = ixl; ix1 < ixr; ++ix1) {
    
    				float temp=res[ix][iy][iz];
    				temp=temp-res[ix1][iy1][iz1];
    				sum=sum+temp;	
    
    			}}}
    		}
    //		sum=1.0f;
    		entr2d[ix*ny_loc+iy]=sum;	
    	});
    
    	entr2d.synchronize();
    
    	for(int ix = 0; ix < nx; ++ix) {
    	for(int iy = 0; iy < ny; ++iy) {
    		entr=entr+entr2d[ix*ny_loc+iy];
    	}}
    
    	return entr/float(nx*ny*nz);
    }

    Thursday, October 4, 2012 2:16 PM

Answers

  • Hi Alexander,

    DBG means a debug build and RET means a release build. When I ran your code (Ctrl+F5) in debug mode, I got:

    --------------

    runtime_exception (887A0005): Failed to wait for D3D marker event.

    ID3D11Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DEVICE_HUNG: The Device took an unreasonable amount of time to execute its commands, or the hardware crashed/hung. As a result, the TDR (Timeout Detection and Recovery) mechanism has been triggered. The current Device Context was executing commands when the hang occurred. The application may want to respawn and fallback to less aggressive use of the display hardware).

    accelerator_view_removed (887A0006)

    --------------

    This is caused by your gpu code:

        for(int iz = 0; iz < nz; ++iz) {
            ...
            for(int iz1 = izl; iz1 < izr; ++iz1) {
            for(int iy1 = iyl; iy1 < iyr; ++iy1) {
            for(int ix1 = ixl; ix1 < ixr; ++ix1) {
                ...
            }}}
        }

    This kernel takes more than 2 seconds to finish, which exceeds the default Windows TDR timeout, so the GPU is reset and C++ AMP reports the exception above. For details about TDR and the available workarounds, please refer to this blog post: http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/06/handling-tdrs-in-c-amp.aspx.
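
    A rough way to see why the kernel runs so long is to count its innermost-loop iterations. The sketch below is plain C++ (no amp.h needed) and assumes the sizes from Alexander's repro (nx=512, ny=512, nz=106, n1=4). The helper names `window_total` and `total_inner_iterations` are made up for illustration and are not part of C++ AMP.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Sum over i in [0, n) of the clamped window length
// min(i + n1 + 1, n) - max(i - n1, 0), i.e. the same bounds the kernel uses.
// (Hypothetical helper, for illustration only.)
std::int64_t window_total(int n, int n1) {
    assert(n > 0 && n1 >= 0);
    std::int64_t total = 0;
    for (int i = 0; i < n; ++i) {
        const int lo = std::max(i - n1, 0);
        const int hi = std::min(i + n1 + 1, n);
        total += hi - lo;
    }
    return total;
}

// Total number of innermost-loop iterations summed over all (ix, iy) threads.
// The x, y and z windows are independent, so the count factorizes.
std::int64_t total_inner_iterations(int nx, int ny, int nz, int n1) {
    return window_total(nx, n1) * window_total(ny, n1) * window_total(nz, n1);
}
```

    For the repro sizes this gives 4588 * 4588 * 934 = 19,660,460,896 iterations, each reading two elements of "res", all inside a single kernel launch.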

    BTW, I noticed that the program still crashes even when I uncomment "// sum=1.0f;".

    Both REF and WARP are software-emulated graphics devices that run on the CPU. REF is a basic emulator that is available on both Win7 and Win8. WARP is a more advanced, faster emulator that uses SSE and AVX, but it is only available on Win8. Since your OS is Win7, WARP doesn't work on your system.

    Thanks,

    Kevin

    • Proposed as answer by KevinGao Thursday, October 4, 2012 11:29 PM
    • Marked as answer by Amit K Agarwal Thursday, January 3, 2013 12:41 AM
    Thursday, October 4, 2012 9:36 PM

All replies

  • Hi Alexander,

    It looks like a driver bug. Could you please try the following:
    1. Download the latest driver.
    2. Use the REF or WARP device to see whether the crash still reproduces. To use the REF device, use "accelerator_view myAv = accelerator(accelerator::direct3d_ref).create_view(queuing_mode_immediate);". To use the WARP device (which is only available on Win8), use "accelerator_view myAv = accelerator(accelerator::direct3d_warp).create_view(queuing_mode_immediate);".
    3. Add try/catch to see whether you can catch an exception and read its error message, like this:
    try {
        // code that may throw goes here
    } catch (std::exception &e)
    {
     std::string str = e.what(); // check str here.
    }
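
    For readers less familiar with try/catch, item 3 can be sketched in plain C++ as follows. `launch_kernel` is a hypothetical stand-in for the call to subr; here it simply throws so that the example runs without a GPU. In C++ AMP, concurrency::runtime_exception derives from std::exception, so a handler for std::exception should also see AMP errors.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the code that launches the kernel (e.g. the call
// to subr). It throws unconditionally so the sketch runs without a GPU.
void launch_kernel() {
    throw std::runtime_error("Failed to wait for D3D marker event.");
}

// Runs the launch and returns the exception message, or an empty string
// if nothing was thrown.
std::string run_and_capture_error() {
    try {
        launch_kernel();
    } catch (const std::exception& e) {
        return e.what();  // inspect or print this message
    }
    return std::string();
}
```

    Printing the returned string (or inspecting it in the debugger) is enough to tell a timeout-style failure from other errors.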



    • Edited by KevinGao Thursday, October 4, 2012 3:54 PM
    • Proposed as answer by KevinGao Thursday, October 4, 2012 11:30 PM
    • Unproposed as answer by KevinGao Thursday, October 4, 2012 11:30 PM
    Thursday, October 4, 2012 3:54 PM
  • Hi Kevin,

    Thank you for responding. I have downloaded the latest driver, but the problem persists. I also tried the REF and WARP devices. REF does not seem to crash, but it is terribly slow (I am still waiting for the program to finish). WARP does not work at all: the program aborts much earlier than before.

    Regarding item 3: I am not a C++ expert (I just code some math formulas), so this item is a bit above my level. However, from what I have seen, this does not seem to be a driver bug.

    Alexander

    Thursday, October 4, 2012 8:00 PM
  • Hi Alexander,

    REF is expected to run very slowly because it is a software emulator. Could you create a minimal repro of your issue (including main() and how subr is called)? Could you also tell me about your machine: Win7 or Win8? If Win8, is it Win8 RC or Win8 RTM? What are the graphics card model and the driver version? Is it a DBG or RET build?

    Thanks,

    Kevin

    Thursday, October 4, 2012 8:49 PM
  • Hi Kevin,

    Thank you. The simplified code (which still has the bug) is attached below. Here is the info about my system: Windows 7 Home Premium, two NVIDIA GeForce GTX 590 cards, driver version 9.18.13.623. I am not sure what DBG and RET mean.

    Any help would be greatly appreciated!

    Alexander

    #include <string.h>
    #include <ppl.h>
    #include <stdio.h>
    #include <math.h>
    #include <float.h>
    #include <iostream>
    #include <fstream>
    #include <stdlib.h>
    #include <time.h>
    #include <amp.h>
    #include <amp_math.h>

    using namespace std;
    using namespace concurrency;

    float my_subr(int nx, int ny, int nz, int n1, float dx, float dy, float dz, array_view <const float, 3> res, float la, float mu, accelerator_view myAv);
    float int_sq(int i) restrict(cpu, amp);

    void main() {
        printf ("START\n");
        int nx=512, ny=512, nz=106, n1=4;
        float la=1., mu=1., dx=1., dy=1., dz=1., entr;

        vector <float> res_vec(nx*ny*nz,0.f);
        for(int iz = 0; iz < nz; ++iz) {
        for(int iy = 0; iy < ny; ++iy) {
        for(int ix = 0; ix < nx; ++ix) {
            res_vec[ix*ny*nz+iy*nz+iz]=float(ix-2*iy+iz);
        }}}

    // Define the accelerator
        accelerator_view myAv = accelerator().create_view(queuing_mode_immediate);
    //  accelerator_view myAv = accelerator(accelerator::direct3d_ref).create_view(queuing_mode_immediate);
    //  accelerator_view myAv = accelerator(accelerator::direct3d_warp).create_view(queuing_mode_immediate);

    // Copy the velocity vector to the GPU
        array_view <const float, 3> res(nx, ny, nz, res_vec);

    // Computing the sum
        entr=my_subr(nx, ny, nz, n1, dx, dy, dz, res, la, mu, myAv);
        printf ("entr= %e \n\a\a", entr);
        system("pause>nul");
        return;
    }

    float my_subr(int nx, int ny, int nz, int n1, float dx, float dy, float dz, array_view <const float, 3> res, float la, float mu, accelerator_view myAv) {
        float entr=0.f, temp, dx2, dy2, dz2;
        int ixl, ixr, iyl, iyr, izl, izr;
        dx2=dx*dx; dy2=dy*dy; dz2=dz*dz;
        printf ("nx=%i, ny=%i, nz=%i, n1=%i \n",nx, ny, nz, n1);

        vector <float> entr_vec(nx*ny,0.f);
        array_view <float, 1> entr2d(nx*ny, entr_vec);
        entr2d.discard_data();
        printf ("dx2=%f, dy2=%f, dz2=%f, la_loc=%f \n", dx2, dy2, dz2, la);

        concurrency::extent <2> ext(nx, ny);
        parallel_for_each (myAv, ext, [=] (index<2> idx) restrict(amp) {
            int ix=idx[0];
            int iy=idx[1];

            int ixl=max(ix-n1,0);
            int ixr=min(ix+n1+1,nx);
            int iyl=max(iy-n1,0);
            int iyr=min(iy+n1+1,ny);

            float sum=0.f;
            for(int iz = 0; iz < nz; ++iz) {
                int izl=max(iz-n1,0);
                int izr=min(iz+n1+1,nz);

                for(int iz1 = izl; iz1 < izr; ++iz1) {
                for(int iy1 = iyl; iy1 < iyr; ++iy1) {
                for(int ix1 = ixl; ix1 < ixr; ++ix1) {
                    float temp=res[ix][iy][iz]-res[ix1][iy1][iz1];
                    sum=sum+temp;
                }}}
            }
    //      sum=1.0f;
            entr2d[ix*ny+iy]=sum;
        });

    //  }}
    //  entr2d.synchronize();

        for(int ix = 0; ix < nx; ++ix) {
        for(int iy = 0; iy < ny; ++iy) {
            entr=entr+entr2d[ix*ny+iy];
        }}

        return entr/float(nx*ny*nz);
    }

    float int_sq(int i) restrict(cpu, amp) {
        return float(i*i);
    }


    Thursday, October 4, 2012 9:04 PM
  • Hi Kevin,

    Thanks for your help. I reduced the amount of computation so that the kernel finishes in under 2 seconds, which eliminated the problem. I can't wait to get Win8, which would allow me to raise that limit.

    Alexander

    Thursday, October 4, 2012 10:50 PM
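
    One way to reduce the work done per kernel launch, sketched here as an assumption and not necessarily what Alexander did, is to split the outer iz range into slabs and accumulate the partial sums. The code below is plain C++ running on the CPU so that it works without amp.h; `partial_sum` and `chunked_sum` are hypothetical helpers. In an AMP version, each call to `partial_sum` would correspond to one shorter parallel_for_each over (ix, iy) restricted to the given iz range.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// CPU reference for one (ix, iy) column of the kernel, restricted to the
// slab iz_begin <= iz < iz_end. res is stored as res[(ix*ny + iy)*nz + iz],
// matching the layout used in the repro. (Hypothetical helper.)
double partial_sum(const std::vector<double>& res, int nx, int ny, int nz,
                   int n1, int ix, int iy, int iz_begin, int iz_end) {
    assert(0 <= iz_begin && iz_begin <= iz_end && iz_end <= nz);
    const int ixl = std::max(ix - n1, 0), ixr = std::min(ix + n1 + 1, nx);
    const int iyl = std::max(iy - n1, 0), iyr = std::min(iy + n1 + 1, ny);
    double sum = 0.0;
    for (int iz = iz_begin; iz < iz_end; ++iz) {
        // The neighbor window is still clamped to the full grid, not the slab.
        const int izl = std::max(iz - n1, 0), izr = std::min(iz + n1 + 1, nz);
        const double center = res[(ix * ny + iy) * nz + iz];
        for (int iz1 = izl; iz1 < izr; ++iz1)
            for (int iy1 = iyl; iy1 < iyr; ++iy1)
                for (int ix1 = ixl; ix1 < ixr; ++ix1)
                    sum += center - res[(ix1 * ny + iy1) * nz + iz1];
    }
    return sum;
}

// Same result as one pass over [0, nz), up to floating-point rounding,
// but computed slab by slab. Each loop iteration stands in for one
// shorter kernel launch.
double chunked_sum(const std::vector<double>& res, int nx, int ny, int nz,
                   int n1, int ix, int iy, int slab) {
    assert(slab > 0);
    double sum = 0.0;
    for (int iz0 = 0; iz0 < nz; iz0 += slab)
        sum += partial_sum(res, nx, ny, nz, n1, ix, iy, iz0,
                           std::min(iz0 + slab, nz));
    return sum;
}
```

    The slab size trades launch overhead against time per launch; a suitable value would have to be found by measurement on the target GPU.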