Why is this C++ AMP matrix multiply crashing on my GTX 560 Ti, but not crashing in "emulated" mode?

    Question

  • Hi,

    Sorry if this has been answered already...

    I'm running the VS 11 Ultimate Developer Preview on Win7 x86 with an MSI GTX 560 Ti and the latest NVIDIA drivers (280.26 WHQL and 285.27 beta), and I've applied "the actual fix" from the linked post with the DirectX June 2010 runtime. I'm building in Release mode, and the demo matrix multiply code crashes when I call it via P/Invoke. The same code works fine with the emulated drivers, and the simpler "square_array" example also works fine.

    The code line in question is at the end of my function here:

     

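    // VS 11 Developer Preview syntax (later builds use restrict(amp) and .extent instead of .grid)
    __declspec ( dllexport ) void _stdcall matrix_multiply_simple(float* c, float* a, float* b, int m, int n, int w)
    {
        array_view<const float,2> aView(m, w, &a[0]);
        array_view<const float,2> bView(w, n, &b[0]);
        array_view<writeonly<float>,2> cView(m, n, &c[0]);

        parallel_for_each( cView.grid,  [=](index<2> idx) restrict(direct3d)
        {
            int row = idx[0]; int col = idx[1];
            float sum = 0.0f;
            for(int i = 0; i < w; i++)
                sum += aView(row, i) * bView(i, col);
            cView[idx] = sum;
        });
        // The array_views are destroyed at this closing brace; per the stack trace below,
        // the copy back to c made during destruction is where the exception is thrown.
    }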

     

    A call to cView.synchronize() does not help. The deepest I can debug to is here (source line 1486 in amprt.h):

     

     _Event ev1 = _Src->_Copy_to_async(pTempStagingBuf, _Src_shape, pTempStagingShape);
    
    The result is that I get a "Display driver stopped working" message from Windows.

    Update 1: looks like the same code runs successfully when compiled in Debug mode.

    Thanks in advance to anyone that can help me.

    David

    Here's the full stack trace:

     

      ntdll.dll!77b16344()
      [Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]
      ntdll.dll!77b15ccc()
      KernelBase.dll!75e1179c()
      kernel32.dll!761defa3()
      KernelBase.dll!75e19673()
      KernelBase.dll!75e1179c()
      ntdll.dll!77b16449()
      ntdll.dll!77b1641b()
      ntdll.dll!77af8b7d()
      ntdll.dll!77b162a7()
      KernelBase.dll!75e19673()
      KernelBase.dll!75e19673()
      KernelBase.dll!75e19673()
      vcamp110.dll!_CxxThrowException(void * pExceptionObject, const _s__ThrowInfo * pThrowInfo)  Line 162 C++
      vcamp110.dll!Concurrency::details::_D3D_throw_runtime_exception(const char * _Main_error_msg, Concurrency::details::_D3D_status _Status)  Line 53 C++
      vcamp110.dll!Concurrency::details::_D3D_accelerator_view_impl::_Map_stage_buffer(Concurrency::details::_Buffer * _Stage_buffer, _Access_mode _Map_type, bool _Wait)  Line 292 C++
      vcamp110.dll!Concurrency::details::_Buffer::_Map_stage_buffer(_Access_mode _Map_type, bool _Wait)  Line 484 C++
      vcamp110.dll!Concurrency::details::_D3D_copy_event_impl::_Wait()  Line 102 C++
      vcamp110.dll!Concurrency::details::_D3D_accelerator_view_impl::_Copy_async(Concurrency::details::_Buffer * _Src, Concurrency::details::_Reference_counted_obj_ptr<Concurrency::details::_View_shape> _Src_shape, Concurrency::details::_Buffer * _Dst, Concurrency::details::_Reference_counted_obj_ptr<Concurrency::details::_View_shape> _Dst_shape)  Line 510 C++
      vcamp110.dll!Concurrency::details::_Buffer::_Copy_to_async(Concurrency::details::_Buffer * _Dest, Concurrency::details::_Reference_counted_obj_ptr<Concurrency::details::_View_shape> _Src_shape, Concurrency::details::_Reference_counted_obj_ptr<Concurrency::details::_View_shape> _Dest_shape)  Line 519 + 0x57 bytes C++
    > vcamp110.dll!Concurrency::details::_Copy_impl(Concurrency::details::_Buffer * _Src, Concurrency::details::_Reference_counted_obj_ptr<Concurrency::details::_View_shape> _Src_shape, Concurrency::details::_Buffer * _Dst, Concurrency::details::_Reference_counted_obj_ptr<Concurrency::details::_View_shape> _Dst_shape)  Line 1486 + 0x57 bytes C++
      vcamp110.dll!Concurrency::details::_Ubiquitous_buffer::_Commit_view(_Access_mode * _Key)  Line 858 + 0x42 bytes C++
      vcamp110.dll!Concurrency::details::_Ubiquitous_buffer::_Unregister_view(_Access_mode * _Key)  Line 669 C++
      CPP-AMP.Unmanaged.dll!Concurrency::details::_Array_view_base<2,1>::~_Array_view_base<2,1>()  Line 1907 + 0x15 bytes C++
      CPP-AMP.Unmanaged.dll!matrix_multiply_simple(float * c, float * a, float * b, int m, int n, int w)  Line 68 + 0x9 bytes C++
      [Managed to Native Transition]
      CPP-AMP.Test.exe!CPPAMP.CppAmpLibTest.Main() Line 90 + 0x48 bytes C#
      ntdll.dll!77b15ccc()
      KernelBase.dll!75e1179c()
      ntdll.dll!77b15d5c()
      KernelBase.dll!75e1ca8b()
      mscoreei.dll!6f0dd7e2()
      KernelBase.dll!75e16cf2()
      KernelBase.dll!75e16d04()
      mscoree.dll!6f157efd()
      mscoree.dll!6f157f16()
      mscoree.dll!6f154de3()
      kernel32.dll!761e1114()
      ntdll.dll!77b2b429()
      ntdll.dll!77b2b3fc()

     


    • Edited by David Cuccia, Monday, September 26, 2011 20:49 (update to debugging results)
    Monday, September 26, 2011 20:36

All Replies

  • Hi David,

    We're not sure what's going on here, but perhaps you can help us figure it out by catching the exception that's being thrown and telling us what it contains. Just put a try/catch around the body of your function, as shown below.

    Thanks.

     

    #include <exception>  // std::exception
    #include <iostream>   // std::cout, std::endl

    __declspec ( dllexport ) void _stdcall matrix_multiply_simple(float* c, float* a, float* b, int m, int n, int w)
    {
        try
        {
            array_view<const float,2> aView(m, w, &a[0]);
            array_view<const float,2> bView(w, n, &b[0]);
            array_view<writeonly<float>,2> cView(m, n, &c[0]);

            parallel_for_each( cView.grid,  [=](index<2> idx) restrict(direct3d)
            {
                int row = idx[0]; int col = idx[1];
                float sum = 0.0f;
                for(int i = 0; i < w; i++)
                    sum += aView(row, i) * bView(i, col);
                cView[idx] = sum;
            });
        }
        catch (const std::exception& ex)
        {
            // C++ AMP runtime errors derive from std::exception, so this reports their message.
            std::cout << "Caught exception: " << ex.what() << std::endl;
        }
    }


    ++don;
    Monday, September 26, 2011 22:53
    Owner
  • Hmm... looks like I can't catch it. It's throwing a "first-chance exception" from within the parallel_for_each body that resets the GPU driver:

     

    First-chance exception at 0x75e19673 in CPP-AMP.Test.exe: Microsoft C++ exception: Concurrency::runtime_exception at memory location 0x0024e5cc..

     

    I added an explicit call to cView.synchronize() at the end of the try block, and a breakpoint on that call suggests the exception is thrown before the call is reached.

    • Edited by David Cuccia, Monday, September 26, 2011 23:16 (clarity)
    Monday, September 26, 2011 23:15
  • Ok, forgive me (I'm a little green with C++), but if I put a breakpoint on the call to synchronize(), I can hover over the exception "ex" and it says:

    ex = {_Mywhat=0x14ec83ec <Bad Ptr> _Mydofree=true }

    Drilling into ex yields:

    __vfptr = 0x8b55ff8b
        [0] = CXX0030: Error: expression cannot be evaluated
        [1] = CXX0030: Error: expression cannot be evaluated
    _Mywhat = 0x14ec83ec
        [0] = CXX0030: Error: expression cannot be evaluated

    Update: I'm confusing myself. The above came from running the Release-compiled C++ DLL with the C# host in Debug mode. With both built in Release, I get:

    ex = {_Mywhat=0x016700c4 "( g( g8gƒÍa:" _Mydofree=true }

    • Edited by David Cuccia, Monday, September 26, 2011 23:29 (update)
    Monday, September 26, 2011 23:26
  • Ok, so this is not so easy.  Here is what you can try so we can inspect the contents of the exception:

    1. Go to "Debug / Exceptions...".  Expand the "C++ Exceptions" and find "std::exception".  Check the box beside it in the "Thrown" column.  This will break the debugger when exceptions derived from "std::exception" are thrown.

    2. Run your program.

    3. You should get a dialog box saying "First-chance exception at 0xblahblah in YourProgram.exe: Microsoft C++ exception ... at memory location 0xADDRESS."

    4. Copy the address of the exception (0xADDRESS).  For example, 0x003AF78C.

    5. Open the watch window and enter the following: "(std::exception*)0x003AF78C".

    Hopefully you now have the contents of the exception variable being thrown, and can see what C++ AMP is trying to tell you by inspecting the "_Mywhat" member.

    I hope that helps, and thanks for persevering this far.


    ++don;
    Tuesday, September 27, 2011 17:05
    Owner
  • David, could you share your entire Visual Studio solution with us so we can try to reproduce this in-house?
    http://www.danielmoth.com/Blog/
    Saturday, October 15, 2011 04:27
    Owner
  • Thanks for the replies, and sorry for the delay. I tried what Don suggested and consistently get exceptions that look like this:

    First-chance exception at 0x752b9673 in CPP-AMP.Test.exe: Microsoft C++ exception: Concurrency::runtime_exception at memory location 0x0028e7fc..

    +  (std::exception*)0x0028e7fc 0x0028e7fc {_M_error_code=-2005270523 } std::exception *

    Here's a text copy of the expanded view:


    -  (std::exception*)0x0028e7fc 0x0028e7fc {_M_error_code=-2005270523 } std::exception *
    -  [Concurrency::runtime_exception] {_M_error_code=-2005270523 } Concurrency::runtime_exception
    -  std::exception {_Mywhat=0x0045d9a8 "Failed to map staging buffer." _Mydofree=true } std::exception
    -  __vfptr 0x0f5a98d8 const Concurrency::runtime_exception::`vftable' *
      [0] 0x0f5bdb3e Concurrency::runtime_exception::`vector deleting destructor'(unsigned int) *
      [1] 0x0f5c3cc6 std::exception::what(void) *
    -  _Mywhat 0x0045d9a8 "Failed to map staging buffer." const char *
       70 'F' const char
      _Mydofree true bool
      _M_error_code -2005270523 HRESULT
    -  __vfptr 0x0f5a98d8 const Concurrency::runtime_exception::`vftable' *
      [0] 0x0f5bdb3e Concurrency::runtime_exception::`vector deleting destructor'(unsigned int) *
      [1] 0x0f5c3cc6 std::exception::what(void) *
    -  _Mywhat 0x0045d9a8 "Failed to map staging buffer." const char *
       70 'F' const char
      _Mydofree true bool
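    The _M_error_code above is an HRESULT shown as a signed integer. Reinterpreted as an unsigned 32-bit value, -2005270523 is 0x887A0005, which is DXGI_ERROR_DEVICE_REMOVED; that is consistent with the "Display driver stopped working" reset reported earlier. Below is a minimal, portable C++ sketch of the decoding. The helper names (hresult_bits, dxgi_device_error_name, describe_hresult) are hypothetical, not part of C++ AMP or the Windows SDK, and only three DXGI device-loss codes are listed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// The debugger shows HRESULTs as signed 32-bit integers; DirectX documents them
// as unsigned hex. Converting int32 -> uint32 is well-defined (modulo 2^32).
inline std::uint32_t hresult_bits(std::int32_t hr) {
    return static_cast<std::uint32_t>(hr);
}

// A few DXGI "device lost" codes with their Windows SDK values.
// Hypothetical helper: only these three codes are recognized.
inline const char* dxgi_device_error_name(std::uint32_t code) {
    switch (code) {
        case 0x887A0005u: return "DXGI_ERROR_DEVICE_REMOVED";
        case 0x887A0006u: return "DXGI_ERROR_DEVICE_HUNG";
        case 0x887A0007u: return "DXGI_ERROR_DEVICE_RESET";
        default:          return "not a DXGI device-lost code";
    }
}

// Formats an HRESULT as "0xXXXXXXXX (NAME)".
inline std::string describe_hresult(std::int32_t hr) {
    const std::uint32_t code = hresult_bits(hr);
    char hex[11];  // "0x" + 8 hex digits + terminating NUL
    std::snprintf(hex, sizeof hex, "0x%08X", static_cast<unsigned>(code));
    return std::string(hex) + " (" + dxgi_device_error_name(code) + ")";
}
```

    For example, describe_hresult(-2005270523) returns "0x887A0005 (DXGI_ERROR_DEVICE_REMOVED)".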

    Daniel, happy to send you the solution file, thanks for the offer. I'll send it now...

    Saturday, October 15, 2011 22:19
  • Hi David,

    I just wanted to update you on this.  The good news is that I'm able to repro your issue on my end.  The other good news is that whatever is causing this to fail has been fixed, since the problem goes away on later builds of C++ AMP.  The only bad news is that I still don't know what the root cause is, and therefore don't have a workaround for you.  I'm still investigating and will update you when I have more info.


    ++don;
    • Marked as Answer by David Cuccia, Friday, October 21, 2011 23:47
    Friday, October 21, 2011 17:05
    Owner
  • Hi Don,

    Thanks a lot for the update - sounds very promising. Is there any hint at this point as to when we might see new bits?

    David

    Friday, October 21, 2011 23:46
  • Sorry, David, no hint that we can share... please keep the feedback coming...
    http://www.danielmoth.com/Blog/
    Saturday, October 22, 2011 04:34
    Owner
  • Hi,

    I experienced the same problem as David with the Developer Preview (on a Zotac NVIDIA GT 430), but after reading this thread I was expecting it to be fixed in the Beta.

    However, when I run the matrix multiplication code now, it works fine in Debug but throws an accelerator_view_removed exception in Release mode. When I catch the exception, what() gives me:

    "Caught exception: Failed to map staging buffer."

    I am initialising the float vectors with 100 elements as 10x10 matrices.

    Thomas

    Update: After some additional experiments, it seems that as long as no matrix dimension exceeds 8 elements, the code runs fine even in Release mode; anything larger throws the exception. So multiplying two 8x8 matrices into an 8x8 result works, but multiplying a 9-element row vector by a 9-element column vector does not.
    Is that expected, i.e. due to some hardware restriction?

    Thursday, March 1, 2012 20:16
  • Hi Thomas

    Your problem actually seems different to the OP. Do you mind starting a new thread where you share the exact repro code? Looking forward to helping you out.

    Cheers

    Daniel


    http://www.danielmoth.com/Blog/

    • Proposed as Answer by Zhu, Weirong, Friday, March 2, 2012 16:46
    Friday, March 2, 2012 04:25
    Owner
  • Hi Daniel,

    Thanks for your reply. I've posted the problem, complete with sample code, in a new thread.

    After reading Zooba's post about captured integer variables used as loop bounds, and the problems he is seeing with them on his NVIDIA card, I suspect my issue might be similar. But then I'm curious why, in my case, it works when the loop bound is less than or equal to 8.

    Thanks,
    Thomas

    Friday, March 2, 2012 07:34
  • Daniel,

    I have the same problem, but with all the latest software. The code comes from the matrix multiply example in your MSDN Magazine article. With 1024x1024 matrices it works fine, but the behavior is inconsistent: other sizes such as 10, 15, and 1025 crash the video driver. When I compile in Debug, the problem doesn't occur. I get the same result with two different NVIDIA cards. My setup:

    Nvidia GeForce GTX 560 Ti

    Nvidia Quadro 600

    Latest NVidia driver 301.42

    Visual Studio 2012 RC

    Windows 7

    If I can't rely on consistent behavior from the same code, this is a show-stopper in my opinion. C++ AMP has performed quite well for me up to this point, but given this problem I'm hesitant.

    I haven't seen any real response to the other posts besides "contact NVIDIA." Is there a better solution?

    Chris

    Here's additional information:

    From Concurrency::array_view<_Value_type, _Rank>:

        /// <summary>
        ///     Synchronizes any modifications made to "this" array_view to its source data.
        /// </summary>
        void synchronize() const __CPU_ONLY
        {
            auto _Span_id = details::_Get_amp_trace()->_Start_array_view_synchronize_event_helper(_M_buffer_descriptor);

            _Buffer_ptr _PBuf;

            // ***** here is where the debugger breaks *****
            _Get_access_async(_M_buffer_descriptor._Get_view_key(), _M_buffer_descriptor._Get_buffer_ptr()->_Get_master_accelerator_view(), _Read_access, _PBuf)._Get();

            details::_Get_amp_trace()->_Write_end_event(_Span_id);
        }

    If I remove the c.synchronize() call, I get this instead:

    vcamp110.dll!Concurrency::details::_Reference_counter::_Release() Line 273 C++

    --- f:\dd\vctools\dpcxxrt\src\cpu_impl.cpp -------------------------------------
    650E8463  push        4  
    650E8465  mov         eax,65108725h  
    650E846A  call        _EH_prolog3_catch (65106ADDh)  
    650E846F  and         dword ptr [ebp-4],0  
    650E8473  test        ecx,ecx  
    650E8475  je          Concurrency::details::_CPU_accelerator_view_impl::_Release+1Ah (650E847Dh)  
    650E8477  mov         eax,dword ptr [ecx]  
    650E8479  push        1  
    650E847B  call        dword ptr [eax]  
    $LN9:
    --> 650E847D  or          dword ptr [ebp-4],0FFFFFFFFh  // this is where it breaks
    650E8481  call        _EH_epilog3 (65106A29h)  
    650E8486  ret  
    650E8487  mov         eax,650E847Dh  
    650E848C  ret  


    • Edited by Norcal Dev, Friday, June 1, 2012 18:48
    Friday, June 1, 2012 18:14
  • Hi Norcal Dev (and others hitting this bug)

    All programming models that use hardware in one way or another are at the mercy of driver bugs for that hardware. Luckily, DirectX is a mature platform and the Windows driver certification program is rigorous, so we don't suffer as many of these as other approaches seem to.

    With C++ AMP, it is very easy to determine if you are hitting a driver bug by testing the code on REF (plus on hardware from another vendor):
    http://blogs.msdn.com/b/nativeconcurrency/archive/2012/03/11/direct3d-ref-accelerator-in-c-amp.aspx
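    When comparing runs on REF, on hardware, and in Debug vs. Release builds, having a CPU result for the same inputs makes it easy to tell wrong output apart from a crash. Below is a minimal sketch in standard C++ (no C++ AMP required) that mirrors the kernel from the top of this thread, assuming the same row-major layout as array_view<const float,2> aView(m, w, ...) and bView(w, n, ...). The helpers matmul_reference and results_match are hypothetical, and the 1e-4 relative tolerance is an assumption: floating-point results from different accelerators need not match bit-for-bit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for c (m x n) = a (m x w) * b (w x n), all row-major,
// mirroring sum += aView(row, i) * bView(i, col) in the kernel.
std::vector<float> matmul_reference(const std::vector<float>& a,
                                    const std::vector<float>& b,
                                    int m, int n, int w) {
    assert(a.size() == static_cast<std::size_t>(m) * w);
    assert(b.size() == static_cast<std::size_t>(w) * n);
    std::vector<float> c(static_cast<std::size_t>(m) * n);
    for (int row = 0; row < m; ++row) {
        for (int col = 0; col < n; ++col) {
            float sum = 0.0f;
            for (int i = 0; i < w; ++i)
                sum += a[row * w + i] * b[i * n + col];
            c[row * n + col] = sum;
        }
    }
    return c;
}

// Element-wise comparison with a relative tolerance (absolute near zero).
bool results_match(const std::vector<float>& x, const std::vector<float>& y,
                   float rel_tol = 1e-4f) {
    if (x.size() != y.size()) return false;
    for (std::size_t k = 0; k < x.size(); ++k) {
        const float scale = std::fmax(1.0f, std::fmax(std::fabs(x[k]), std::fabs(y[k])));
        if (std::fabs(x[k] - y[k]) > rel_tol * scale) return false;
    }
    return true;
}
```

    For example, with a = {1, 2, 3, 4} and b = {5, 6, 7, 8} as 2x2 matrices, matmul_reference(a, b, 2, 2, 2) returns {19, 22, 43, 50}. The GPU output copied back into c can then be checked with results_match for each size that misbehaves (for instance 9, 10, 15, or 1025).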

    All driver bugs reported to us in this forum have been reported to the hardware vendor, so there is nothing additional for you to do in terms of getting the bug fixed. By the time C++ AMP ships, we hope the hardware vendor will have fixed all known bugs.

    The thread above is very long and noisy and discusses a few different issues, so I cannot be sure what your specific driver bug is. Can you confirm that your experience is identical to the bug reported in this much cleaner forum thread:
    http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/28c4933d-df19-4aa2-93b8-9f9cc4e85a7a

    If it is: since 3 people have now hit this, we have raised the priority of the bug we reported to NVIDIA, and we have also posted a workaround in the thread I just linked. Please check that it works for you.

    Cheers
    Daniel


    http://www.danielmoth.com/Blog/


    Saturday, June 2, 2012 04:39
    Owner