Asynchronous Walking of the .NET Managed Stack using DoStackSnapshot (sampling profiler) does not work with CLR 4.0

  • Question

  • I am trying to do asynchronous stack walks, as described in this article, to build a sampling profiler.
    http://msdn.microsoft.com/en-us/library/bb264782.aspx

    One of the key APIs described in the article, GetFunctionFromIP, always fails with hr = E_FAIL (80004005). A brief code snippet showing how I use it:

    			if(GetThreadContext(threadHandle, &context))
    			{
    #ifdef X64
    				FunctionID id = 0;
    				hr = m_pCorProfilerInfo->GetFunctionFromIP((BYTE*)context.Rip, &id);
    

    DoStackSnapshot() also seems to behave differently in CLR 2.0 and CLR 4.0 applications. Usage:

    				hr = m_pCorProfilerInfo->DoStackSnapshot(tId, StackWalkCallback, COR_PRF_SNAPSHOT_REGISTER_CONTEXT, this, (BYTE*)&context, sizeof(context));
    

    For CLR 4.0, DoStackSnapshot always fails with hr = 80131360 (CORPROF_E_STACKSNAPSHOT_UNSAFE), whereas with CLR 2.0 it succeeds on most threads but sometimes fails with hr = E_FAIL (80004005).

    I am also trying to seed DoStackSnapshot with an unmanaged stack walk using StackWalk64, but GetFunctionFromIP always fails with hr = E_FAIL (80004005):

    					while(StackWalk64(machineType, hProcess, threadHandle, &stackFrame,
    						&context, NULL, FunctionTableAccess, SymGetModuleBase64, NULL))
    					{
    						if(stackFrame.AddrPC.Offset == stackFrame.AddrReturn.Offset)
    							break;
    					
    						hr = m_pCorProfilerInfo->GetFunctionFromIP((BYTE*)stackFrame.AddrPC.Offset, &id);
    						if(SUCCEEDED(hr) && id != 0)
    						{
    							::StringCchPrintf( buffer, numElements( buffer ), L"StackWalk64::GetFunctionFromIP(%x) or Context(%x) returned hr = %x, and id = %d", stackFrame.AddrPC.Offset, context.Rip, hr, id );
    							OutputDebugStringW(buffer);
    							hr = m_pCorProfilerInfo->DoStackSnapshot(tId, StackWalkCallback, COR_PRF_SNAPSHOT_REGISTER_CONTEXT, this, (BYTE*)&context, sizeof(context));
     
    							if(SUCCEEDED(hr))
    							{
    								::StringCchPrintf( buffer, numElements( buffer ), L"StackWalk64::DoStackSnapshot(%x) stackFrame.Addr.Offset  = %x", context.Rip, stackFrame.AddrPC.Offset );
    								OutputDebugStringW(buffer);
    								break;
    							}
    						}
    					}
    

    I also notice that StackWalk64 under CLR 4.0 produces unexpected IP values for what I assume is managed code. Here is some trace output from a run under CLR 4.0. Notice that the IP four levels down shows 80, then 0, ...

    No successful thread stacks are ever produced with CLR 4.0.

    00000010 0.00073796 [13984] Starting Stackwalk for thread osID = 2E34, ManagedID = 2a2e5c0 
    00000011 0.00078685 [13984] StackWalk Start(2a2e5c0) 
    00000012 0.00081110 [13984] SuspendThread(1d0) returned suc = 0  
    00000013 0.00087692 [13984] GetFunctionFromIP(778d165a) returned hr = 80004005, err = 0,  and id = 0 
    00000014 0.00098779 [13984] DoStackSnapshot Failed with hr = 80131360 
    00000015 0.00101628 [13984] StackWalk64Ptr::GetFunctionFromIP(778d165a) or Context(778d165a) returned hr = 80004005, and id = 0 
    00000016 0.00110905 [13984] StackWalk64Ptr::GetFunctionFromIP(fe041203) or Context(fe041203) returned hr = 80004005, and id = 0 
    00000017 0.00120106 [13984] StackWalk64Ptr::GetFunctionFromIP(31e908) or Context(31e908) returned hr = 80004005, and id = 0 
    00000018 0.00129344 [13984] StackWalk64Ptr::GetFunctionFromIP(80) or Context(80) returned hr = 80004005, and id = 0 
    00000019 0.00138352 [13984] StackWalk64Ptr::GetFunctionFromIP(0) or Context(0) returned hr = 80004005, and id = 0 
    00000020 0.00147745 [13984] StackWalk64Ptr::GetFunctionFromIP(32f12d1a) or Context(32f12d1a) returned hr = 80004005, and id = 0 
    00000021 0.00157061 [13984] StackWalk64Ptr::GetFunctionFromIP(ffb3b4c0) or Context(ffb3b4c0) returned hr = 80004005, and id = 0 
    00000022 0.00166300 [13984] StackWalk64Ptr::GetFunctionFromIP(31e870) or Context(31e870) returned hr = 80004005, and id = 0 
    00000023 0.00175500 [13984] StackWalk64Ptr::GetFunctionFromIP(48) or Context(48) returned hr = 80004005, and id = 0 
    00000024 0.00184470 [13984] StackWalk64Ptr::GetFunctionFromIP(1) or Context(1) returned hr = 80004005, and id = 0 
    00000025 0.00194094 [13984] DoStackSnapshot Failed with hr = 80131360 
    00000026 0.00199368 [13984] StackWalk End 

    The same code with CLR 2.0 produces the following trace:

    00000047 0.00364166 [11920] StackWalk Start(1dff2cf0) 
    00000048 0.00368747 [11920] SuspendThread(328) returned suc = 0  
    00000049 0.00375599 [11920] GetFunctionFromIP(778d13aa) returned hr = 80004005, err = 0,  and id = 0 
    00000050 0.00382606 [11920] DoStackSnapshot Failed with hr = 80004005 
    00000051 0.00388726 [11920] StackWalk64Ptr::GetFunctionFromIP(778d13aa) or Context(778d13aa) returned hr = 80004005, and id = 0 
    00000052 0.00397080 [11920] StackWalk64Ptr::GetFunctionFromIP(fe04169d) or Context(fe04169d) returned hr = 80004005, and id = 0 
    00000053 0.00405395 [11920] StackWalk64Ptr::GetFunctionFromIP(1dff2cf0) or Context(1dff2cf0) returned hr = 80004005, and id = 0 
    00000054 0.00413748 [11920] StackWalk64Ptr::GetFunctionFromIP(f522d353) or Context(f522d353) returned hr = 80004005, and id = 0 
    00000055 0.00422063 [11920] StackWalk64Ptr::GetFunctionFromIP(fffffffe) or Context(fffffffe) returned hr = 80004005, and id = 0 
    00000056 0.00430378 [11920] StackWalk64Ptr::GetFunctionFromIP(1dff2cf0) or Context(1dff2cf0) returned hr = 80004005, and id = 0 
    00000057 0.00438732 [11920] StackWalk64Ptr::GetFunctionFromIP(1f5ff700) or Context(1f5ff700) returned hr = 80004005, and id = 0 
    00000058 0.00446200 [11920] DoStackSnapshot Failed with hr = 80004005 
    00000059 0.00450204 [11920] StackWalk End.

    CLR 2.0 also produces many successful thread stacks, such as:

    00000037 0.00297030 [11920] Starting Stackwalk for thread osID = 2710, ManagedID = 2962de0 
    00000038 0.00301380 [11920] StackWalk Start(2962de0) 
    00000039 0.00305923 [11920] SuspendThread(3d0) returned suc = 0  
    00000040 0.00312082 [11920] GetFunctionFromIP(778d18ca) returned hr = 80004005, err = 0,  and id = 0 
    00000041 0.00323746 [11920] System.Threading.WaitHandle::FuncID(275f38) 
    00000042 0.00332677 [11920] Mercury.Util.ThreadPool.WorkerThread::FuncID(64a100) 
    00000043 0.00341069 [11920] System.Threading.ExecutionContext::FuncID(49acc8) 
    00000044 0.00349307 [11920] System.Threading.ThreadHelper::FuncID(54f7e0) 
    00000045 0.00354735 [11920] StackWalk End 

    Can anybody from the Microsoft CLR profiler API team help explain why GetFunctionFromIP fails, and whether there is an effective way to do asynchronous stack walks in CLR 4.0 or later?

    Thanks for any responses

    -Sanjay



    Thursday, March 1, 2012 10:05 PM

Answers

  • Hi, Sanjay, sorry for the delayed response.

    GetFunctionFromIP will return E_FAIL if you pass it an IP address that doesn't reside in a managed function.  You'll get E_FAIL for any native methods, and also for IL stubs and lightweight-code-gen methods as well.  Typically, one calls GetFunctionFromIP for methods on the stack to determine if they are managed (and if so what their FunctionIDs are), so E_FAIL is frequently returned and should be accepted and handled by your profiler.
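
    A minimal sketch of the handling described above, not taken from any real profiler: each sampled instruction pointer is passed to a resolver, and a failing HRESULT is treated as "this frame is not managed" rather than as an error. The names HResult, FunctionId, IpResolver, Frame, and classifyFrames are hypothetical stand-ins so the sketch compiles without the Windows SDK or corprof.h; in a real profiler the resolver would forward to ICorProfilerInfo::GetFunctionFromIP.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Stand-ins (assumptions) so the sketch builds without Windows headers.
using HResult = std::int32_t;        // plays the role of HRESULT
using FunctionId = std::uintptr_t;   // plays the role of FunctionID
constexpr HResult kOk = 0;                                        // S_OK
constexpr HResult kFail = static_cast<HResult>(0x80004005u);      // E_FAIL

// Hypothetical adapter. A real one would call
// ICorProfilerInfo::GetFunctionFromIP for the given address.
using IpResolver = std::function<HResult(std::uintptr_t ip, FunctionId* id)>;

struct Frame {
    std::uintptr_t ip;
    bool managed;            // true only when the resolver succeeded
    FunctionId functionId;   // 0 for frames that are not managed
};

// Classify every IP. A failed lookup marks the frame as not managed
// (native code, IL stub, and so on) and the walk simply continues.
std::vector<Frame> classifyFrames(const std::vector<std::uintptr_t>& ips,
                                  const IpResolver& resolve) {
    std::vector<Frame> frames;
    frames.reserve(ips.size());
    for (std::uintptr_t ip : ips) {
        FunctionId id = 0;
        const HResult hr = resolve(ip, &id);
        const bool managed = (hr >= 0) && (id != 0);   // SUCCEEDED(hr)
        frames.push_back(Frame{ip, managed, managed ? id : 0});
    }
    return frames;
}
```

    With this shape, a stack made up entirely of native frames yields a list with no managed entries instead of an aborted walk.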

    DoStackSnapshot on 64-bit CLR 4 does have a bug where it will return CORPROF_E_STACKSNAPSHOT_UNSAFE on a cross-thread stack-walk a lot more often than it should (this has been fixed in the CLR 4.5 beta).  On 64 bits, it's generally recommended to use the OS to walk the stack anyway, though, as it's lighter-weight than having the CLR use its full stack-walker.  Consider using RtlVirtualUnwind / RtlLookupFunctionEntry on 64 bit Windows.
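
    The OS-based walk suggested above could look roughly like the following sketch. It assumes 64-bit Windows, a target thread that is already suspended, and a CONTEXT captured with GetThreadContext with the control and integer registers requested. It also assumes the runtime has registered unwind data for its JIT-compiled code so that RtlLookupFunctionEntry can find it, which is worth verifying on your CLR version. walkNativeFrames is a hypothetical helper name, and the preprocessor guard only keeps the block from being compiled on other platforms.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(_WIN64)
#include <windows.h>

// Collect up to maxFrames instruction pointers, starting from a captured
// context. The context is copied because RtlVirtualUnwind rewrites it.
std::vector<std::uint64_t> walkNativeFrames(const CONTEXT& start,
                                            std::size_t maxFrames) {
    CONTEXT ctx = start;
    std::vector<std::uint64_t> ips;

    while (ctx.Rip != 0 && ips.size() < maxFrames) {
        ips.push_back(ctx.Rip);

        DWORD64 imageBase = 0;
        PRUNTIME_FUNCTION entry =
            RtlLookupFunctionEntry(ctx.Rip, &imageBase, nullptr);

        if (entry == nullptr) {
            // No unwind data: treat the function as a leaf, whose return
            // address sits at the top of the stack. This reads the target
            // stack directly, so Rsp must be valid (assumption).
            ctx.Rip = *reinterpret_cast<DWORD64*>(ctx.Rsp);
            ctx.Rsp += sizeof(DWORD64);
        } else {
            // Let the OS undo the prolog and restore the caller's registers.
            PVOID handlerData = nullptr;
            DWORD64 establisherFrame = 0;
            RtlVirtualUnwind(UNW_FLAG_NHANDLER, imageBase, ctx.Rip, entry,
                             &ctx, &handlerData, &establisherFrame, nullptr);
        }
    }
    return ips;
}
#endif  // _WIN64
```

    Each returned address can then be passed to GetFunctionFromIP to pick out the managed frames; the maxFrames bound is only a guard against a corrupted context producing an endless walk.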

    Note that, on 32-bits, DoStackSnapshot should work fine, and should only return CORPROF_E_STACKSNAPSHOT_UNSAFE when it truly is unsafe to walk the stack.
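
    One way a sampler might act on these return codes, sketched under the assumption (not stated in this thread) that an unsafe result just means dropping the current sample and trying again on a later tick. The 80131360 value is the one reported in the question; HResult, SampleOutcome, and classifySnapshotResult are hypothetical names.

```cpp
#include <cstdint>

using HResult = std::int32_t;   // stand-in for HRESULT (assumption)

// CORPROF_E_STACKSNAPSHOT_UNSAFE, as reported in the question's trace.
constexpr HResult kStackSnapshotUnsafe = static_cast<HResult>(0x80131360u);

enum class SampleOutcome {
    Walked,          // the snapshot ran and the callback saw the frames
    SkippedUnsafe,   // the runtime refused the walk; drop this sample
    Failed           // any other failure; worth logging
};

// Map a DoStackSnapshot result to what the sampler should do next.
inline SampleOutcome classifySnapshotResult(HResult hr) {
    if (hr >= 0) {
        return SampleOutcome::Walked;          // SUCCEEDED(hr)
    }
    if (hr == kStackSnapshotUnsafe) {
        return SampleOutcome::SkippedUnsafe;   // expected now and then
    }
    return SampleOutcome::Failed;
}
```

    Counting the SkippedUnsafe outcomes per thread would show whether the rate matches the occasional refusal expected on 32 bits or the near-constant refusal seen on 64-bit CLR 4.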

    Thanks,
    Dave

    Monday, May 7, 2012 5:23 PM

All replies

  • We are researching this issue and will get back to you once we have an update.

    Best Regards,
    Rocky Yue[MSFT]

    Monday, March 5, 2012 9:22 AM
    Moderator
  • Hi Rocky,

    Any updates? ETA?

    -Sanjay


    Sanjay Mehta

    Tuesday, March 13, 2012 5:45 PM
  • Hi,

    For this question, you could visit the link below to see the paid support options that are available if you require a more in-depth level of support.

    http://support.microsoft.com/default.aspx?id=fh;en-us;offerprophone

    Regards

    Thursday, March 29, 2012 5:49 AM
  • Did you ever get this to work? I'm having the same issue.

    Thursday, October 25, 2012 3:22 AM