
Walking the call stack as fast as possible

  • Thursday, May 11, 2006 7:28 PM Iulian Radu
     
    Hello!

    I have spent some time developing a memory allocator that replaces the default CRT allocator. It is almost finished, but I recently added a leak trace that saves the call stack for every memory allocation. (The stack is saved in the process's own memory, so there is no interprocess communication overhead.) This adds a very significant overhead to the allocator, and I was wondering whether there is an efficient way of getting the call stack. At the moment I'm using StackWalk64, and the profiler I'm using (CodeAnalyst) reports that 99+% of the time is spent in dbghelp.dll, ntoskrnl.dll and msvcrxx.dll (strangely, the functions used from the latter are wcslen and its family, even though I don't use them in my code). Even stranger, only 5-6% of the time spent in dbghelp.dll is used by StackWalk64; the bulk of it is used by MiniDumpWriteDump.
    That being said, is there any faster way of getting the call stack? I don't expect the code to be as fast as without the trace, but I think a 20-50x slowdown would be better than this. Right now the allocator is slower by 3 orders of magnitude. (By the way, I'm not capturing the full stack, just the first 10 frames.)

All Replies

  • Thursday, May 11, 2006 7:47 PM Ayman Shoukry - MSFT (Moderator)
     

    Have you tried using the DIA tools to walk the stack? (http://msdn2.microsoft.com/en-us/library/108e9y6d.aspx)

    Thanks,
    Ayman Shoukry
    VC++ Team
  • Thursday, May 11, 2006 8:12 PM Iulian Radu
     
     Ayman Shoukry wrote:

    Have you tried using the DIA tools to walk the stack? (http://msdn2.microsoft.com/en-us/library/108e9y6d.aspx)

    Thanks,
    Ayman Shoukry
    VC++ Team


    I just found out about the DIA SDK a couple of hours ago. Is it faster?
  • Thursday, May 11, 2006 8:18 PM Ayman Shoukry - MSFT (Moderator)
     

    To be honest, I don't have much experience with DIA myself (it is probably worth a try), but I will forward the issue to one of the folks on our team who should know more than I do :-)

    Thanks,
    Ayman Shoukry
    VC++ Team
  • Thursday, May 11, 2006 11:05 PM Milis
     Answer

    It depends, actually. If the function frames are stored neatly on the stack and are easy to find, then yes, it is pretty fast. But we all know that is not the case, especially when the application is an optimized one. Still, it should give you better performance than StackWalk64.

    Unfortunately, there's no good example in the product yet of how to do stack walking with the DIA SDK. I was hoping to get one in before shipping, but I was too late. The documentation should be enough, though, given that you already know a lot from working with the dbghelp APIs, which work very similarly to the DIA SDK stack walker.

    Thanks,

    Milis

    VC++ team

  • Friday, May 12, 2006 7:22 AM Iulian Radu
     
    Thank you very much. I'll give it a try.
  • Monday, January 22, 2007 3:45 PM KasperWessing
     

    Hi Iulian,

    Did you manage to create a working stack walk with DIA? We would like to do the same thing as you, but I find it a bit unclear how to implement IDiaStackWalkHelper and how to finally create the IDiaStackWalkHelper object. If you or anyone else (e.g. the VC++ team at Microsoft) has already done this, I would like to use it as sample code.

    Thx in advance,
     Kasper

  • Monday, January 22, 2007 4:47 PM Iulian Radu
     
     KasperWessing wrote:

    Hi Iulian,

    Did you manage to create a working stack walk with DIA? We would like to do the same thing as you, but I find it a bit unclear how to implement IDiaStackWalkHelper and how to finally create the IDiaStackWalkHelper object. If you or anyone else (e.g. the VC++ team at Microsoft) has already done this, I would like to use it as sample code.

    Thx in advance,
    Kasper



     Hello!

     I haven't tried it, since I didn't expect a magical increase in speed. Instead, I experimented with the /Gh and /GH compiler switches and added _penter and _pexit functions that keep track of the stack. Actually, I've only been using /Gh, since VC++ 2005 had a bug that kept _pexit from being placed properly in all cases. That was fixed in the patch, and I recommend using the /GH switch too, because the trick I used to emulate the call to my _pexit handler made the stack untraversable during debugging.

     The _penter/_pexit solution I've implemented only slows the program by 5-10%, I think (versus 1-2000% with WinDBG). Of course, different calling patterns carry different performance penalties, but it's much better than the all-purpose solution provided by DIA or WinDBG. The main limitation is that it requires recompilation, so you can only trace your own executables and it won't work with a plugin system.

    I hope you can use _penter/_pexit to solve your problem.

    Sincerely,
     Iulian Radu
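
    The bookkeeping behind such hooks can be sketched as a per-thread shadow stack. This is only an illustration under stated assumptions, not the implementation described above: the names ShadowStack, on_function_enter, on_function_exit and capture_trace are hypothetical, and the real _penter/_pexit hooks (which have to preserve registers and work out the entered function's address themselves) are left out. The hooks would simply forward to the two functions below.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-thread shadow call stack. All names are illustrative;
// the thread does not show the original implementation.
constexpr std::size_t kMaxDepth = 1024;  // assumed limit on recorded nesting

struct ShadowStack {
    void*       frames[kMaxDepth];
    std::size_t depth = 0;
};

inline ShadowStack& shadow_stack() {
    thread_local ShadowStack s;
    return s;
}

// Intended to be called from the _penter hook with the entered function's address.
inline void on_function_enter(void* fn) {
    ShadowStack& s = shadow_stack();
    if (s.depth < kMaxDepth) s.frames[s.depth] = fn;
    ++s.depth;  // keep counting past the limit so that exits stay balanced
}

// Intended to be called from the _pexit hook.
inline void on_function_exit() {
    ShadowStack& s = shadow_stack();
    if (s.depth > 0) --s.depth;
}

// Copies up to max_frames recorded entries into out, most recent first.
// Calls nested deeper than kMaxDepth are not recorded and so not reported.
inline std::size_t capture_trace(void** out, std::size_t max_frames) {
    assert(out != nullptr || max_frames == 0);
    const ShadowStack& s = shadow_stack();
    const std::size_t top = s.depth < kMaxDepth ? s.depth : kMaxDepth;
    const std::size_t n = top < max_frames ? top : max_frames;
    for (std::size_t i = 0; i < n; ++i) out[i] = s.frames[top - 1 - i];
    return n;
}
```

    With this layout, an allocator that wants the first 10 frames only copies 10 pointers per allocation, which is consistent with the small slowdown reported above; the cost moves to every function entry and exit instead.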
  • Friday, January 26, 2007 12:44 PM KasperWessing
     

    Hi Iulian,

    It works like a charm.

    Thx, Kasper

  • Thursday, May 10, 2007 3:49 PM dosler
     

    Hi there, Iulian!

    You could benefit from ntdll!RtlCaptureStackBackTrace, which is undocumented (at least for WinXP).

    There's some info on the Windows Server 2003 implementation in MSDN, but I found it equally applicable to WinXP. It is a low-overhead stack capture routine that works with restrictions: it does not walk FPO-optimized frames (by design) and it does not use symbols (also by design; that's where the speed benefit comes from). Once the stack traces have been taken, they can be resolved to symbols at any convenient point later on (using dbghelp or DIA). This is how the debugging page heap functionality is implemented in the system itself (these snapshots must be fast!).

     

    Hope it is still helpful!

     

    dmitri.
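
    The "capture raw addresses now, look up symbols later" approach can be sketched as follows. It is a minimal sketch under assumptions: it calls CaptureStackBackTrace, the name under which the Windows SDK headers expose this routine on newer systems, and it compiles to an empty capture on other platforms. RawTrace, capture_raw_trace and TraceTable are hypothetical names. The table also stores each distinct trace only once, which is an addition for illustration and not something described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

#if defined(_WIN32)
#include <windows.h>
#endif

constexpr std::size_t kMaxFrames = 10;  // the question only needs 10 frames

struct RawTrace {
    void*       frames[kMaxFrames] = {};
    std::size_t count = 0;
};

// Captures return addresses only; no symbol lookup happens here.
inline RawTrace capture_raw_trace() {
    RawTrace t;
#if defined(_WIN32)
    // Skip one frame so that this helper itself is not recorded.
    t.count = CaptureStackBackTrace(1, static_cast<ULONG>(kMaxFrames),
                                    t.frames, nullptr);
#endif
    return t;
}

// Stores each distinct trace once and hands out a small id for it.
// Symbol names would be resolved from the stored addresses later,
// for example when the leak report is written.
class TraceTable {
public:
    std::uint32_t intern(const RawTrace& t) {
        assert(t.count <= kMaxFrames);
        std::vector<std::uintptr_t> key;
        for (std::size_t i = 0; i < t.count; ++i)
            key.push_back(reinterpret_cast<std::uintptr_t>(t.frames[i]));
        auto it = ids_.find(key);
        if (it != ids_.end()) return it->second;
        const std::uint32_t id = static_cast<std::uint32_t>(traces_.size());
        traces_.push_back(key);
        ids_.emplace(std::move(key), id);
        return id;
    }
    const std::vector<std::uintptr_t>& frames(std::uint32_t id) const {
        return traces_.at(id);
    }
    std::size_t size() const { return traces_.size(); }

private:
    std::map<std::vector<std::uintptr_t>, std::uint32_t> ids_;
    std::vector<std::vector<std::uintptr_t>> traces_;
};
```

    Inside a real allocator the table could not use std::map and std::vector as written, since they would call back into the allocator being traced; a fixed-size or separately allocated table would be needed there.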

  • Tuesday, December 18, 2007 5:42 PM YCY
     
    Has anyone used DIA to walk the stack of a module built with FPO (frame pointer omission)? Does it work?