WPR and managed call stacks

    Question

  • Hi,

    I have noticed that the newest version of WPR, which is part of the Windows 8 SDK, can capture managed call stacks. I would like to use it to find bottlenecks in an application I am developing with VS, so it would be nice to get full call stacks. I created a small sample app with a deeply nested call stack:

      using System;
      using System.Diagnostics;
      using System.Runtime.CompilerServices;

      class Program
      {
          [MethodImpl(MethodImplOptions.NoInlining)]
          static void Main(string[] args)
          {
              // Not used further in this sample.
              var sw = Stopwatch.StartNew();
              // Loop forever so the profiler has plenty of time to capture stacks.
              while (true)
                  F1();
          }

          // F1 to F8 only forward the call; NoInlining keeps every frame on the stack.
          [MethodImpl(MethodImplOptions.NoInlining)]
          public static void F1()
          {
              F2();
          }

          [MethodImpl(MethodImplOptions.NoInlining)]
          public static void F2()
          {
              F3();
          }

          [MethodImpl(MethodImplOptions.NoInlining)]
          public static void F3()
          {
              F4();
          }

          [MethodImpl(MethodImplOptions.NoInlining)]
          public static void F4()
          {
              F5();
          }

          [MethodImpl(MethodImplOptions.NoInlining)]
          public static void F5()
          {
              F6();
          }

          [MethodImpl(MethodImplOptions.NoInlining)]
          public static void F6()
          {
              F7();
          }

          [MethodImpl(MethodImplOptions.NoInlining)]
          public static void F7()
          {
              F8();
          }

          [MethodImpl(MethodImplOptions.NoInlining)]
          public static void F8()
          {
              F9();
          }

          // Leaf method: the thread spends most of its time here.
          [MethodImpl(MethodImplOptions.NoInlining)]
          public static void F9()
          {
              Console.WriteLine("hi");
          }
      }

    I get back only the currently executing managed method, in this case F9:

    Line # NewProcess NewThreadId NewThreadStack ReadyingProcess ReadyingThreadId Count Ready (us) Ready (us) Waits (us) Waits (us) Count:Waits SwitchInTime (s) NewInPri OldProcess OldThreadId OldOutPri OldInSwitchTime (us) OldState OldWaitReason OldWaitMode Cpu IdealCpu % CPU Usage
    2 11128 [Root] 17680 107939,480 195,067 4261686,154 112927,851 17390 1642470,502 0,41
    3  |- FastEventLogReader.exe!FastEventLogReader.Program::F9 17630 107009,457 195,067 3833861,204 3667,100 17354 1623974,498 0,39
    4  |- ntdll.dll!RtlUserThreadStart 36 646,254 106,749 394412,491 112927,851 25 15579,220 0,02
    5  |- ntdll.dll!LdrInitializeThunk 8 135,164 59,519 289,911 155,900 8 1221,086 0,00
    6  |- FastEventLogReader.exe!FastEventLogReader.Program::Main 1 47,615 47,615 4630,913 4630,913 1 1473,368 0,00
    7  |- ?!? 2 17,280 10,752 28488,947 28488,947 1 112,125 0,00
    8  |- FastEventLogReader.exe!FastEventLogReader.Program::F4 1 23,423 23,423 0,000 0,000 0 28,031 0,00
    9  |- FastEventLogReader.exe!FastEventLogReader.Program::F3 1 3,840 3,840 0,000 0,000 0 2,304 0,00
    10  |- ntkrnlmp.exe!KiStartUserThreadReturn 1 56,447 56,447 2,688 2,688 1 79,870 0,00

    I am not sure how to deal with this seemingly broken call stack. It looks as if, while the stacks are sampled, a separate node is created for each method that happens to be executing. When the thread is in F3, a new node is created for F3. When it is in F9 (most of the time), I get a lot of counts there, but the parent methods are missing. Is there a way to see a little more of the managed call stack? One stack frame is nice, but a few more would help in many cases.

    Do I need to modify the trace buffer sizes or buffer counts to see more? Or does this have something to do with the CLR Rundown event provider needed to enable mixed stack walks? I am using .NET 4.0 x64 on Windows 7 on an 8-core Xeon 2.8 GHz with 12 GB RAM.

    Yours,

      Alois Kraus

    Thursday, December 20, 2012 14:19

All Replies

  •  

    It is a known issue that for x64 processes on Windows versions before Windows 8 (Windows Server 2012), the ETW stack-crawling logic stops at the first frame whose code was dynamically generated (that is, just-in-time compiled). This issue is fixed in Windows 8.
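To make the effect concrete, here is a toy model of the behavior described above. It is not ETW code: the `Frame` type, the `Walk` method, and the frame names are hypothetical. It assumes the walker goes from the innermost frame outward, records the first JIT-compiled frame it meets, and cannot continue past it, which matches the truncated stacks in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a stack frame; this is not an ETW or WPR type.
public sealed class Frame
{
    public readonly string Name;
    public readonly bool IsJitCompiled;

    public Frame(string name, bool isJitCompiled)
    {
        Name = name;
        IsJitCompiled = isJitCompiled;
    }
}

public static class StackWalkModel
{
    // Walks from the innermost frame outward. When stopsAtJitFrames is true
    // (assumed pre-Windows 8 x64 behavior), the walk ends at the first
    // JIT-compiled frame, so that frame's callers are never recorded.
    public static List<string> Walk(IList<Frame> innermostFirst, bool stopsAtJitFrames)
    {
        var recorded = new List<string>();
        foreach (Frame frame in innermostFirst)
        {
            recorded.Add(frame.Name);
            if (stopsAtJitFrames && frame.IsJitCompiled)
                break;
        }
        return recorded;
    }

    // Abbreviated, illustrative stack for a thread blocked inside F9.
    public static List<Frame> SampleStack()
    {
        return new List<Frame>
        {
            new Frame("kernel wait (native)", false),
            new Frame("Console.WriteLine (precompiled)", false),
            new Frame("Program::F9", true),
            new Frame("Program::F8", true),
            new Frame("Program::Main", true),
            new Frame("ntdll.dll!RtlUserThreadStart", false)
        };
    }

    public static void Main()
    {
        List<Frame> stack = SampleStack();
        Console.WriteLine("truncated: " + string.Join(" <- ", Walk(stack, true)));
        Console.WriteLine("complete:  " + string.Join(" <- ", Walk(stack, false)));
    }
}
```

In the truncated walk F9 is the outermost recorded frame, so a viewer that groups stacks from the root would show it as a top-level node, much like the F9, F4, and F3 rows in the question.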

    You can work around the problem by

    • Running the app as a 32-bit application
    • NGENing the code you care about
    • Running on Windows 8
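The conditions above can be folded into a small check that an application could run at startup. This is only a sketch: `BrokenStackCheck` and `MayTruncateJitStacks` are hypothetical names, the `codeIsNgened` flag has to be supplied by the caller because the sketch does not detect NGEN images itself, and it assumes that Windows 8 and Windows Server 2012 report version 6.2.

```csharp
using System;

public static class BrokenStackCheck
{
    // Assumption: Windows 8 / Windows Server 2012 report version 6.2.
    private static readonly Version Windows8 = new Version(6, 2);

    // Returns true when all three conditions from the reply hold:
    // 64-bit process, OS older than Windows 8, and JIT-compiled (not NGENed) code.
    public static bool MayTruncateJitStacks(bool is64BitProcess, Version osVersion, bool codeIsNgened)
    {
        return is64BitProcess && osVersion < Windows8 && !codeIsNgened;
    }

    public static void Main()
    {
        // The version comparison is only meaningful on Windows.
        if (Environment.OSVersion.Platform != PlatformID.Win32NT)
        {
            Console.WriteLine("Not running on Windows; check skipped.");
            return;
        }

        bool atRisk = MayTruncateJitStacks(
            Environment.Is64BitProcess,
            Environment.OSVersion.Version,
            false); // assume the code under test has not been NGENed

        Console.WriteLine(atRisk
            ? "ETW stacks may stop at the first JIT-compiled frame."
            : "None of the known conditions for truncated stacks apply.");
    }
}
```

Keeping the decision in a pure method makes it easy to check the individual workarounds: flipping any one of the three inputs turns the result to false.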

    There is a whole section on this in the PerfView user's guide that goes into these mitigations (you can get PerfView from http://www.microsoft.com/en-us/download/details.aspx?id=28567). See the FAQ or look for 'BROKEN stacks'. These mitigations will work for WPR too (in general, WPR and PerfView can use each other's data).

    Wednesday, January 2, 2013 23:58
  • Thanks Vance,

    I will try this out when I get back to work. Is there any chance of getting a fix for Windows 7 as well? If it is a small change, I could try to open a business case to get it. Do you know which OS component would need to be patched? Since it is fixed in Windows 8, would it be easy to backport? This tool is immensely useful, but only if the call stacks are not broken. When I want to check whether a specific build works as expected, I do not want to NGen it every time, since that is rather time-consuming for our product (about 2 hours on my 8-core machine). Running as 32-bit is also not an option in my case, since the application processes large amounts of data.

    Yours,

      Alois Kraus

    Thursday, January 3, 2013 13:05