CLR Memory

  • Question

  • This has been a lingering problem for some time now, and I would like to make one more appeal for enlightenment.

    The scenario is that I have a WCF application hosted in a Windows Service. I monitor its memory usage with Task Manager (Windows Server 2003). After about 3 hours of CPU time (roughly 18 hours of wall-clock time), the 'Mem Usage' of the process climbs to about 2 GB, and I start to get exceptions indicating that memory is corrupt. If I restart the service (and hence the WCF application), the process essentially resets and I am good for another 3 hours of CPU time. I have tried various leak-detection tools, but either because of my inexperience with them or because I used the wrong tool, I have not come to any conclusions. I have tried WinDbg with SOS, but attempts to show all of the heap objects leave me overwhelmed by the sheer quantity of output. For example, !dumpheap -stat shows a lot of objects, and I am not sure which types I should pursue or which arguments to supply to !dumpobj.
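One way to make `!dumpheap -stat` less overwhelming is to save its output from two dumps taken some hours apart (for example with WinDbg's `.logopen` / `.logclose`) and rank types by how much they grew, instead of reading the full listing. The Python sketch below assumes the usual four-column row layout (MT, Count, TotalSize, Class Name), which can differ slightly between SOS versions; the sample snapshots and the `MyApp.Order` type are invented for illustration.

```python
import re

# Assumed "!dumpheap -stat" row layout: MT (hex), Count, TotalSize, Class Name.
# Some SOS versions print thousands separators, so commas are accepted.
ROW = re.compile(r"^\s*([0-9a-fA-F]+)\s+([\d,]+)\s+([\d,]+)\s+(.+?)\s*$")


def parse_dumpheap_stat(text):
    """Return {class_name: (count, total_size)} from saved !dumpheap -stat output."""
    stats = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skips headers, "Statistics:", "Total ... objects", blanks
        _mt, count, size, name = m.groups()
        old_count, old_size = stats.get(name, (0, 0))
        # The same class name can appear under several MTs; sum them.
        stats[name] = (old_count + int(count.replace(",", "")),
                       old_size + int(size.replace(",", "")))
    return stats


def top_growth(before, after, top=10):
    """Rank classes by TotalSize growth between two snapshots: (bytes, count, name)."""
    rows = []
    for name, (count, size) in after.items():
        count0, size0 = before.get(name, (0, 0))
        rows.append((size - size0, count - count0, name))
    rows.sort(reverse=True)
    return rows[:top]


# Hypothetical snapshots, e.g. taken 3 hours apart.
BEFORE = """Statistics:
      MT    Count    TotalSize Class Name
790fd8c4      100         4000 System.String
7a1b2c3d       10          480 MyApp.Order
Total 110 objects
"""
AFTER = """Statistics:
      MT    Count    TotalSize Class Name
790fd8c4      120         4800 System.String
7a1b2c3d    1,000       48,000 MyApp.Order
Total 1,120 objects
"""

for grew_bytes, grew_count, name in top_growth(parse_dumpheap_stat(BEFORE),
                                                 parse_dumpheap_stat(AFTER)):
    print(f"{grew_bytes:>12,} bytes  {grew_count:>8,} objects  {name}")
```

Once a suspicious type stands out, its MT value is what `!dumpheap -mt <MT>` takes to list the individual instances; an object address from that listing is what `!dumpobj` and `!gcroot` take.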

    Next I moved to DebugDiag and let it run for a while. It reported that mscorwks was leaking memory, which was not helpful, since I cannot change the CLR itself.

    I was encouraged to monitor 'Private Bytes' and '# Bytes in all Heaps' with Performance Monitor. After about an hour of monitoring, the CLR counter '# Bytes in all Heaps' seemed to increase while 'Private Bytes' remained relatively constant.

    Since I have been unsuccessful at finding a leak, I have moved on to trying to eliminate the possibility of memory fragmentation. How can I rule out fragmentation? Which tools would be best to determine this? Can memory become fragmented in the CLR heaps? It would seem that if the heap undergoes GC, that would preclude the possibility of fragmentation.

    Any tips on what you have found successful would be greatly appreciated.

    Thank you.

    Kevin
    Friday, December 4, 2009 9:24 PM

All replies

  • The managed heap is not prone to fragmentation, since the garbage collector is free to move objects around. If you want to find the source of your memory growth, try a profiler such as Red Gate ANTS Memory Profiler or SciTech's .NET Memory Profiler.

    Friday, December 4, 2009 9:48 PM
  • Hi Kevin,

    I'm afraid there is no silver bullet for your problem. Tools alone will not help you.

    You'll need a solid understanding of how the GC works, and it's highly probable that you will also need to change your code to fit the GC's working patterns. My personal starting point, when I have problems like this, is the CLR Profiler. There are some introductory materials on the web, even some videos. You should look at the way your application allocates objects: do you allocate small chunks of data or large ones, how often do you allocate, and what is the lifetime of the allocated objects? Working with the CLR Profiler could help you understand whether your application's allocation pattern has issues.

    Ray M_ is right on the issue of heap fragmentation. But while the GC takes care of defragmentation, the very act of relocating objects in managed memory to compact the heap could cause other problems, making your application sluggish. So my suggestion: take a step back from your code, let the CLR Profiler analyze the heap, sit back, and think about what the diagrams show you.
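The questions above (small or large chunks, how often, how long they live) can be framed as a simple bucketing exercise. The Python sketch below assumes you have somehow collected allocation records (type, size, lifetime), for example by reading them off CLR Profiler views; the `Allocation` record and the 60-second "long-lived" cut-off are hypothetical. The 85,000-byte cut-off is the CLR's threshold at or above which objects are placed on the large object heap.

```python
from collections import defaultdict
from typing import NamedTuple

# Objects of 85,000 bytes or more are allocated on the large object heap (LOH).
LOH_THRESHOLD = 85_000


class Allocation(NamedTuple):
    """Hypothetical allocation record."""
    type_name: str
    size: int          # bytes
    lifetime_s: float  # seconds until the object became unreachable


def summarize(allocs, long_lived_s=60.0):
    """Return {(heap, lifetime_class): [count, total_bytes]}."""
    summary = defaultdict(lambda: [0, 0])
    for a in allocs:
        heap = "LOH" if a.size >= LOH_THRESHOLD else "SOH"
        life = "long" if a.lifetime_s >= long_lived_s else "short"
        bucket = summary[(heap, life)]
        bucket[0] += 1
        bucket[1] += a.size
    return dict(summary)


SAMPLE = [
    Allocation("System.String", 64, 0.01),
    Allocation("System.String", 128, 0.02),
    Allocation("System.Byte[]", 200_000, 0.5),   # large and short-lived
    Allocation("MyApp.Session", 512, 3_600.0),   # small but long-lived
]

for (heap, life), (count, total) in sorted(summarize(SAMPLE).items()):
    print(f"{heap} {life:<5} {count:>6} objects {total:>10,} bytes")
```

Large, short-lived allocations (the `("LOH", "short")` bucket) are usually the ones to watch, because the large object heap is only collected together with generation 2.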

    Marcel
    Sunday, December 6, 2009 4:03 PM
  • Hello

    Apart from the excellent suggestions from Marcel and Ray, please also take a look at the steps that I summarized for diagnosing memory leaks:
    http://social.msdn.microsoft.com/Forums/en-US/vclanguage/thread/c6235e14-a204-4f7c-bf32-6e6e85274b80

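    You mentioned that '# Bytes in all Heaps' seemed to increase while 'Private Bytes' remained relatively constant. It's odd that Private Bytes stays constant: the managed heaps live in the process's private memory, so if this is a memory leak, Private Bytes should be increasing as well. Could you please check the "Scale" setting of the Private Bytes counter? Performance Monitor draws each counter with its own scale factor, so a counter can look flat on the graph even while it grows. Please set both counters to the same scale.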
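To take the display scale out of the picture entirely, the two counters can also be compared numerically in the same unit. The sketch below fits a least-squares line to each series and reports growth in MB per hour; the sample values are hypothetical, and how the samples are exported (for example, saving the Performance Monitor log as CSV) is left open.

```python
def slope_per_hour(times_s, values):
    """Least-squares slope of values against time (seconds), in units per hour."""
    n = len(times_s)
    if n < 2 or n != len(values):
        raise ValueError("need two or more (time, value) pairs")
    mean_t = sum(times_s) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_s, values))
    den = sum((t - mean_t) ** 2 for t in times_s)
    if den == 0:
        raise ValueError("all samples have the same timestamp")
    return num / den * 3600.0


MB = 1024 * 1024

# Hypothetical samples every 10 minutes for one hour.
TIMES = [0, 600, 1200, 1800, 2400, 3000, 3600]
PRIVATE_BYTES = [900 * MB + k * 10 * MB for k in range(7)]
HEAP_BYTES = [600 * MB + k * 9 * MB for k in range(7)]

private_rate = slope_per_hour(TIMES, PRIVATE_BYTES) / MB
heap_rate = slope_per_hour(TIMES, HEAP_BYTES) / MB
print(f"Private Bytes:         {private_rate:+.1f} MB/hour")
print(f"# Bytes in all Heaps:  {heap_rate:+.1f} MB/hour")
```

As a rough rule of thumb, if '# Bytes in all Heaps' accounts for most of the Private Bytes growth, the managed heap is the place to look; if Private Bytes grows much faster than the managed heaps, the growth is more likely in native allocations.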

    CLR Profiler is a very useful tool for analyzing managed memory leaks. Besides that, dump analysis could be helpful too. You can follow the article http://support.microsoft.com/default.aspx/kb/286350 to capture a dump file of the application. After you get the dump, please send your email address to jialge@microsoft.com, and I will create a file transfer workspace where you can upload the dump file. The dump will be kept confidential, and I will try to analyze it for you.




    Regards,
    Jialiang Ge
    MSDN Subscriber Support in Forum
    Monday, December 7, 2009 6:06 AM
    Moderator
  • Thanks for the tip. Unfortunately, as far as I can see, CLRProfiler cannot attach to a running process. This is a problem because we only see the issue when the application is under stress in production, where the "leak" manifests itself in less than 24 hours. We cannot simulate the production load in development, so to use this tool we would need to bring the production service down, start it again under CLRProfiler, and wait for errors. I am not sure how the tool affects the application's "normal" performance. If it slows the application down or otherwise affects it, we would need a 24-hour period in which we warn potential customers that their transactions may behave strangely, which is clearly not an option. If we could debug this application invasively, we could do it in a development environment, but as I said, not knowing the cause of the "leak", we have been unable to reproduce it there. Also, since the load is considerably lower in the development environment, I am unclear how long the profiler should run before enough information is available.

    Kevin
    Monday, December 7, 2009 6:16 PM
  • CLRProfiler cannot attach to a process; that much is true. But if your application allocates large chunks of memory in quick succession, and the GC is constantly busy moving those objects around after they survive garbage collection (to compact the heap), or if something holds on to memory for too long, this would show up in your test environment too.

    What matters is the progression and the pattern, not the quantity. You will need to look for objects that *constantly* survive garbage collection (histogram by age), and you can also check whether improper use of finalizers leads to long delays in GC, etc.
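The "histogram by age" idea can be sketched in a few lines: bucket live objects by age and list which types hold the bytes in the oldest buckets. The object records here (type, size, age in seconds) are hypothetical input; CLR Profiler presents this information graphically, and nothing below uses a CLR Profiler API.

```python
from collections import Counter


def age_histogram(objects, bucket_s=600):
    """Total live bytes per age bucket; bucket k covers ages [k*bucket_s, (k+1)*bucket_s)."""
    hist = Counter()
    for _type, size, age_s in objects:
        hist[int(age_s // bucket_s)] += size
    return dict(sorted(hist.items()))


def long_lived_by_type(objects, min_age_s):
    """Bytes held by objects at least min_age_s old, per type, largest first."""
    held = Counter()
    for type_name, size, age_s in objects:
        if age_s >= min_age_s:
            held[type_name] += size
    return held.most_common()


# Hypothetical live objects: (type, size in bytes, age in seconds).
LIVE = [
    ("System.String", 100, 5),
    ("System.String", 100, 30),
    ("System.Byte[]", 90_000, 1),
    ("MyApp.CacheEntry", 4_000, 5_400),
    ("MyApp.CacheEntry", 4_000, 7_200),
]

print(age_histogram(LIVE))
print(long_lived_by_type(LIVE, min_age_s=3_600))
```

A type that keeps reappearing in the oldest buckets across several heap dumps is a candidate for something holding on to it longer than intended.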

    The CLRProfiler is not a monolithic tool. It gives you great control over the time resolution when displaying graphs, lets you turn allocation and call logging on or off, and lets you take a heap dump at moments of your own choosing. You can even have your application trigger heap dumps from code.

    I would not recommend using CLR Profiler in a production environment. As Peter Sollich put it: "CLRProfiler is an intrusive tool; seeing a 10 to 100x slowdown in the application being profiled is not unusual". Having said that, I don't know of a better tool for diagnosing GC-related issues.

    The GC is optimized for collecting small objects with very short lifetimes. Therefore, my suggestion is to look for everything that holds on to memory for too long. You don't need large amounts of data to do this; you only need to identify the pattern. And yes, this is not an exact science, and there is no point-and-shoot solution. It's an iterative process involving some trial and error while you narrow in on the real problem.

    Marcel
    Tuesday, December 8, 2009 12:01 AM
  • Thank you for your interest.

    Unfortunately, I need a tutorial on the CLRProfiler. Once I have it running in development, I don't see any option other than 'Show heap now'. When I click it, I get a page showing some kind of allocation graph that extends far off to the right. I am not sure how to use this to find objects that continually survive GC. I could keep clicking 'Show heap now' and try to decipher the differences, but even under production load the growth in memory usage takes almost 24 hours, and who knows how long it would take in a low-load environment like development.
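Rather than eyeballing successive 'Show heap now' graphs, one option is to note per-type totals from each snapshot (or from several saved `!dumpheap -stat` listings) and let a short script flag types that never shrink. A Python sketch, assuming the totals have been transcribed or exported somehow; the snapshot values and type names are made up.

```python
def steadily_growing(snapshots, min_growth=1):
    """snapshots: list of {type_name: total_bytes}, oldest first.

    Returns (growth, type_name) for types whose total never decreases between
    consecutive snapshots and grows by at least min_growth bytes overall,
    largest growth first. Types missing from a snapshot count as 0 bytes.
    """
    if len(snapshots) < 2:
        return []
    result = []
    for name in set().union(*snapshots):
        series = [snap.get(name, 0) for snap in snapshots]
        never_shrinks = all(b >= a for a, b in zip(series, series[1:]))
        growth = series[-1] - series[0]
        if never_shrinks and growth >= min_growth:
            result.append((growth, name))
    result.sort(reverse=True)
    return result


SNAPSHOTS = [
    {"System.String": 5_000, "MyApp.Order": 1_000, "System.Byte[]": 20_000},
    {"System.String": 4_000, "MyApp.Order": 1_500, "System.Byte[]": 30_000},
    {"System.String": 6_000, "MyApp.Order": 2_200, "System.Byte[]": 25_000},
    {"System.String": 5_500, "MyApp.Order": 3_000, "System.Byte[]": 40_000},
]

print(steadily_growing(SNAPSHOTS))
```

The strict "never shrinks" test is deliberately simple; a leaking type may dip slightly between snapshots, so in practice some tolerance may be needed.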

    Also, before I go headlong into CLR debugging, I need to establish whether this is a CLR problem at all. I have put counters in my WCF service, so I know how many times each message was called. I take this information back to the development environment and call, say, the top 10 messages repeatedly in a loop to see whether I can make the memory usage grow (reproduce the problem), but the memory usage doesn't increase. As I said, I have yet to find the conditions that reproduce the problem in a development environment.
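Two details may matter when replaying production traffic this way: the mix of messages, not just which ones are called, and how long the test must run to issue as many calls as production did before the problem appeared. The sketch below builds a replay sequence weighted by production call counts and estimates the required test duration. The operation names and rates are hypothetical, and the estimate rests on an unverified assumption that memory growth is proportional to the number of calls rather than to elapsed time.

```python
import random


def replay_plan(call_counts, total_calls, seed=0):
    """Random sequence of message names whose mix follows production call counts."""
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    names = list(call_counts)
    weights = [call_counts[name] for name in names]
    return rng.choices(names, weights=weights, k=total_calls)


def hours_to_match(prod_calls_per_hour, prod_hours, dev_calls_per_hour):
    """Hours a dev test must run to issue as many calls as production did."""
    return prod_calls_per_hour * prod_hours / dev_calls_per_hour


# Hypothetical production counters for three service operations.
COUNTS = {"GetQuote": 9_000, "PlaceOrder": 900, "CancelOrder": 100}

plan = replay_plan(COUNTS, total_calls=1_000)
print({name: plan.count(name) for name in COUNTS})
print(hours_to_match(prod_calls_per_hour=10_000, prod_hours=18,
                     dev_calls_per_hour=60_000))
```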

    Thanks again

    Kevin
    Tuesday, December 8, 2009 3:05 PM
  • Kevin wrote: "Unfortunately, I need a tutorial on the CLRProfiler."

    Kevin, you'll find CLRProfiler.doc among the distribution files. There is also a small video file in which Peter Sollich gives a short but very informative introduction to the tool.

    http://www.microsoft.com/downloads/details.aspx?familyid=A362781C-3870-43BE-8926-862B40AA0CD0&displaylang=en
    http://msdn.microsoft.com/en-us/library/ms979205.aspx

    Marcel
    • Marked as answer by KevinBurton Tuesday, December 8, 2009 3:20 PM
    Tuesday, December 8, 2009 3:17 PM
  • Quoting the first reply: "The managed heap is not prone to fragmentation since the garbage collector is free to move stuff around."

    The large object heap does not do this: objects on it are not moved around, so it can become fragmented.
      http://www.simple-talk.com/dotnet/.net-framework/understanding-garbage-collection-in-.net/
    A while ago I found some sample code that forces this to happen by alternately allocating and freeing objects of the smallest and largest possible sizes. It was a contrived example and highly unlikely in practice, but this is the symptom one would see: an app using more and more memory even though its CLR objects have been freed.
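The effect can be reproduced in a toy model, which also shows why compaction (what the GC does for the small object heap) removes it. The Python sketch below is not the CLR's allocator: it is a fixed-size, first-fit free-list heap with arbitrary sizes. Freeing every other block leaves half the heap free, yet a request twice the size of one block fails until the live blocks are slid together.

```python
class ToyHeap:
    """Toy non-compacting heap: first-fit allocation in a fixed address range."""

    def __init__(self, size):
        self.size = size
        self.live = {}  # start offset -> block size

    def holes(self):
        """Return (start, length) of every free gap, in address order."""
        gaps, cursor = [], 0
        for start in sorted(self.live):
            if start > cursor:
                gaps.append((cursor, start - cursor))
            cursor = start + self.live[start]
        if cursor < self.size:
            gaps.append((cursor, self.size - cursor))
        return gaps

    def alloc(self, n):
        """First fit: place the block in the first gap large enough, else None."""
        for start, length in self.holes():
            if length >= n:
                self.live[start] = n
                return start
        return None

    def free(self, start):
        del self.live[start]

    def free_bytes(self):
        return self.size - sum(self.live.values())

    def compact(self):
        """Slide live blocks to the bottom so free space becomes one gap.
        (A real GC must also update every reference to the moved objects.)"""
        cursor, moved = 0, {}
        for start in sorted(self.live):
            moved[cursor] = self.live[start]
            cursor += self.live[start]
        self.live = moved


def demo():
    heap = ToyHeap(1_000)
    blocks = [heap.alloc(100) for _ in range(10)]  # heap is now full
    for start in blocks[::2]:
        heap.free(start)                           # free every other block
    free_before = heap.free_bytes()                # 500 bytes free, in 5 gaps of 100
    fragmented = heap.alloc(200)                   # no single gap fits 200 bytes
    heap.compact()
    compacted = heap.alloc(200)                    # one 500-byte gap now exists
    return free_before, fragmented, compacted


print(demo())
```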
    Tuesday, December 8, 2009 9:08 PM