Memory leak in LINQ

  • Question

  • I am starting to suspect that LINQ has a memory leak in some code of mine. The application runs for over 12 hours, and over that period memory consumption slowly increases until the system spends all its time servicing memory faults. I am on Windows 7 running Visual Studio 2008. The code looks like:

            public static IEnumerable<List<OrderHistoryDetail>> IterateOrderHistory(DateTime start, DateTime end, TextWriter log)
            {
                Dictionary<int, List<ComponentDetail>> kitDictionary = Utilities.GetKitDictionary();
                using (OrderClassesDataContext orderContext = new OrderClassesDataContext())
                {
                    orderContext.Connection.Open();
                    orderContext.ExecuteCommand("SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;");
                    orderContext.CommandTimeout = 3000;
                    var items = from o in orderContext.OrderHeaders
                                join oi in orderContext.OrderItems on o.OrderNumber equals oi.OrderNumber
                                where o.OrderDateCST > start && o.OrderDateCST <= end
                                select new { oi.Sku };
                    foreach (var sku in items.Distinct())
                    {
    . . . .
                                yield return ListOrders(orderContext, start, end, sku.Sku, log);
    . . . . .
    
            private static List<OrderHistoryDetail> ListOrders(OrderClassesDataContext orderContext, DateTime start, DateTime end, string skuValue, string componentValue, int componentQuantity, TextWriter log)
            {
                var orders = from o in orderContext.OrderHeaders
                             join oi in orderContext.OrderItems on o.OrderNumber equals oi.OrderNumber
                             where o.OrderDateCST > start && o.OrderDateCST <= end &&
                                   (oi.Sku == skuValue || oi.Sku == componentValue)
                             orderby o.OrderDateCST, o.OrderNumber
                             select new { o.OrderDateCST, o.OrderDateCST.Value.Year, o.OrderDateCST.Value.Month, o.OrderDateCST.Value.Day, o.OrderDateCST.Value.DayOfYear, o.OrderNumber, o.OrderGroupID, oi.Prodid, oi.Sku, oi.Qty, ExtendedPrice = (oi.ExtendedPrice <= 0) ? oi.UnitPrice * oi.Qty : oi.ExtendedPrice };
                int orderCount = 0;
                string lastOrderNumber = string.Empty;
                string orderNumber = string.Empty;
                List<OrderHistoryDetail> orderHistoryList = new List<OrderHistoryDetail>();
                foreach (var order in orders)
                {
    . . . . .
                    orderHistoryList.Add(item);
    . . . .
                return orderHistoryList;
    

    So basically the code finds a distinct list of SKUs (there are about 40,000 of them), then iterates through this list and, using the same context, forms a list of orders involving each SKU. Right now the application just prints the results, but eventually there will be some processing involved. After 12 hours the app is unusable because of this memory leak. I want to eliminate the possibility that it is LINQ-related first; then I will see whether something in my code could be causing it. The only thing I can think of off the top of my head is that forming the same query (varying only the 'where' clause) 40,000 times over might cause problems (similar to the serialization assembly that gets loaded each time a serializer is constructed).

    Any ideas?

    Kevin
    Tuesday, January 12, 2010 2:57 PM

All replies

  • You do realize that LINQ has been out for a while and is currently being tested under .NET 4 Beta. Any such leak, AFAIK, would most likely have been found already, given such heavy usage in the community. The real question is most likely what is holding on to references, or which event subscriptions are never removed (an event handler that is never unsubscribed keeps its subscriber alive in memory even after the user has dropped every other reference to it). I recommend that you look at your objects and verify that they all implement IDisposable properly, release their references, and unsubscribe from all events. You can watch for this kind of leak by monitoring Private Bytes in perfmon. Did you look at that?

    See this article Identify And Prevent Memory Leaks In Managed Code to begin that process.

    One other item to look at is the Processes tab of the Windows Task Manager. Under View + Select Columns, check Handles, GDI Objects and USER Objects, then observe these values for your program. If there is a handle leak, you'll see one of them steadily climbing. In all likelihood it would be GDI Objects in those scenarios.



    More articles of interest:

    CLR Inside Out: Investigating Memory Issues -- MSDN Magazine, November 2006
    Debug Leaky Apps: Identify And Prevent Memory Leaks In Managed Code -- MSDN Magazine, January 2007
    Download details: Debug Diagnostic Tool v1.1
    Joe Duffy's Weblog (Dispose, Finalization, and Resource Management)

    HTH GL


    William Wegerson (www.OmegaCoder.Com)
    Tuesday, January 12, 2010 3:09 PM
    Moderator
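
    The point about event subscriptions can be illustrated with a minimal sketch. The Publisher, Subscriber and EventLeakDemo types below are hypothetical and not taken from the poster's application; they only show that a handler left attached keeps its subscriber referenced by the publisher, while unsubscribing in Dispose releases it.

```csharp
using System;

// Hypothetical long-lived object that raises an event.
class Publisher
{
    public event EventHandler Tick;

    // Number of handlers currently attached; each one references its subscriber.
    public int HandlerCount
    {
        get { return Tick == null ? 0 : Tick.GetInvocationList().Length; }
    }
}

// Hypothetical short-lived object that listens to the publisher.
class Subscriber : IDisposable
{
    private readonly Publisher _publisher;

    public Subscriber(Publisher publisher)
    {
        _publisher = publisher;
        _publisher.Tick += OnTick;   // the publisher now references this instance
    }

    private void OnTick(object sender, EventArgs e) { }

    public void Dispose()
    {
        _publisher.Tick -= OnTick;   // drop the publisher's reference to this instance
    }
}

static class EventLeakDemo
{
    // Creates 'count' subscribers and returns how many the publisher still references.
    public static int HandlersLeft(int count, bool dispose)
    {
        Publisher publisher = new Publisher();
        for (int i = 0; i < count; i++)
        {
            Subscriber subscriber = new Subscriber(publisher);
            if (dispose) subscriber.Dispose();
        }
        return publisher.HandlerCount;
    }

    static void Main()
    {
        Console.WriteLine(HandlersLeft(1000, false)); // prints 1000: every subscriber is still reachable
        Console.WriteLine(HandlersLeft(1000, true));  // prints 0: nothing is kept alive by the event
    }
}
```

    In a long-running process the first case grows for as long as the publisher lives, which is the pattern that tends to show up as steadily rising Private Bytes.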
  • Thank you for your tips. I will see if these shed any more light on the problem.

    I understand that LINQ has been out for a while, but I was hoping for some usage guidance. With serialization, for example, I was unaware that an assembly was loaded into my AppDomain each time I did a new XmlSerializer(...). I had an app that showed all the signs of a memory leak, and it turned out that the number of modules loaded into the AppDomain was steadily rising; it was not a "memory leak" as such. I just had to cache the serializer. Something like that is what I was thinking may be at the root of my problem. Maybe LINQ generates a new SQL query that doesn't get released until the context is disposed, or something like that. I am grasping at straws here.

    Thanks again.

    Kevin
    Tuesday, January 12, 2010 3:23 PM
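
    The serializer caching mentioned above might look something like the sketch below. SerializerCache is a hypothetical helper, not code from the poster's application. It assumes the serializer is built with the XmlSerializer(Type, XmlRootAttribute) constructor; as far as the framework documentation describes it, only XmlSerializer(Type) and XmlSerializer(Type, String) reuse their generated assemblies, so other constructors are the ones worth caching by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

// Hypothetical helper: builds one XmlSerializer per (type, root name) pair and reuses it.
static class SerializerCache
{
    private static readonly object Sync = new object();
    private static readonly Dictionary<string, XmlSerializer> Cache =
        new Dictionary<string, XmlSerializer>();

    public static XmlSerializer Get(Type type, string rootName)
    {
        string key = type.AssemblyQualifiedName + "|" + rootName;
        lock (Sync)
        {
            XmlSerializer serializer;
            if (!Cache.TryGetValue(key, out serializer))
            {
                // Constructed once per key, so at most one generated assembly per key.
                serializer = new XmlSerializer(type, new XmlRootAttribute(rootName));
                Cache[key] = serializer;
            }
            return serializer;
        }
    }

    static void Main()
    {
        XmlSerializer first = Get(typeof(int), "value");
        XmlSerializer second = Get(typeof(int), "value");
        Console.WriteLine(ReferenceEquals(first, second)); // prints True
    }
}
```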
  • The other option is to mimic what you are doing in a clean project. Something that you can reproduce and show to the community. Working on such an example might shed light on either the issue you face or the problem inherent in your objects. HTH
    William Wegerson (www.OmegaCoder.Com)
    Tuesday, January 12, 2010 3:39 PM
    Moderator
  • I recently had similar problems with a utility I'm developing that runs for several hours making modifications to a large database.  After a few hours it slowed to a crawl and eventually crashed with an OutOfMemoryException.

    Basically, I was creating one DataContext and using it for all the db operations I needed to perform for the entire time the app was running.  It turns out this is not a good idea, as the DataContext class is designed to hang on to references to all the data you ever touch with it.

    I can't really tell from the small snippet of code you posted whether your application is doing the same thing.  But if it is, you'll want to break it up into separate tasks and have each task create and dispose of its own DataContext.  In particular, if, like my app, yours spends the bulk of its time in one loop doing a few things over and over again, you'll want to make sure the DataContexts you're using are created and disposed within the loop, so they never persist beyond a single iteration.  This, along with a few other tweaks, solved my memory issues.

    Jesse Kindwall
    Tuesday, January 12, 2010 3:53 PM
  • Thank you, this is what I suspected. As you can tell from the above code, I create one DataContext. It is used to query the DB for a list (a query sequence) that is about 40,000 items long. Then, using the same DataContext, I query the database for a list based on each of the 40,000 items. So I am essentially reusing the same DataContext 40,000 times, which seems to be exactly what I am doing wrong. I will adjust the code and see what happens. By the way, if I create (and dispose) a second instance of the same DataContext type, would that work? In other words, if I change the code to something like:

                using (OrderClassesDataContext orderContext1 = new OrderClassesDataContext())
                {
                    var query1 = . . . . .
                    using (OrderClassesDataContext orderContext2 = new OrderClassesDataContext())
                    {
                        var query2 = . . . .
                    }
                }
    Do you see that working? Or is there some caching involved, where LINQ tries to be too smart, notices that orderContext2 is the same as orderContext1, and reuses it?

    Thanks again.

    Kevin
    Tuesday, January 12, 2010 4:11 PM
  • I'm fairly new to LINQ myself, so I'm not sure, but I seem to recall reading about people having problems trying to use two instances of the same data context at once.  You could try it, but if that doesn't work, I would advise calling ToList() on your initial query so you have a local (in-memory) list of SKUs, then disposing the initial DataContext and creating a new one for each iteration of your foreach loop.  That's pretty much the approach I took.

    Jesse Kindwall
    • Marked as answer by KevinBurton Tuesday, January 12, 2010 9:40 PM
    Tuesday, January 12, 2010 4:24 PM
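
    A minimal sketch of that approach, written generically so it runs without a database. FreshContext.ForEachKey and FakeContext are hypothetical names introduced for illustration; FakeContext stands in for a DataContext such as OrderClassesDataContext and only counts how many instances are created and still undisposed. The sketch assumes the per-key callback returns fully materialized results (a List, as ListOrders does in the question), since the context is disposed before the next key is processed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FreshContext
{
    // Loads all keys with one short-lived context, then uses a new context per key.
    public static IEnumerable<TResult> ForEachKey<TContext, TKey, TResult>(
        Func<TContext> createContext,
        Func<TContext, IEnumerable<TKey>> loadKeys,
        Func<TContext, TKey, TResult> processKey)
        where TContext : IDisposable
    {
        List<TKey> keys;
        using (TContext context = createContext())
        {
            keys = loadKeys(context).ToList();   // materialize before the context is disposed
        }

        foreach (TKey key in keys)
        {
            using (TContext context = createContext())
            {
                // The context is disposed when the caller asks for the next item.
                yield return processKey(context, key);
            }
        }
    }
}

// Hypothetical stand-in for a DataContext; counts instances instead of querying.
class FakeContext : IDisposable
{
    public static int Created;
    public static int Live;
    public FakeContext() { Created++; Live++; }
    public void Dispose() { Live--; }
}

static class Program
{
    static void Main()
    {
        List<string> results = FreshContext.ForEachKey<FakeContext, string, string>(
            () => new FakeContext(),
            context => new[] { "A", "B", "C" },
            (context, sku) => "orders for " + sku).ToList();

        Console.WriteLine(results.Count);        // prints 3
        Console.WriteLine(FakeContext.Created);  // prints 4: one for the keys, one per key
        Console.WriteLine(FakeContext.Live);     // prints 0: every context was disposed
    }
}
```

    With the types from the question, the factory would presumably be () => new OrderClassesDataContext(), the key loader the distinct SKU query, and the per-key callback a call to ListOrders; that mapping is untested here.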
  • I think you should use the tools at your disposal to determine what the memory leak is before trying to optimize code that may not matter. Here are a few tools to get you started:

    1. VMMap - http://technet.microsoft.com/en-us/sysinternals/dd535533.aspx - Take a look at the application in VMMap when the memory usage is very high to get a feeling of where all the memory is being consumed. Is it the managed heap or the normal heap? If it's the managed heap there are some other tools that can help.
    2. .NET Memory Profiler - http://www.memprofiler.com/ - Try downloading a trial version of this to monitor where your memory is being utilized. This should really help you figure out what's causing the problems in your app.

    One idea is that the number of objects you're querying might be filling the LINQ to SQL identity cache to capacity, but it would be better if you could figure out exactly where the problem is.

    Hope that helps,
    David

    Blog - http://blogs.rev-net.com/ddewinter/ Twitter - @ddewinter
    Tuesday, January 12, 2010 9:21 PM
    Answerer
  • Thank you for the tips. It turns out I had the same problem that another poster had in that the DataContext was holding on to a reference to all queries and that seems to be the source of the leak.

    Kevin
    Tuesday, January 12, 2010 9:39 PM