none
Entity Framework: [PERF ISSUE] How to have the DB server execute queries on collections? RRS feed

  • Question

  • Dear all,

     

    We are currently developping a project on top of Entity Framework and we are facing a performance issue when querying large collections.

    In short, if we write a query on a collection that maps a relationship between 2 tables, even if that query is designed to return a single instance of an object, the whole collection is loaded into the application memory; this is an issue when the collection has a large number of instances.

    Here is a repro scenario using AdventureWorks:

     

        public static SalesOrderDetail GetDetails(SalesOrderHeader order, int orderDetailsId)
        {
          return order.SalesOrderDetails.Single
            (thisOrderDetail => thisOrderDetail.SalesOrderDetailID == orderDetailsId);
        }
    
    


    This simple snippet returns a row in a sales order, knowing its header and its id.

    When running SQL Server Profiler, it's easy to see that ALL rows related to the sales order header are retrieved:

    exec sp_executesql N'SELECT 
    [Extent1].[SalesOrderID] AS [SalesOrderID], 
    [Extent1].[SalesOrderDetailID] AS [SalesOrderDetailID], 
    [Extent1].[CarrierTrackingNumber] AS [CarrierTrackingNumber], 
    [Extent1].[OrderQty] AS [OrderQty], 
    [Extent1].[ProductID] AS [ProductID], 
    [Extent1].[SpecialOfferID] AS [SpecialOfferID], 
    [Extent1].[UnitPrice] AS [UnitPrice], 
    [Extent1].[UnitPriceDiscount] AS [UnitPriceDiscount], 
    [Extent1].[LineTotal] AS [LineTotal], 
    [Extent1].[rowguid] AS [rowguid], 
    [Extent1].[ModifiedDate] AS [ModifiedDate]
    FROM [Sales].[SalesOrderDetail] AS [Extent1]
    WHERE [Extent1].[SalesOrderID] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=51721
    


    Now this may be considered ok here because the biggest sales order has 72 rows in AdventureWorks.

    But we have similar queries where the number of rows is in the 50,000  to 300,000 range, so this behaviour is a real issue for us.

    Is there a way to have that query fully executed on the DB Server side, that is to have a WHERE clause that would be something like the following, without breaking our programming model?

    WHERE ([Extent1].[SalesOrderID] = @p__linq__0) AND ([Extent1].[SalesOrderDetailID] = @p__linq__1)',N'@p__linq__0 int,@p__linq__1 int',@p__linq__0=51721,@p__linq__1=39800
    
    Any idea or suggestion welcomed. Thanks in advance.

    Monday, June 27, 2011 10:08 AM

All replies

  • Hello,

    SalesOrderDetails is navigation property on SalesOrder, isn't it? In such case it does exactly what you told it to do. It is collection and executes linq-to-objects but you are most probably using POCO entities and lazy loading is triggered first time you access the collection - lazy loading always loads the whole collection. If you want to get only single instance detail you should load it from context:

        context.SalesOrderDetails.Where(sod => sod.SalesOrderDetailId == orderDetaisId);

    Best regards,
    Ladislav

    Monday, June 27, 2011 11:29 AM
  • Ladislav, thank you for your answer.

    So if I understand correctly, when I am in a method that doesn't have access to the context, but only has access to the POCO object (SalesOrderHeader order in our example), I can only have the whole collection of related objects (SalesOrderDetails) loaded and execute linq-to-object queries on it?

    If this is correct, then this is a perf killing scenario when the collection is large, right?

    Or is there another way (apart from using the context)?

    I have read quite a few papers on EF performance but never found anything on that subject.

    Best regards,

     

    MR

    Monday, June 27, 2011 2:16 PM
  • On your original post, it returned your result after executed Single method. You found sql fragement like as follows.


    SELECT columns... FROM  [dbo].[Parent_Table] AS [Extent1]
    INNER JOIN [dbo].[Child_Table] AS [Extent2] ON [Extent1].[ID] = [Extent2].[ID]
    WHERE 2 = [Extent2].[ID]

    Lazy loading, related entities are automatically loaded from the data source when you accessed. Please check this link for more informaton. http://msdn.microsoft.com/en-us/library/dd456855.aspx

    Hope this helps. If I misunderstood, please feel free to let me know.

    Best Regards,


    Larcolais Gong[MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Tuesday, June 28, 2011 4:37 PM
  • Larcolais,

     

    I posted the actual SQL fragment, note it only uses child table, obviously because in my scenario, parent table has been loaded before.

    However, this is not the issue.

    The issue is that a query on lazy loaded, related entities relies on LINQ to object, which by far consumes more resources and is less efficient than SQL Server when the collection of related entities happen to be "large".

    So my question is: in this scenario, is there a way to force EF to query large collections of related entities directly against the data store, and avoid the under-performer LINQ to object?

    Right now, I have a workaround as follow:

        public static SalesOrderDetail GetDetails2(SalesOrderHeader order, int orderDetailsId)
        {
          using (var context = new AdventureWorksEntities())
          {
            int orderId = order.SalesOrderID;
            SalesOrderDetail result = context.SalesOrderDetails.Single
              (thisOrderDetail => thisOrderDetail.SalesOrderID == orderId
                && thisOrderDetail.SalesOrderDetailID == orderDetailsId);
            context.Detach(result);
            return result;
          }
        }
    



    Obviously, the context uses SQL Server much more efficiently:

     

    exec sp_executesql N'SELECT TOP (2) 
    [Extent1].[SalesOrderID] AS [SalesOrderID], 
    [Extent1].[SalesOrderDetailID] AS [SalesOrderDetailID], 
    [Extent1].[CarrierTrackingNumber] AS [CarrierTrackingNumber], 
    [Extent1].[OrderQty] AS [OrderQty], 
    [Extent1].[ProductID] AS [ProductID], 
    [Extent1].[SpecialOfferID] AS [SpecialOfferID], 
    [Extent1].[UnitPrice] AS [UnitPrice], 
    [Extent1].[UnitPriceDiscount] AS [UnitPriceDiscount], 
    [Extent1].[LineTotal] AS [LineTotal], 
    [Extent1].[rowguid] AS [rowguid], 
    [Extent1].[ModifiedDate] AS [ModifiedDate]
    FROM [Sales].[SalesOrderDetail] AS [Extent1]
    WHERE ([Extent1].[SalesOrderID] = @p__linq__0) AND ([Extent1].[SalesOrderDetailID] = @p__linq__1)',N'@p__linq__0 int,@p__linq__1 int',@p__linq__0=51721,@p__linq__1=39800
    

     


    (quite funny that a "Single" generates a "SELECT TOP(2)" but... that's much, much better anyway).

    Now the concern with this workaround is that it involves the creation of a 2nd context, which can be resources consuming if used often and/or if many concurrent users are involved.

    Also, result doesn't come from the same context, and because the 2nd context is dropped, result cease to be queryable.

    So what would be Microsoft recommendation for querying large collections in the POCO / Lazy loading context?

    Thanks in advance for your answer,

     

    MR

    Wednesday, June 29, 2011 8:22 AM
  • Hello,

    Frankly speaking, EF is capable of handling large entity data models but you can run into performance problems if  the data model is very inter-connected. Here's the link which talked about some possible solutions about this problem. I suggest you can check it. http://blogs.msdn.com/b/adonet/archive/2008/11/24/working-with-large-models-in-entity-framework-part-1.aspx

    http://social.msdn.microsoft.com/Forums/pl-PL/adonetefx/thread/0a59f233-a3d8-4813-ba01-a095648f7192\

    Best Regards,


    Larcolais Gong[MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Thursday, June 30, 2011 8:41 AM
  • Any update? Would you mind letting us know how it goes?

    Please feel free to let us know if you need further support.

    Best Regards,


    Larcolais Gong[MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Friday, July 8, 2011 4:53 PM
  • Hi Larcolais,

     

    I appreciate your frankly speaking - it's not that often that a Microsoft representative acknowledges some weaknesses in a Microsoft product, hope you will get a good credit for it.

    Actually perf issues can happen starting with a single interconnection between two entities, providing there is a large number of occurences on one side of the relation for only one occurence on the other side of the relation.

    But I agree, a highly interconnected model can be a very serious issue with EF...

    The difference in perfs is so important between SQL Server and LINQ to Objects, when querying high volumes, that developers and architects who are implementing solutions on top of EF and target high volumes really need to have a good understanding of what happens in the back end data store and what happens in LINQ to Objects.

    Based on the information from this thread, we implemented our own workarounds - our rule of thumb is: "if you need to retrieve a few occurences from a potentially large collection, consider querying directly the context rather that querying the property that holds the collection, and avoid the cost of the load of the collection. Downside is, this implies the use of a second context, and the retrieved objects won't be connected to their logical owner in the original context. But if you can live with it, your perfs will probably improve."

    Saturday, July 9, 2011 8:44 PM