none
ELinqQueryState.GetExecutionPlan() major limitation for non trivial query performance (EF4) RRS feed

  • Question

  • We have major performance issues with EF query plan compilation performance for larger eager load queries involving more than a handful includes.

    In some cases these queries involve some mild inheritance between a few of the persistent entities in the queries.

    For a couple of days we have profiled these queries and experimented with various ways to break up larger eager loads into multiple smaller and more simplistic eager loads that EF can deal with more easily, but it is still not nearly enough. Almost all server time is spent executing ELinqQueryState.GetExecutionPlan().

    A typical scenario is a read query that returns one entity instance with a related object graph that involves many objects. Multiple 1:1 or 1:M associations between this root object and related objects, etc. Match critera is typically a single field == searchstring on one of the entities in the wanted graph.

    Here are some numbers.

    1) With a few major eager load queries, before splitting up

    159 s to search for one object and related graph. Basically all of that is ELinqQueryState.GetExecutionPlan()

    2) After splitting up these few queries into many more simplistic queries, to help ease the burden on EF, we get this

    8,5 s to search for the same object and related graph. Time outside ELinqQueryState.GetExecutionPlan() is still only a small fraction of that time

    Our target end-to-end seach time would be something like maybe half a second or so (end user search in GUI).

    Note1: In either of these scenarios, actually executing the SQL statements created by ELinqQueryState.GetExecutionPlan() just takes fractions of a second, so any actual database query related work/optimization techniques such as indexes/indexed views are basically irrelevant for our current situation.

    Note2: This is not even our most resource consuming query for seaching a what should be a not super complex object query, we have queries that are still worse than this even after breaking them up. We have another search scenario where we currently perform at about 84 s to query for one object + larger related graph by one key column, and that's after splitting up a few larger eager loads into multiple more simple eager loads. Of those 83 s, vast majority of that time is spent in ELinqQueryState.GetExecutionPlan().

    End user expectaions are fraction of a second for this search as well (search for one grouped customer case by reference number). Btw, the current mainframe application that is 25 years old which we are replacing displays this information basically instantaneously

    EF June 2011 CTP

    When we try our code on the current EF CTP from June 2011 we get query reduction of about half the total end-to-end time. However, performance is still not nearly where we want to be. It seems that this CTP can execute the exact same repeat query without re-creating the query plan (very fast = GOOD, end-to-end search time ~400 ms), however this is still intermittent and also does not seem to recognize the same query with different parameters applied (?), which still takes ages. Large disclaimer: We are not sure if this is accurate though as we just started experimenting the last day or so with this so this is very much inconclusive on our end.

    We have also tried using some compiled queries, but there are some limitations in how these can be used + the coupling and maintenence price of those are not minor either, data model coupling into service operation model coupling is bad enough as it is with all the include stuff. Because of the number of queries and eager load graph complexities of our queries (and how we expect these to vary over time going forward) we would prefer to go down this road only if we have to. Compiled queries also seem to be slow when they contain more than a few includes, so we have actually experimented with removing them in favor of smaller non compiled queries. Still work in progress here and optimization results are largely inconclusive at this point, but we realize that we need to learn more so therefore this post.

    I hope somebody can help us on this, we have the following questions

    a) What are our options to get improved performance? Right now this is a major performance issue for us and unless resolved will threaten our project to be cancelled or delayed. We might even have to consider to throw out EF completely. Perhaps there is a way to reduce our dependency of GetExecutionPlan()?

    b) When will an EF version be released that improves EF query plan performance, including actually remembering the super expensive query plans between calls (including parameters in that query plan)?

    EF (4.0) seems to re-calculate out the query plan every time the same logical query is executed which is very inefficient for us.

    Will EF 5.0/NET FW 4.5 resolve this issue, and if so when can we expect this to be released? The answer to this question will greately impact us, since we have a target release date for our project a few months from now and need to make some choices very soon to have a chance to succeed.

    I can't help but feel that this is old school stuff that at least to me looking at it from the outside looks very similar to query plan caching in a relational database engine. Why has this not been resolved in almost two years since EF4 was released? Perhaps this is a very hard problem to solve? If so, pls let us know. I feel as our project can't be the only one with a non simplisitic data model and non trivial query needs. ORM/DDD is all about using a domain model to manage complexity. Non trivial querying for those objects is not optional, so a better solution can't be either.


    Tech stuff

    - .NET Framework 4.0 on Windows Server 2008 R2

    - WCF Service application, hosted in IIS7

    - We currently use EF 4.0 (have experimented with current june CTP, see above)

    - SQL Server 2008 R2

    - Model first, self tracking POCO entities, no lazy loading

    - DDL Generation Template is originally SSDLToSQL10.tt, which is a Table Per Type schema  strategy

    - Some minor inheritance for persistent entities exist in our data model, but we've keept these to an absolute minimum since we know (and have experienced) that there are major perf issues with these as well

    More details available on request, just ask

    PS: I work for a consultancy company that is a Ms gold partner. If there is a way to get some official support on this I would appreciate if somebody please point me in the right direction.

    Thanks

    Thursday, February 16, 2012 2:09 PM

Answers

  • We've now spent two full weeks profiling this and improving performance. We have decided to convert basically all queries to compiled queries.

    Current numbers:

    All end-to-end searches are below 100 ms.

    One very big gotcha was how fragile compiled queries actually are in terms of how they are used. We had several cases were we used them in ways that made complete sense but very very subtle things ended up still yielding zero benefits, because queries where still being forced to recompile.

    For instance, given this definition of a compiled query

            private static Func<matrixobjectcontext, iqueryable<vehiclehullcontinuity="" long,="">> vehicleHullForCustomerCaseId;
            public IQueryable<vehiclehullcontinuity> FindVehicleHullForCustomerCaseId(IObjectContext context, long customerCaseId)
            {
                Guard.Argument.AssertNotNullWhenAssignedToType<matrixobjectcontext>(context, "context");
                var concreteContext = (MatrixObjectContext)context; // now confirmed safe cast (not null and assignable)
    
                if (vehicleHullForCustomerCaseId == null)
                {
                    CompileVehicleHullForCustomerCaseId();
                }
    
                return vehicleHullForCustomerCaseId.Invoke(concreteContext, customerCaseId);
            }
    
    
            private void CompileVehicleHullForCustomerCaseId()
            {
                vehicleHullForCustomerCaseId =
                    CompiledQuery.Compile<matrixobjectcontext, iqueryable<vehiclehullcontinuity="" long,="">>
                        ((ctx, customerCaseId) =>
                         ctx.VehicleHullContinuitySet
                            .Include("VehicleHullVersion")
                            .Where(u => u.PremiumPeriodContinuity.CustomerCaseContinuity.Id == customerCaseId));
            }</matrixobjectcontext,></matrixobjectcontext></vehiclehullcontinuity></matrixobjectcontext,>

    Given the above definition of the compiled query method FindVehicleHullForCustomerCaseId, consider the two ways to call that method below:

    var result = compiledVehicleHullQueries.FindVehicleHullForCustomerCaseId(this.Context, customerCaseId).FirstOrDefault();
    var result = new List<vehiclehullcontinuity>(compiledVehicleHullQueries.FindVehicleHullForCustomerCaseId(this.Context, customerCaseId)).FirstOrDefault();</vehiclehullcontinuity>

    The first one will force a recompile of the query every time, because just fetching one row might alter the SQL query used ('select top 1 ... from' instead of 'select ... from'). The second one will work.

    On the surface in terms of using queries as a reasonable abstraction this is very far from intuitive, only when you know a lot more about how EF actually works this makes sense.

    Things like these took a pretty long time to figure out even with very experienced people (both DB/.NET), even after we had read blog posts about query shape and compiled queries.

    Recommendations

    We suggest that anybody with a similar situation as ours to use compiled queries for absolutely everything, not doing so is out of the question if you want any type of acceptable performance.

    Compiled queries must be executed EXACTLY as defined. You must do very detailed verification that this is the case. ESPECIALLY as queries are reshaped and how objects are used are modified over time. We would recommend that you write automated performance end-to-end tests that catch very slight changes to query shape, since it is very very easy to use EF queries in a way that destroys performance completely. In a large team environment like ours this is a necessity until EF 5 comes out.





    • Marked as answer by M. Hol Thursday, February 23, 2012 10:09 AM
    • Edited by M. Hol Thursday, February 23, 2012 10:12 AM
    Thursday, February 23, 2012 10:06 AM

All replies

  • I've kept digging and found this blog post from 2008 about the design decision not to implement a query plan cache in EF. The post is about LINQ to SQL, but I suppose it holds for LINQ to entities as well.

    http://blogs.msdn.com/b/ricom/archive/2008/01/14/performance-quiz-13-linq-to-sql-compiled-query-cost-solution.aspx

    This was the section that caught my eye:

    "Now there was some debate about how to make compiled queries durable, automatically caching them was considered, but this was something I was strongly against.  Largely because of the object lifetime issues it would cause.  First, you would have to do complicated matching of a created query against something that was already in the cache -- something I'd like to avoid.  Secondly you have to decide where to store the cache, if you associate it with the DataContext then you get much less query re-use because you only get a benefit if you run the same query twice in the same data context.  To get the most benefit you want to be able to re-use the query across DataContexts.  But then, do you make the cache global?  If you do you have threading issues accessing it, and you have the terrible problem that you don't know when is a good time to discard items from the cache.  Ultimately this was my strongest point, at the Linq data level we do not know enough about the query patterns to choose a good caching policy, and, as I've written many times before, when it comes to caching good policy is crucial.  In fact, analogously, we had to make changes in the regular expression caching system back in Whidbey precisely because we were seeing cases where our caching assumptions were resulting in catastrophically bad performance (Mid Life Crisis due to retained compiled regular expressions in our cache) --  I didn't want to make that mistake again."

    Are these arguments still in play on the EF team? From the perspective I'm sitting at right now performance is pretty much a huge issue already. At this point I would gladly pay several hundred megabytes of memory to hold the data or more if neccessary + the minor perf cost to make such a global cache thread safe. That would be a bargain to me.

    To be clear, I would gladly perfer this to the maintenance cost of maintaining hundreds of rigidly coupled compiled queries of non trivial complexity due to change in a system that very well might have to have to be maintained over a decade, or more - even if I had the performance I needed only by using compiled queries as is. In our case, that cost would be a lot higher for instance in terms of end-to-end lead times etc, and also more of a technical pain due to the various compiled query quirks.

    Perhaps the cacheing policy can be made configurable and/or conservative in such a way that it does not starve out server resources and/or a certain policy can be deprecated and replaced, should issues be found with it after release. Though, I realize I might be naive at that since I dont have the experience the EF team have so I might be underestimating the problem.

    But perhaps this is exactly what the CTP is trying to accomplish with the new cacheing features...




    • Edited by M. Hol Friday, February 17, 2012 3:13 PM
    Friday, February 17, 2012 3:12 PM
  • And this just 3 days old, looks promising
    http://blogs.msdn.com/b/adonet/archive/2012/02/14/sneak-preview-entity-framework-5-0-performance-improvements.aspx

    "As a result in one of our tests, repeat execution time of the same LINQ query has been reduced by around 6x. We also have an end-to-end application that we use for performance testing that is running 67% faster."

    Friday, February 17, 2012 4:19 PM
  • Hi M. Hol,

    Welcome to MSDN Forum.

    From ado.net team blog, I find some paragraphs that describe the performace improvement in EF5.

    "Today, using EF 4, every time you execute a LINQ to Entities query the LINQ expression has to be compiled into an intermediate representation which can be then used by EF to talk to any provider. The compilation process happens every time one executes a LINQ query.

    In EF 5, the first time you execute a LINQ to Entities query EF will take the LINQ expression and compile it just like it did in EF 4. It will then store the compiled query into an internal query cache. The compilation time remains approximately the same, but can be faster in EF 5 for certain scenarios. The biggest benefit will come the second time you execute the same query since EF will now skip compilation completely and just use the compiled query it has cached."

    Further more, as you have mentioned in the last post,

    "As a result in one of our tests, repeat execution time of the same LINQ query has been reduced by around 6x. We also have an end-to-end application that we use for performance testing that is running 67% faster."

    It sounds really cool and let us expect EF5.0 release soon.

    Best Regards


    Allen Li [MSFT]
    MSDN Community Support | Feedback to us

    • Marked as answer by Allen_MSDNModerator Thursday, February 23, 2012 1:00 AM
    • Unmarked as answer by M. Hol Thursday, February 23, 2012 8:27 AM
    Monday, February 20, 2012 3:25 AM
    Moderator
  • We've now spent two full weeks profiling this and improving performance. We have decided to convert basically all queries to compiled queries.

    Current numbers:

    All end-to-end searches are below 100 ms.

    One very big gotcha was how fragile compiled queries actually are in terms of how they are used. We had several cases were we used them in ways that made complete sense but very very subtle things ended up still yielding zero benefits, because queries where still being forced to recompile.

    For instance, given this definition of a compiled query

            private static Func<matrixobjectcontext, iqueryable<vehiclehullcontinuity="" long,="">> vehicleHullForCustomerCaseId;
            public IQueryable<vehiclehullcontinuity> FindVehicleHullForCustomerCaseId(IObjectContext context, long customerCaseId)
            {
                Guard.Argument.AssertNotNullWhenAssignedToType<matrixobjectcontext>(context, "context");
                var concreteContext = (MatrixObjectContext)context; // now confirmed safe cast (not null and assignable)
    
                if (vehicleHullForCustomerCaseId == null)
                {
                    CompileVehicleHullForCustomerCaseId();
                }
    
                return vehicleHullForCustomerCaseId.Invoke(concreteContext, customerCaseId);
            }
    
    
            private void CompileVehicleHullForCustomerCaseId()
            {
                vehicleHullForCustomerCaseId =
                    CompiledQuery.Compile<matrixobjectcontext, iqueryable<vehiclehullcontinuity="" long,="">>
                        ((ctx, customerCaseId) =>
                         ctx.VehicleHullContinuitySet
                            .Include("VehicleHullVersion")
                            .Where(u => u.PremiumPeriodContinuity.CustomerCaseContinuity.Id == customerCaseId));
            }</matrixobjectcontext,></matrixobjectcontext></vehiclehullcontinuity></matrixobjectcontext,>

    Given the above definition of the compiled query method FindVehicleHullForCustomerCaseId, consider the two ways to call that method below:

    var result = compiledVehicleHullQueries.FindVehicleHullForCustomerCaseId(this.Context, customerCaseId).FirstOrDefault();
    var result = new List<vehiclehullcontinuity>(compiledVehicleHullQueries.FindVehicleHullForCustomerCaseId(this.Context, customerCaseId)).FirstOrDefault();</vehiclehullcontinuity>

    The first one will force a recompile of the query every time, because just fetching one row might alter the SQL query used ('select top 1 ... from' instead of 'select ... from'). The second one will work.

    On the surface in terms of using queries as a reasonable abstraction this is very far from intuitive, only when you know a lot more about how EF actually works this makes sense.

    Things like these took a pretty long time to figure out even with very experienced people (both DB/.NET), even after we had read blog posts about query shape and compiled queries.

    Recommendations

    We suggest that anybody with a similar situation as ours to use compiled queries for absolutely everything, not doing so is out of the question if you want any type of acceptable performance.

    Compiled queries must be executed EXACTLY as defined. You must do very detailed verification that this is the case. ESPECIALLY as queries are reshaped and how objects are used are modified over time. We would recommend that you write automated performance end-to-end tests that catch very slight changes to query shape, since it is very very easy to use EF queries in a way that destroys performance completely. In a large team environment like ours this is a necessity until EF 5 comes out.





    • Marked as answer by M. Hol Thursday, February 23, 2012 10:09 AM
    • Edited by M. Hol Thursday, February 23, 2012 10:12 AM
    Thursday, February 23, 2012 10:06 AM
  • Pre-compiling is more expensive than just regular compiling. So if you are looking for even the teeniest perf improvements you can find, then for queries that won't be used more than once in an application process, you're better off not using pre-compiled queries. Also, if you are pre-generating views, then simple queries will take advantage of the pre-generated code.

    Julie Lerman, Author of Programming Entity Framework, MVP

    Friday, June 29, 2012 6:52 PM
  • Thanks Julie, btw this is the Stockholm project you visited when you where here. Cheers :)

    ps. Because of the cost of converting/maintaining we only convert non trivial queries where it pays off business wise. We're "only" at 125+ compiled queries atm, and need a lot more.

    Monday, July 2, 2012 8:08 AM
  • "125+ compiled queries" ..that's *crazy* talk!Too much work. Is there any way you can move to .NET 4.5 ( EF5 (VS2012))? 

    As Allen quotes from the EF team blog, .NET 4.5 does indeed solve the problem. EF has query compilation built in. And you don't even need to change your project. As long as the end user has .NET 4.5 installed, it will do the auto-compiling, even if the project targets .NET 4.

    I think any pre-compiled queries will just continue to do what they do, but any that aren't will get auto-compiled. It will cache 800 queries and then after that it will start looking for ones that haven't been used in a while and eject them.

    It is SO much better than having to explicitly create pre-compiled queries. SO SO SO much better!!

    (Company that took me & my friend kayaking & picnic? Fond memories!!)


    Julie Lerman, Author of Programming Entity Framework, MVP

    Monday, July 2, 2012 1:03 PM
  • ...the guy who brought you +1 up for a small debugging session, almost as good as kayaking (Im doing that tonight actually, beautiful summer outside)

    Yea, and we're talking about typically non trivial queries here (4-5 multi-association includes w additional criterias, etc). Its not ideal that's for sure. We're aware that EF5 solves this, and we would very much love to upgrade and use that instead.

    However, until there is an official release date for EF5 we can't start throwing out the explicitly compiled queries. We don't dare rely on a CTP/Beta in the software we release to production. The volume of critical data processed automatically means there is no alternative/manual way to do what the system is designed to do. If there was a serious issue related to EF that we can't fix ourselves and we could not get support we would be stuck. Not even rolling back to the previous system, which is 25y old, would even be possible in that worst case scenario. Heads would roll, brands could take a significant hit and millions would be lost fast.

    To my knowledge EF5 is still in CTP and nothing else has been announced? If anybody knows otherwise I would gladly hear about it.

    Monday, July 2, 2012 1:46 PM
  • We are experiencing same difficulties on our project, but using EF 5.

    For non trivial queries PlanCompiler.Compile() method uses 80.63% inclusive samples on performance monitor, taking more then 10 seconds to execute the query.

    The plan is cached though, the second time it executes in less then a second. The problem is that cached plans get lost after some time, and EF calculates execution plan again for the same query.

    This is our test enviroment:

    - .NET Framework 4.5 on Windows Server 2012

    -ASP MVC application, hosted in IIS8 Express 

    EF 5.0

    - SQL Server 2012 Express

    - Code first, no lazy loading


    Monday, August 12, 2013 1:15 PM
  • Some more information from the same project, a year longer after the last post.

    The major culprit in all this is .Include(). The queries it results in forces both very long compilation times and major pressure on the query optimizer in the database. Both of these will destroy performance in any non-trivial scenario.

    After getting rid of all Includes(), most if not all these issues go away.

    Good practices in writing efficient EF queries is querying with using the following guidelines

    - use only one resulting entity (no includes)

    - To get an object with the associated complete graph of related objects, query multiple times - once for each related object, but constrained by primary key on the ROOT OBJECT.

    For instance, assume you want a specific orders and order details + each item for every order detail as one graph. First query for the order 'where order = orderId', then query for all order details 'where orderDetail.Order.Id = orderId', then query for all items 'where item.orderDetail.Order.Id = orderId'.

    This will result in three queries for the complete graph (which is pretty efficient), and not one query for each combination of order detail and item (which would be terrible).

    HTH


    • Edited by M. Hol Monday, August 19, 2013 9:09 AM
    Monday, August 19, 2013 9:08 AM