Very inefficient SQL generation in EF?

  • Question

  • I am trying to optimize a LINQ to Entities query, and I am having a hard time figuring out how. When I experiment with LINQ to SQL, I am able to get amazingly efficient SQL generated. For example, the following LINQ query:

     

    var dpids =
        (from cp in CategorizationProducts
         join pl in PriceLists on cp.CategorizationID equals pl.CategorizationID
         where pl.Active && pl.DealerID == dealerID && cp.CustomCategory.Path == path && cp.CategorizationID == 0
         join dp in DealerProducts on cp.Product equals dp.Product
         where dp.DealerID == dealerID
         select dp.DealerProductID)
        .Union(
         from cp in CategorizationProducts
         join pl in PriceLists on cp.CategorizationID equals pl.CategorizationID
         where pl.Active && pl.DealerID == dealerID && cp.CustomCategory.Path == path && cp.CategorizationID != 0
         join dp in DealerProducts on cp.DealerProduct equals dp
         where dp.DealerID == dealerID
         select dp.DealerProductID);

     

    Is turned into this nice, compact, efficient SQL query:

     

    exec sp_executesql N'SELECT [t8].[DealerProductID]
    FROM (
        SELECT [t3].[DealerProductID]
        FROM [CategorizationProducts] AS [t0]
        INNER JOIN [PriceLists] AS [t1] ON [t0].[CategorizationID] = [t1].[CategorizationID]
        LEFT OUTER JOIN [CustomCategories] AS [t2] ON [t2].[CustomCategoryID] = [t0].[CustomCategoryID]
        INNER JOIN [DealerProducts] AS [t3] ON [t0].[ProductID] = [t3].[ProductID]
        WHERE ([t3].[DealerID] = @p0) AND ([t1].[Active] = 1) AND ([t1].[DealerID] = @p1) AND ([t2].[Path] = @p2) AND ([t0].[CategorizationID] = @p3)
        UNION
        SELECT [t7].[DealerProductID]
        FROM [CategorizationProducts] AS [t4]
        INNER JOIN [PriceLists] AS [t5] ON [t4].[CategorizationID] = [t5].[CategorizationID]
        LEFT OUTER JOIN [CustomCategories] AS [t6] ON [t6].[CustomCategoryID] = [t4].[CustomCategoryID]
        INNER JOIN [DealerProducts] AS [t7] ON [t4].[DealerProductID] = [t7].[DealerProductID]
        WHERE ([t7].[DealerID] = @p4) AND ([t5].[Active] = 1) AND ([t5].[DealerID] = @p5) AND ([t6].[Path] = @p6) AND ([t4].[CategorizationID] <> @p7)
        ) AS [t8]',N'@p0 int,@p1 int,@p2 nvarchar(17),@p3 int,@p4 int,@p5 int,@p6 nvarchar(17),@p7 int',@p0=60,@p1=60,@p2=N'\Bar\Bar Supplies',@p3=0,@p4=60,@p5=60,@p6=N'\Bar\Bar Supplies',@p7=0

     

    However, when I do the same query in LINQ to Entities:

     

    var dpids =
        (from cp in db.CategorizationProducts
         join pl in db.PriceLists on cp.Categorization.CategorizationID equals pl.CategorizationID
         where pl.Active && pl.Dealer.DealerID == dealerID && cp.CustomCategory.Path == path && cp.Categorization.CategorizationID == 0
         join dp2 in db.DealerProducts on cp.Product equals dp2.Product
         where dp2.Dealer.DealerID == dealerID
         select dp2.DealerProductID)
        .Union(
         from cp in db.CategorizationProducts
         join pl in db.PriceLists on cp.Categorization.CategorizationID equals pl.CategorizationID
         where pl.Active && pl.Dealer.DealerID == dealerID && cp.CustomCategory.Path == path && cp.Categorization.CategorizationID != 0
         join dp2 in db.DealerProducts on cp.DealerProduct equals dp2
         where dp2.Dealer.DealerID == dealerID
         select dp2.DealerProductID);
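
    One reformulation worth experimenting with (a sketch only, not a verified fix): comparing key values instead of entity references in the joins, since the entity-reference comparisons are what EF expands into null-compensated EXISTS subqueries. The navigation and property names follow the model used in this thread:

    ```csharp
    // Sketch: join on key values reached through the navigations, rather than
    // comparing the entity references themselves.
    var dpids =
        (from cp in db.CategorizationProducts
         join pl in db.PriceLists on cp.Categorization.CategorizationID equals pl.CategorizationID
         where pl.Active && pl.Dealer.DealerID == dealerID && cp.CustomCategory.Path == path && cp.Categorization.CategorizationID == 0
         // key comparison instead of "cp.Product equals dp2.Product":
         join dp2 in db.DealerProducts on cp.Product.ProductID equals dp2.Product.ProductID
         where dp2.Dealer.DealerID == dealerID
         select dp2.DealerProductID)
        .Union(
         from cp in db.CategorizationProducts
         join pl in db.PriceLists on cp.Categorization.CategorizationID equals pl.CategorizationID
         where pl.Active && pl.Dealer.DealerID == dealerID && cp.CustomCategory.Path == path && cp.Categorization.CategorizationID != 0
         // key comparison instead of "cp.DealerProduct equals dp2":
         join dp2 in db.DealerProducts on cp.DealerProduct.DealerProductID equals dp2.DealerProductID
         where dp2.Dealer.DealerID == dealerID
         select dp2.DealerProductID);
    ```

    Whether this actually avoids the generated pattern shown below depends on the EF build in use; it is offered as an experiment, not a guarantee.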

     

    I get this ugly mess of SQL:

     

    exec sp_executesql N'SELECT
    [Distinct1].[C1] AS [C1]
    FROM ( SELECT DISTINCT
     [UnionAll1].[DealerProductID] AS [C1]
     FROM  (SELECT
      [Extent4].[DealerProductID] AS [DealerProductID]
      FROM   (SELECT [Extent1].[CategorizationProductID] AS [CategorizationProductID], [Extent1].[CategorizationID] AS [CategorizationID1], [Extent1].[ProductID] AS [ProductID], [Extent1].[DealerProductID] AS
    [DealerProductID], [Extent1].[DealerSubCategoryID] AS [DealerSubCategoryID], [Extent1].[CustomCategoryID] AS [CustomCategoryID1], [Extent1].[OrderIndex] AS [OrderIndex1], [Extent1].[TrashDate] AS [TrashDate1],
    [Extent1].[CreateDate] AS [CreateDate1], [Extent1].[ModDate] AS [ModDate1], [Extent1].[ModBy] AS [ModBy1], [Extent2].[PriceListID] AS [PriceListID], [Extent2].[PriceListType] AS [PriceListType], [Extent2].[DealerID] AS
    [DealerID], [Extent2].[PublicationsSettingId] AS [PublicationsSettingId], [Extent2].[Name] AS [Name1], [Extent2].[Description] AS [Description1], [Extent2].[StartDate] AS [StartDate], [Extent2].[EndDate] AS [EndDate],
    [Extent2].[Notify] AS [Notify], [Extent2].[NotifyEmailUserID] AS [NotifyEmailUserID], [Extent2].[NotificationEmail] AS [NotificationEmail], [Extent2].[NotificationDays] AS [NotificationDays], [Extent2].[ShowPrice] AS
    [ShowPrice], [Extent2].[ShowPriceListChoiceID] AS [ShowPriceListChoiceID], [Extent2].[Active] AS [Active], [Extent2].[CreateDate] AS [CreateDate2], [Extent2].[ModDate] AS [ModDate2], [Extent2].[ModBy] AS [ModBy2],
    [Extent2].[CategorizationID] AS [CategorizationID2], [Extent3].[CustomCategoryID] AS [CustomCategoryID2], [Extent3].[ParentCustomCategoryID] AS [ParentCustomCategoryID], [Extent3].[CategorizationID] AS [CategorizationID3],
    [Extent3].[ImageID] AS [ImageID], [Extent3].[DealerImageID] AS [DealerImageID], [Extent3].[VendorImageID] AS [VendorImageID], [Extent3].[OrderIndex] AS [OrderIndex2], [Extent3].[Name] AS [Name2], [Extent3].[Description] AS
    [Description2], [Extent3].[Keywords] AS [Keywords], [Extent3].[Path] AS [Path], [Extent3].[TrashDate] AS [TrashDate2], [Extent3].[CreateDate] AS [CreateDate3], [Extent3].[ModDate] AS [ModDate3], [Extent3].[ModBy] AS
    [ModBy3], [Extent3].[OriginalID] AS [OriginalID]
       FROM   [dbo].[CategorizationProducts] AS [Extent1]
       INNER JOIN [dbo].[PriceLists] AS [Extent2] ON ([Extent1].[CategorizationID] = [Extent2].[CategorizationID]) OR (([Extent1].[CategorizationID] IS NULL) AND ([Extent2].[CategorizationID] IS NULL))
       INNER JOIN [dbo].[CustomCategories] AS [Extent3] ON [Extent1].[CustomCategoryID] = [Extent3].[CustomCategoryID]
       WHERE ([Extent2].[Active] = 1) AND (0 = [Extent1].[CategorizationID]) ) AS [Filter1]
      INNER JOIN [dbo].[DealerProducts] AS [Extent4] ON  EXISTS (SELECT
       cast(1 as bit) AS [C1]
       FROM      ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable1]
       LEFT OUTER JOIN  (SELECT
        [Extent5].[ProductID] AS [ProductID]
        FROM [dbo].[Products] AS [Extent5]
        WHERE [Filter1].[ProductID] = [Extent5].[ProductID] ) AS [Project1] ON 1 = 1
       LEFT OUTER JOIN  (SELECT
        [Extent6].[ProductID] AS [ProductID]
        FROM [dbo].[Products] AS [Extent6]
        WHERE [Extent4].[ProductID] = [Extent6].[ProductID] ) AS [Project2] ON 1 = 1
       LEFT OUTER JOIN  (SELECT
        [Extent7].[ProductID] AS [ProductID]
        FROM [dbo].[Products] AS [Extent7]
        WHERE [Filter1].[ProductID] = [Extent7].[ProductID] ) AS [Project3] ON 1 = 1
       LEFT OUTER JOIN  (SELECT
        [Extent8].[ProductID] AS [ProductID]
        FROM [dbo].[Products] AS [Extent8]
        WHERE [Extent4].[ProductID] = [Extent8].[ProductID] ) AS [Project4] ON 1 = 1
       WHERE ([Project1].[ProductID] = [Project2].[ProductID]) OR (([Project3].[ProductID] IS NULL) AND ([Project4].[ProductID] IS NULL))
      )
      WHERE ([Filter1].[DealerID] = @p__linq__1) AND ([Filter1].[Path] = @p__linq__2) AND ([Extent4].[DealerID] = @p__linq__3)
     UNION ALL
      SELECT
      [Extent12].[DealerProductID] AS [DealerProductID]
      FROM   (SELECT [Extent9].[CategorizationProductID] AS [CategorizationProductID], [Extent9].[CategorizationID] AS [CategorizationID4], [Extent9].[ProductID] AS [ProductID], [Extent9].[DealerProductID] AS
    [DealerProductID], [Extent9].[DealerSubCategoryID] AS [DealerSubCategoryID], [Extent9].[CustomCategoryID] AS [CustomCategoryID3], [Extent9].[OrderIndex] AS [OrderIndex3], [Extent9].[TrashDate] AS [TrashDate3],
    [Extent9].[CreateDate] AS [CreateDate4], [Extent9].[ModDate] AS [ModDate4], [Extent9].[ModBy] AS [ModBy4], [Extent10].[PriceListID] AS [PriceListID], [Extent10].[PriceListType] AS [PriceListType], [Extent10].[DealerID] AS
    [DealerID], [Extent10].[PublicationsSettingId] AS [PublicationsSettingId], [Extent10].[Name] AS [Name3], [Extent10].[Description] AS [Description3], [Extent10].[StartDate] AS [StartDate], [Extent10].[EndDate] AS [EndDate],
    [Extent10].[Notify] AS [Notify], [Extent10].[NotifyEmailUserID] AS [NotifyEmailUserID], [Extent10].[NotificationEmail] AS [NotificationEmail], [Extent10].[NotificationDays] AS [NotificationDays], [Extent10].[ShowPrice] AS
    [ShowPrice], [Extent10].[ShowPriceListChoiceID] AS [ShowPriceListChoiceID], [Extent10].[Active] AS [Active], [Extent10].[CreateDate] AS [CreateDate5], [Extent10].[ModDate] AS [ModDate5], [Extent10].[ModBy] AS [ModBy5],
    [Extent10].[CategorizationID] AS [CategorizationID5], [Extent11].[CustomCategoryID] AS [CustomCategoryID4], [Extent11].[ParentCustomCategoryID] AS [ParentCustomCategoryID], [Extent11].[CategorizationID] AS
    [CategorizationID6], [Extent11].[ImageID] AS [ImageID], [Extent11].[DealerImageID] AS [DealerImageID], [Extent11].[VendorImageID] AS [VendorImageID], [Extent11].[OrderIndex] AS [OrderIndex4], [Extent11].[Name] AS [Name4],
    [Extent11].[Description] AS [Description4], [Extent11].[Keywords] AS [Keywords], [Extent11].[Path] AS [Path], [Extent11].[TrashDate] AS [TrashDate4], [Extent11].[CreateDate] AS [CreateDate6], [Extent11].[ModDate] AS
    [ModDate6], [Extent11].[ModBy] AS [ModBy6], [Extent11].[OriginalID] AS [OriginalID]
       FROM   [dbo].[CategorizationProducts] AS [Extent9]
       INNER JOIN [dbo].[PriceLists] AS [Extent10] ON ([Extent9].[CategorizationID] = [Extent10].[CategorizationID]) OR (([Extent9].[CategorizationID] IS NULL) AND ([Extent10].[CategorizationID] IS NULL))
       INNER JOIN [dbo].[CustomCategories] AS [Extent11] ON [Extent9].[CustomCategoryID] = [Extent11].[CustomCategoryID]
       WHERE ([Extent10].[Active] = 1) AND (0 <> [Extent9].[CategorizationID]) ) AS [Filter8]
      INNER JOIN [dbo].[DealerProducts] AS [Extent12] ON  EXISTS (SELECT
       cast(1 as bit) AS [C1]
       FROM    ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable2]
       LEFT OUTER JOIN  (SELECT
        [Extent13].[DealerProductID] AS [DealerProductID]
        FROM [dbo].[DealerProducts] AS [Extent13]
        WHERE [Filter8].[DealerProductID] = [Extent13].[DealerProductID] ) AS [Project7] ON 1 = 1
       LEFT OUTER JOIN  (SELECT
        [Extent14].[DealerProductID] AS [DealerProductID]
        FROM [dbo].[DealerProducts] AS [Extent14]
        WHERE [Filter8].[DealerProductID] = [Extent14].[DealerProductID] ) AS [Project8] ON 1 = 1
       WHERE ([Project7].[DealerProductID] = [Extent12].[DealerProductID]) OR (([Project8].[DealerProductID] IS NULL) AND ([Extent12].[DealerProductID] IS NULL))
      )
      WHERE ([Filter8].[DealerID] = @p__linq__4) AND ([Filter8].[Path] = @p__linq__5) AND ([Extent12].[DealerID] = @p__linq__6)) AS [UnionAll1]
    )  AS [Distinct1]',N'@p__linq__1 int,@p__linq__2 nvarchar(17),@p__linq__3 int,@p__linq__4 int,@p__linq__5 nvarchar(17),@p__linq__6 int',@p__linq__1=60,@p__linq__2=N'\Bar\Bar
    Supplies',@p__linq__3=60,@p__linq__4=60,@p__linq__5=N'\Bar\Bar Supplies',@p__linq__6=60

     

    Now, there are a few things wrong with the above query. For one, it's selecting a TON of columns that are not needed AT ALL, in both parts of the union. Second, across both parts of the union there are SIX additional nested selects all comparing the same thing, which are then LEFT JOINed to the rest of the query. I don't exactly understand what is going on, but it's as if EF is trying to generate an inefficient query. Where the heck is all this junk coming from? Why is EF so inefficient compared to LINQ to SQL? What can I do to improve the LINQ to Entities query?

    Saturday, September 20, 2008 3:41 AM

All replies

  • There are also some other completely invalid things going on with the EF query. In my database, and in my model, CategorizationProducts.CategorizationID is marked NOT NULL. It MUST always have a value, yet EF is trying to compare nulls during a join...?!

     

     FROM   [dbo].[CategorizationProducts] AS [Extent1]
       INNER JOIN [dbo].[PriceLists] AS [Extent2] ON ([Extent1].[CategorizationID] = [Extent2].[CategorizationID]) OR (([Extent1].[CategorizationID] IS NULL) AND ([Extent2].[CategorizationID] IS NULL))

     

    Why in the world is the EF query generator comparing two non-nullable columns, as defined in both the database AND the EDM, with nulls as part of a join? Why would that EVER be valid? Here is another example where joining on nulls is completely incorrect:

     

    WHERE ([Project1].[ProductID] = [Project2].[ProductID]) OR (([Project3].[ProductID] IS NULL) AND ([Project4].[ProductID] IS NULL))

     

    In this instance, both tables can have NULL in those columns; however, there is ZERO reason to EVER join on NULL, because thousands of records that should never, ever match would match. The only join predicate should be the column I actually chose to join on, ProductID; neither column should be compared to NULL during a join, EVER. LINQ to SQL gets this right...why in the world is EF doing this??

     

    Another inefficiency: why incur the extra overhead of casting 1 to a bit value during an EXISTS check, when selecting NULL works perfectly well?

     

    EXISTS (SELECT  cast(1 as bit) AS [C1]  FROM ...
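
    (For what it's worth: SQL Server never actually evaluates the select list inside an EXISTS; only row existence is tested. So these forms, using the Products table from the query above, should all plan identically, and the cast is noise rather than a measurable cost:)

    ```sql
    -- All equivalent: the EXISTS select list is ignored by the optimizer.
    SELECT 1 WHERE EXISTS (SELECT NULL           FROM dbo.Products);
    SELECT 1 WHERE EXISTS (SELECT 1              FROM dbo.Products);
    SELECT 1 WHERE EXISTS (SELECT CAST(1 AS bit) FROM dbo.Products);
    ```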

     

    Finally, what in God's name is THIS:

     

     INNER JOIN [dbo].[DealerProducts] AS [Extent4] ON  EXISTS (SELECT
       cast(1 as bit) AS [C1]
       FROM      ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable1]
       LEFT OUTER JOIN  (SELECT
        [Extent5].[ProductID] AS [ProductID]
        FROM [dbo].[Products] AS [Extent5]
        WHERE [Filter1].[ProductID] = [Extent5].[ProductID] ) AS [Project1] ON 1 = 1
       LEFT OUTER JOIN  (SELECT
        [Extent6].[ProductID] AS [ProductID]
        FROM [dbo].[Products] AS [Extent6]
        WHERE [Extent4].[ProductID] = [Extent6].[ProductID] ) AS [Project2] ON 1 = 1
       LEFT OUTER JOIN  (SELECT
        [Extent7].[ProductID] AS [ProductID]
        FROM [dbo].[Products] AS [Extent7]
        WHERE [Filter1].[ProductID] = [Extent7].[ProductID] ) AS [Project3] ON 1 = 1
       LEFT OUTER JOIN  (SELECT
        [Extent8].[ProductID] AS [ProductID]
        FROM [dbo].[Products] AS [Extent8]
        WHERE [Extent4].[ProductID] = [Extent8].[ProductID] ) AS [Project4] ON 1 = 1
       WHERE ([Project1].[ProductID] = [Project2].[ProductID]) OR (([Project3].[ProductID] IS NULL) AND ([Project4].[ProductID] IS NULL))


     

    This statement is repeated TWICE, on top of being, judging from the clean efficiency of the LINQ to SQL statement, COMPLETELY UNNECESSARY. The two joins with Extent5 and Extent6 are EXACTLY the same as the two joins with Extent7 and Extent8. The only other time I've seen this kind of excessive junk code is when people purposely create inefficient sorting algorithms to see who can come up with something that runs in factorial time!

     

    Tack onto the general execution inefficiency of the SQL query the excessive verbosity (and therefore bandwidth inefficiency) of the aliasing terms used: [UnionAll1], [Extent1] through [Extent12], [Filter1] through [Filter8], [Project1] through [Project8], [Distinct1], @p__linq__1 through @p__linq__6. Contrast with LINQ to SQL, which uses terms like [t0] through [t8] and @p0 through @p7.

     

    Everything about this query is done in the most inefficient way.

     

    After just spending more than a month trying to figure out how to build an entity model around all the quirks, oversights, and oddities of the visual designer, and dealing with generation overwriting SSDL changes every time I perform an update, I find it PHENOMENALLY DUMBFOUNDING that, in a real-world scenario against an existing, real-world database, Entity Framework generates such incredibly inefficient, and in many cases flat-out incorrect, SQL statements. This dumbfoundedness is only exacerbated by my discovery that LINQ to SQL runs the same queries MANY ORDERS OF MAGNITUDE more efficiently. The example above is only a fragment of a much larger, and more complex, query. LINQ to SQL runs the full query in about 800ms, with 10,000 reads. The same query in LINQ to Entities takes about a minute to run (or it times out), and incurs over 2.7 million reads.

     

    I believe this ends my attempts to make use of Entity Framework v1.

     

    DUMB-FOUNDING. Simply.

    Saturday, September 20, 2008 4:19 AM
  • Could someone please explain to me why NULLs are compared in EF joins? I have another project that I was trying to use EF for, and I am running into problems with these NULL joins as well. It is a VERY rare occasion that joining on NULL values is valid, as NULL is most often used to represent non-useful values. There are several scenarios where joins on NULL are happening with EF, and none of them are desired or valid:

     

    The Join:

     

    (T1.Key = T2.ForeignKey) OR ((T1.Key IS NULL) AND (T2.ForeignKey IS NULL))

     

    Case 1: Both non-null columns joined with NULL

    Invalid: T1.Key and T2.ForeignKey can never be null, so joining on null is useless.

     

    Case 2.1: Key column non-null, foreign key column null joined with NULL

    Case 2.2: Key column null, foreign key column non-null joined with NULL

    Invalid: one of T1.Key or T2.ForeignKey can never be null, so the null branch can never match; joining on null is useless.

     

    Case 3: Both nullable columns joined with NULL

    Invalid: Null values do not represent useful values, and do not necessarily represent keys. Joining where both fields are null results in tens of thousands to hundreds of millions of rows returned for completely invalid associations between rows that have no association.

     

    In my query from Friday, I ran into case 3: hundreds of millions of rows are being returned by subqueries, and EVERY single one of them is an invalid row. NULL joined to NULL is completely invalid. The tables in question map two things: one mapping is between T1.ProductID and T2.ProductID, the other is between T1.DealerProductID and T2.DealerProductID. The set for ProductID has a lot of nulls for DealerProductID; the set for DealerProductID has a lot of nulls for ProductID. LINQ to SQL gets it right, and only joins on ProductID = ProductID / DealerProductID = DealerProductID. The volume of rows in these tables is approximately:

     

    T1: 515,000 rows

    T2: 100,000 rows

     

    Given that, for either set, anything that doesn't match on the key will match on the null, a join done with NULL can produce up to 515,000 * 100,000 associations. That's 51,500,000,000 rows: 51 BILLION ROWS!
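
    A minimal repro of the blow-up (hypothetical single-column temp tables; the principle is the same):

    ```sql
    CREATE TABLE #T1 (K int NULL);
    CREATE TABLE #T2 (K int NULL);
    INSERT INTO #T1 (K) VALUES (1);
    INSERT INTO #T1 (K) VALUES (NULL);
    INSERT INTO #T1 (K) VALUES (NULL);
    INSERT INTO #T2 (K) VALUES (1);
    INSERT INTO #T2 (K) VALUES (NULL);
    INSERT INTO #T2 (K) VALUES (NULL);

    -- Plain key join: 1 row (NULL never equals NULL under ANSI semantics).
    SELECT COUNT(*) FROM #T1 a INNER JOIN #T2 b ON a.K = b.K;

    -- EF-style null-compensated join: 5 rows
    -- (1 key match plus 2 x 2 null-to-null pairs).
    SELECT COUNT(*) FROM #T1 a INNER JOIN #T2 b
        ON (a.K = b.K) OR ((a.K IS NULL) AND (b.K IS NULL));
    ```

    Every null-keyed row on one side matches every null-keyed row on the other, which is where the cross-product-scale row counts come from.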

     

    This is a very serious problem with Entity Framework.

    Saturday, September 20, 2008 6:52 PM
  •  Jon Rista wrote:

    I am trying to optimize a LINQ to Entities query, and I am having a hard time figuring out how. When I experiment with LINQ to SQL, I am able to get amazingly efficient SQL generated. For example, the following LINQ query:

     

    Hi Jon,

     

    This is not an answer to your questions; I just wanted to confirm that your finding that EF tends to generate really poor SQL compared to L2S is accurate.

     

    On a related note, Julie Lerman recently wrote a short entry in her blog on poor where clause SQL generated by EF:

    http://www.thedatafarm.com/blog/2008/09/19/ConvenientCLRMethodsArentAlwaysTheBestForLINQToEntities.aspx

     

    In one of Julie's examples, a where clause containing SomeField.StartsWith("S") is translated by EF into "WHERE (CAST(CHARINDEX(N'S', [Extent1].[LastName]) AS int)) = 1".

     

    Linq-to-SQL translates the same thing into "WHERE [t0].[LastName] LIKE @p0".

     

    Now, for anyone not familiar with the difference in effect the above will have on the database, let me explain:

     

    EF's "CAST(CHARINDEX(" will force SQL Server to do a full table scan on the table being queried. This means reading all data in the table to find the records being sought. It will not be able to use any indexes on the column being queried.

     

    L2S's LIKE will do an index scan if the column being queried is indexed. This means it will only touch parts of an index and then retrieve only the matching records from the table.
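
    To make the contrast concrete (Customers and LastName are illustrative names here, with an index assumed on LastName):

    ```sql
    -- Sargable: SQL Server can seek on an index over LastName.
    SELECT LastName FROM dbo.Customers
    WHERE LastName LIKE N'S%';

    -- Not sargable: the function must be evaluated for every row,
    -- so the index on LastName cannot be used for seeking.
    SELECT LastName FROM dbo.Customers
    WHERE CAST(CHARINDEX(N'S', LastName) AS int) = 1;
    ```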

     

    The difference in I/O [and db performance] impact if the table is large is like...translated into vehicle terms...MPG on a moped riding downhill with the 50cc engine idling versus MPG on a 747 during takeoff, loaded to maximum takeoff weight and in strong tail wind...

     

    On a database like "northwind" this will not make a difference; the entire DB will be in RAM after the first EF query hits it. (SQL Server is good at detecting and dealing with this kind of atrocity, at the expense of system resources, of course.) Unfortunately, in the real world people tend to store more information in their databases than "northwind" does, and may not have enough RAM to keep the entire DB in memory.

     

    The sad thing is that the official line from MSFT seems to be that EF is "better" than L2S. I'm afraid this can [mis]lead people into choosing EF over L2S, or even lead to existing working L2S apps being converted to EF (based on marketing speak rather than a real side-by-side evaluation of the two).

     

    [end of rant]

     

    I hope the EF team will find their way to Matt Warren's blog some day; his "building an IQueryable provider" series has a lot of good examples that the EF team can download and copy-and-paste in order to get more accurate and efficient translations from LINQ expressions into SQL. (Matt's code samples also address the few shortcomings L2S has in the same area, such as overuse of subqueries under certain circumstances, so while they're at it I hope they apply those changes to L2S as well.)

     

    http://blogs.msdn.com/mattwar/

     

     

     

    Monday, September 22, 2008 6:03 AM
  • I actually did read Julie's post, which is why my LINQ query uses a Substring rather than a StartsWith. The performance issues are not simply due to the WHERE clause. The example query I posted was inefficient in just about EVERY way: selecting dozens of columns in a subquery that are COMPLETELY useless, joining on NULL (always invalid unless I SPECIFICALLY ask for it), a bunch of LEFT JOINed subqueries that all do the exact same thing over and over, EXCESSIVE VERBOSITY in the query (extra columns that are not used, excessive terms for things like extents, projections, and unions, and very long parameter names), etc.

     

    Entity Framework has some VERY SERIOUS PROBLEMS with its query generator that not only cause performance problems, but cause OUTRIGHT INVALID SQL statements to be generated. I'm a bit surprised that no Microsoft employee has replied to this thread to refute my claims; I figured they would have some canned response ready to brush this under the rug. Their silence is intriguing: either they know they have these serious issues and are quietly trying to solve them (which would at least be a step in the right direction), or they are ignoring me hoping I'll shut my mouth and go away. The problem is, this issue will make or break EF. It NEEDS to be fixed; otherwise EF is completely non-viable in any real-world scenario. They also can't fix one problem and leave another. The excessive verbosity is a serious problem: moving to EF from L2S would mean our network traffic more than doubles, as EF SQL is at least twice as large as L2S SQL, and often more.

     

    I have come to the conclusion that EF was released VERY PREMATURELY. It is nowhere near the level of L2S in the areas that matter (POCO, SQL generation, low SQL verbosity, performance). The only area where it improves on L2S is the visual designer, but in its current form that is largely a failure for anything other than small databases with few, if any, special mapping needs. I've spent a solid month trying to figure out EF and make it work (we had an L2S solution up and performing well in a couple of days), and it has been a miserable failure.

     

    At this point, the only feeling I have about EF is grave disappointment.

    Monday, September 22, 2008 3:42 PM
  • Hi Jon,

     

    Actually, all of the examples in Julie's post have the same impact on the database. The where clauses generated for all three exclude the use of any indexes, so whether you use StartsWith, Substring, or Left, you're forcing a table scan.

     

    If you have a working solution written against L2S my only advice would be to stick to L2S for now. I'm sticking to L2S for real production code at the moment, while treating EF as a tech-preview tool to play around with.  ...and of course I will reconsider EF for real world development when they release the next version.
    Monday, September 22, 2008 4:20 PM
  • Well, I have been using Substring, with both L2S and EF, and it uses indexes. I have some indexes built specifically for some of my LINQ queries, and they include extra columns that are not part of the index key so that all information can be retrieved by just an index scan. I can't say the same for StartsWith or Left, however.

     

    Monday, September 22, 2008 5:20 PM
  • Hi Jon

     

    Sorry for not replying earlier; here in Redmond it's Monday morning, which is probably why no Microsoft employee answered before.

     

    We have done quite a lot of work on optimizing queries, but L2E cannot do the same work as L2S, simply because it has to target multiple databases, not just SQL Server. Additionally, the Entity Framework has some different purposes and features than LINQ to SQL, so it is hard to match all cases in each scenario. In particular, your LINQ query is not trivial; it is quite complex to start with.

     

    That being said, I agree some of the problems that you are raising shouldn't be there. Can you share with me the EDM model / database schema so that I can get a repro, verify whether this is a bug, and provide adequate recommendations to improve your query?

     

    Bruno (Performance Test)

    Monday, September 22, 2008 5:26 PM
  • I fail to see what L2E vs. L2S and different target databases has to do with pulling columns that are not needed in the select queries. Is there a database that requires selecting all columns? Is there a database in which it makes sense to join on NULL?

     

    The only thing that can be excused as resulting from multiple target databases is the StartsWith query, but then again there should be a store-specific data provider under EF. Isn't this provider supposed to recognize this?

    Monday, September 22, 2008 6:22 PM
  • Jon

    My brain is too fried to process your query and the resulting T-SQL.

     

    One thing I wanted to point out, which may or may not have anything to do with your case, and is also not an attempt to justify what you are seeing.

     

    <caveat...> If I am remembering this correctly...and it's somewhere in these forums but I can't find it again.... (if I'm wrong, hopefully Bruno or someone will correct me very quickly)...</caveat>

     

    The post asked something like:  "why is the sql query hitting 5 tables when I'm only querying for an entity that maps to a single table?"

     

    When you query for an entity, EF brings in whatever data it needs to build any entity ref's EntityKeys with its foreign keys. It doesn't just use the foreign key in the target entity, but goes over to the table with the primary key to build the entity key correctly for that record. The ObjectContext needs to create the "stub" EntityEntries for those other ends, even if you haven't materialized them.

     

    I wonder what the SQL would look like if you ran your query with MergeOption set to NoTracking. Then it doesn't need to build the RelationshipEntry or the stub EntityEntry.
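
    A sketch of what I mean (type and property names follow the model used earlier in this thread; in EF v1 the merge option is set on the ObjectQuery that the LINQ query is composed over):

    ```csharp
    // Turn off change tracking for this query source before composing over it.
    ObjectQuery<CategorizationProduct> source = db.CategorizationProducts;
    source.MergeOption = MergeOption.NoTracking;

    var dpids = from cp in source
                // ...rest of the query as before...
                select cp.CategorizationProductID;
    ```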

     

     @Kristofer

     

    Really? (That's not a facetious question. I am not a sql server guru so you are scaring me.)

     

    julie

     

    Monday, September 22, 2008 6:58 PM
  • Kris

     

    Adam Machanic has just explained "search argument" predicates to me, and how the indexer is able to search on some operators easily while others won't be able to use the indexes unless I have explicitly created them (he didn't say that last part; I'm inferring it).

     

    He's confirming that L2S very nicely turns my queries into LIKE 'xxx%', which is good, and, as you pointed out, L2E is creating the predicates in all three of my queries with SQL that is not searchable.

     

    Egad.

     

    Monday, September 22, 2008 7:40 PM
  • Hi Julie,

     

    I'm going to give MergeOption.NoTracking a try. That may improve the performance of the query used as an example in this thread; however, that query was just a sub-query of a larger query, and the larger query DOES need tracking. At the moment I have pretty much given up using EF for this. I spent the last month trying to build an entity model that met my needs, only to find, now that I am using it in a real-world scenario, that the vast majority of the queries I run against the model either do not return at all (they time out), or produce the most ghastly SQL I've ever seen and run very slowly.

     

    "When you query for an entity, EF brings in whatever data it needs to build any entity ref's entityKeys with it's foreign keys."

     

    Yes, I realize that if I query for an entity, EF will bring in everything it needs to build its association metadata. However, this query is a custom projection (the end result is just a list of IDs from a table, so that I can join to that list and reduce the information I am getting from the "real" query that I need entities from). I would fully expect all columns to be retrieved if I were querying for an entity. I was very surprised to see so much going on for a simple one-column projection query, however.

     

    Hi Bruno,

     

    I will try to get you a copy of my EDM and my database schema. I'll also see if I can get you some sample data to work with, because this part of our database is an ugly mess (I am limited by several factors in how I can modify our database and business processes, so I am stuck working with an illogical schema that is too complex for what we need).

     

    I understand that EF has to support a broader base of database servers, but that shouldn't change the fact that selecting columns that are not needed in the final projection only hurts performance. It also shouldn't change the fact that joining on null will result in a high volume of invalid matches.

     

    I also would again like to stress the verbosity issue. EF queries are VERY verbose. We are on the verge of doubling the number of sites we have running on our platform right now; even if EF's queries did not join on nulls and select unnecessary columns, the extra verbosity is still a concern, enough to keep us on L2S for the foreseeable future. I don't know how much inter-departmental communication goes on at Microsoft, but I think there is a lot to be learned from the L2S guys. They didn't create the most widely compatible product, but what they did create is SOLID from every angle.

    Monday, September 22, 2008 8:00 PM
  • the NoTracking suggestion was not a solution, just curiosity...

     

    Monday, September 22, 2008 8:23 PM
  • This might be because I spent the last 3 days, 16 hours a day, trying to figure this problem out, but I can't seem to enable MergeOption.NoTracking on this query. It's a custom projection, and it's not returning any entities, just a list of numbers. Is it only possible to set MergeOption on queries that return entities?

    Monday, September 22, 2008 10:33 PM
  • Thanks Bruno.  I am probably wrong here.  But wouldn't the provider model allow anything?  I mean, the provider for SQL Server knows SQL Server.  It should be able to take the expression tree and munge it any way it wants to make the most efficient query.  I ask for knowledge's sake.

    Monday, September 22, 2008 11:35 PM
  • I would have thought the same thing...that Providers would provide database-specific, optimized queries...

    Tuesday, September 23, 2008 3:00 AM
  •  Julie Lerman wrote:

    Adam Machanic has just explained "search argument" predicates to me and how the indexer is able to search some operators easily, while others won't be able to use the indexes unless I have explicitly created them (he didn't say that last part - I'm inferring it).

     

    He's confirming that L2S very nicely turns my queries into LIKE 'xxx%', which is good, and as you pointed out, L2E is creating the predicates in all three of my queries with SQL that is not searchable.

     

    Hi Julie,

     

    I'm generalizing a bit, but to stay on the safe side with the optimizer, you don't want to wrap indexed columns used for lookups in where clauses and joins, or indexed columns used for grouping/ordering, in scalar functions, or do any calculations/transformations on them.

     

    Although the SQL Server optimizer is getting smarter with every version and can sometimes use indexes in such situations (SQL 2005 or higher), in most situations where it is really needed (esp. in complex queries) it will not.

     

    The where clauses that EF generated for you; "WHERE (CAST(CHARINDEX(N'S', [Extent1].[LastName]) AS int)) = 1", "WHERE N'S' = (SUBSTRING([Extent1].[LastName], 0 + 1, 1))", and "WHERE N'S' = (LEFT([Extent1].[LastName], 1))" are excellent examples of what not to send to SQL Server (or Oracle, or Informix, or DB2, or ...). They will result in table or index scans on all of these db platforms, because the db must evaluate the function expression and/or calculations before the comparison.

     

    If you want to see the SQL optimizer's opinion on this - the following SQL script will create a simple "person" table, populate it with 100k records and then find the ones (in this case only one) beginning with "S" using first a L2S-style "like" where clause and then the three variants that EF generated for you. The first set of output shows the execution plans and the second part shows the I/O statistics:

     

    Code Snippet

     

    --first, create a person table with id, first and last name

    set nocount on;

    go

    create table person (person_id uniqueidentifier not null, first_name nvarchar(100), last_name nvarchar(100), constraint pk_person primary key (person_id));

    go

     

    --populate the person table

    declare @person_no int;

    set @person_no = 1;

    while (@person_no < 100000)

    begin

    insert into person (person_id, first_name, last_name)

    values (newid(), substring(convert(nvarchar(38), newid()), 1, 8), substring(convert(nvarchar(38), newid()), 1, 8));

    set @person_no = @person_no +1;

    end

    insert into person (person_id, first_name, last_name) values (newid(), 'John', 'Smith');

    go

     

    --add a couple of indexes

    create index ix_person_first_name on person (first_name);

    create index ix_person_last_name on person (last_name);

    create index ix_person_first_last_name on person (first_name, last_name);

    go

    set nocount off;

    go

     

    print 'execution plans';

    go

    set showplan_all on;

    go

    select * from person where last_name like 'S%'

    go

    select * from person where (CAST(CHARINDEX(N'S', last_name) AS int)) = 1

    go

    select * from person where N'S' = (SUBSTRING(last_name, 0 + 1, 1))

    go

    select * from person where N'S' = (LEFT(last_name, 1))

    go

    set showplan_all off;

    go

     

    print 'I/O statistics';

    go

    set statistics io on;

    go

    select * from person where last_name like 'S%'

    go

    select * from person where (CAST(CHARINDEX(N'S', last_name) AS int)) = 1

    go

    select * from person where N'S' = (SUBSTRING(last_name, 0 + 1, 1))

    go

    select * from person where N'S' = (LEFT(last_name, 1))

    go

    set statistics io off;

    go

     

     

    drop table person

    go

     

     

     

     

     

    On a simple query against a simple table like the one used in the sample script above, the impact is not too severe; the EF-style where clauses increase the I/O impact by just some 8,000% - 12,600% on this simple lookup (530-760 reads) compared to using the L2S where clause on the same table (6 reads).

     

    On more complex queries, where many large tables may be involved, containing more data, and the where clause is more complex, the impact can be much more severe, due to the optimizer then [potentially, and far too often] opting for other less-than-optimal join orders, hash joins on very large intermediate result sets, etc.

     

    ---

     

    An anecdote: I did a performance investigation for a company running a mid-size SQL Server db. They were not using EF, but the cause of the problems was SQL plagued by the same mistake as the EF-generated queries. So although that customer's problem was not caused by EF, I think it serves as a valuable lesson, since EF has the potential of reproducing exactly the same problem en masse.

     

    The db server in question was completely bogged down, and the users were on the brink of nervous breakdown having to wait minutes for very simple operations to complete. The cause? One frequently executed query with a "field1 + '-' + field2 = 'some-value'" predicate in the where clause, causing full table scans on three large tables and spooling to an intermediate temporary table, with a final I/O impact of some 800,000 page reads. Changing the same where clause to "field1 = 'some' and field2 = 'value'" brought it down to less than 10 page reads, and they sent me a happy email saying "We see a night and day difference after the one change Friday. ... Thanks for your quick work!" (emphasis added by me)
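    The rewrite described in the anecdote, in sketch form (the table name is hypothetical; the predicates are the ones quoted above):

```sql
-- Non-sargable: the concatenation must be computed for every row,
-- so indexes on field1/field2 cannot be used for seeks.
select * from orders where field1 + '-' + field2 = 'some-value';

-- Sargable: plain equality per column allows index seeks.
select * from orders where field1 = 'some' and field2 = 'value';
```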

     

    So what is my point with this? A single query with a single improper expression in the where clause, in a query performing a common business scenario, can be enough to bring an entire business-critical system to its knees. And the link to EF is that this thread, Julie's blog entry, etc. show that EF likes to generate exactly the kind of SQL that will lead to this.

    Tuesday, September 23, 2008 3:32 AM
  •  Jon Rista wrote:

    I would have thought the same thing...that Providers would provide database-specific, optimized queries...

     

    DB-specific provider or not - the sample SQL you got from EF is equally bad for all of: SQL Server, Oracle, and Informix. The ...or...null... join criteria will result in scans.

     

    I cannot imagine any reason why any of that would make sense on any db platform that I have ever used. But then I have only used the three mentioned above, so there may be some special case involving some other db platform. Maybe MySQL, Caché, or PostgreSQL adds some special limitations and requirements to the SQL syntax used.

     

    However, it would be nice if some future version of EF were enhanced with a "nice-for-SQL-Server fallback mode" where it uses the L2S query generator (along with some of the enhancements presented by Matt Warren).

     

    Final note: EF team - please re-think your priorities for the next version. The intention/goals/potential for EF is great but this first version is... ...not ready for production use...

     

    Ok, ok, I'm going to shut up now...  ...rant over. Someone else take the stick...

    Tuesday, September 23, 2008 3:52 AM
  •  

    Oh, come on. Stop with this "next version". If the problem is severe, make a hotfix or something. Frankly, I cannot understand this query, and I do believe it is complex and rare, but I don't want to use EF with a SQL Profiler open all the time. If not, compile a list of cases where EF does not generate efficient SQL so we can avoid them.
    Tuesday, September 23, 2008 4:40 AM
  • Hi Jon, Stilgar and William

     

    While we get your model / schema, that I have some folks waiting for to get a better understanding and help you fix the issue, I will address some of your specific concerns:

     

    To Stilgar:
    a) On pulling unneeded columns: There is no such thing as "partial objects". So if you query entities, you need all columns. There is a different way to get read-only data in projections, which naturally requires NoTracking.
    b) On the different databases: yes, the provider model will be able in the future to recognize the LIKE syntax required. This is work in progress for the next releases.

    c) If we don't find a reasonable workaround, you are right, we could probably do a hotfix. We will find out, once we get adequate data, whether this is a bug that requires a hotfix or there is some other better (and faster) way to improve Jon's app performance.

     

    To Jon:
    a) Please tell us the results of using NoTracking. For this custom projection query, I expect it to be more performant.
    b) It is good that you raise that an illogical and too-complex schema is part of your problem.
    c) The NULL problem: We are imposing CLR semantics, so a non-null column can still map to a nullable object, for example as a result of OUTER join queries. That being said, we are working on some possible optimizations here.
    d) On the verbosity: We have found that some "verbose" queries are actually more efficient. But reducing unnecessary verbosity is something we would like to do in the cases where it is inadequate.
    e) On inter-departmental communication: I have sat down with the guys that worked on L2S optimization, and now the L2S team is integrated with the EF team, so I don't expect that to be an issue. Of course we can always keep learning from each other, since both have different advantages and trade-offs.

    f) Sample data would be good, but just the schema and model should be enough for helping you improve performance.
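    The NoTracking projection pattern suggested in (a) might look roughly like this (a sketch only; the EF v1 API is assumed, and the entity set and property names are taken from earlier posts in the thread, so they may not match the real EDM):

```csharp
// Sketch: set MergeOption on the source ObjectQuery before composing
// the LINQ query; the projection then bypasses the ObjectStateManager.
var source = context.DealerProducts;
source.MergeOption = MergeOption.NoTracking;

var ids = from dp in source
          where dp.DealerID == dealerID
          select dp.DealerProductID;   // read-only projection, nothing tracked
```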

     

    To William:
    You are right that the provider model will allow for generating more optimal queries. This is a never-ending task; we will always have some specific optimizations to add for any specific provider. It is a trade-off between being generic, time, and optimization for a specific provider.

     

    Again, thanks everybody

     

    Bruno

    Tuesday, September 23, 2008 5:20 AM
  • a) On pulling non needed columns: There is no such thing as "partial objects". So if you query entities, you need all columns. There is a different way to get read-only data in projections, which naturally requires NoTracking.

    Hate to say it, but I would have to completely disagree with that statement. I wouldn't call them "partial objects", but I would call them "custom objects" or "custom projections". It is not always desired to query for one of the entities in the model. Sometimes you just want some information, not an entity. Quite often, that information is part of a LINQ subquery, as is usually the case with me, and then they most definitely should be treated as custom projections, not entities. The undesirable result is clearly seen in my first post...dozens of columns were selected in a third-level SQL subquery that have ZERO value when taken as a whole, and just clog up the wire between code and database. You can't make the assumption that every LINQ statement is going to result in a full-blown entity...either taken in isolation or, particularly, when that LINQ statement is part of a larger statement that will ultimately return the entities. LINQ to SQL gets this done perfectly right...if you are creating a custom object, it doesn't move unnecessary chunks of data around in SQL Server or across the wire...it keeps things as minimal and compact as possible.

     

    a) Please tell us the results of using NoTracking. For this custom projection query, I expect it to be more performant.

     

    My LINQ query is returning a custom object newed up in the select clause. To my knowledge, only ObjectQuery<T> has a MergeOption property...but my query is returning an IQueryable<T>/IEnumerable<T>. I am not querying for an entity...I am querying for information, so I am not merging anything into the ObjectStateManager. This is ultimately a subquery in a larger LINQ statement, so I couldn't care less what EF does with the data...well, rather, I prefer that EF does nothing with it other than project it into the most efficient SQL statement possible.

     

    b) It is good that you raise that an illogical and too complex schema is part of your problem

     

    It's more of an illogical business "need". The schema itself is logical, given the information we need to store and the intrinsic relationships between that data. How we're required to use it is what's illogical. Rather than having our products in one category at a time, taken from a single set of categories, legacy business processes dictate that we must allow multiple sets of products to exist in multiple sets of categories at the same time. This is easy to manage in our administration tools; however, our client tools need to see these multiple sets of products and categories as a single, cohesive set. One "virtual" set of categories containing one "virtual" set of products. Categories with the same "path" must be merged, and all products from any category in each set of categories with the same path must appear on the same page in our client UI (illogical, complex, unnecessary, possibly impossible...blah). My task is to provide this merger, real-time, in a performant way. I believe I have accomplished this with LINQ to SQL and custom projections (haven't quite met the performance bar yet)...however, EF is a whole different story. In its current form, I don't think EF is capable of doing this...neither with custom projections, nor with actual entities (mapping restrictions prevent me from creating the entities I need). (See the first post for an example of the problem.)

     

    c) The NULL problem: We are imposing CLR semantics, so a non-null column can still map to a nullable object, for example as a result of OUTER join queries. That being said, we are working on some possible optimizations here.

     

    Sure, an OUTER join could map to a null at the SQL level. However, in the query above, and in all other cases where I have seen these null joins, it's always on an INNER join. Also, why would you say that a non-null column could map to a nullable object? A non-null column should NEVER map to a nullable object, except possibly if the CLR type is a string. A non-null int column should map to an int type, not an int? type, etc. In the case of references, if the foreign key column in the database is non-null, then it would always map to something, and should never be null in code (or, the reference metadata should hold a non-null key), regardless of the type of join used, right? If it did, wouldn't that break the rules of the association (i.e. 1..1 maps to a non-nullable vs 0-1..1 maps to a nullable)?

     

    d) On the verbosity: We have found that some "verbose" queries are actually more efficient. But reducing unnecesary verbosity is something we will like to do on the cases that are inadequate.

     

    Interesting that more verbose queries would be more performant. However, my concern wasn't so much from the perspective of execution performance, and more from the perspective of network performance. We need to support hundreds of web sites on this platform by the end of the year, with an accelerating level of traffic for each site, since we're working on improving (or starting) SEO for each site. Given the verbosity of EF's SQL generation, and the complexity of our queries...network bandwidth between our web servers and database server is going to become pretty saturated. I am doubtful that the performance gains from using an alias like [Extent1] over [t1], in addition to the greater complexity and use of unnecessary columns in EF, would outweigh the performance loss on the wire...but I could be wrong about that.

    Tuesday, September 23, 2008 6:13 AM
  •  Jon Rista wrote:

    What can I do to improve the LINQ to Entities query?

     

    Jon - if you want a way forward while still using [the current version of] L2E I think the best way is by taking control over query composition in cases like this.

     

    Move the main part of the query into a view:

     

    Code Snippet

     

    create view DealerPathDPID as

     

      select dp1.DealerProductID, dp1.DealerID, cc1.Path

      from DealerProducts as dp1

      inner join CategorizationProducts as cp1 on cp1.ProductID = dp1.ProductID

        and cp1.CategorizationID = 0

      inner join PriceLists as pl1 on pl1.CategorizationID = cp1.CategorizationID

        and pl1.DealerID = dp1.DealerID

        and pl1.Active = 1

      inner join CustomCategories as cc1 on cc1.CustomCategoryID = cp1.CustomCategoryID

     

    union all

     

      select dp2.DealerProductID, dp2.DealerID, cc2.Path

      from DealerProducts as dp2

      inner join CategorizationProducts as cp2 on cp2.DealerProductID = dp2.DealerProductID

        and cp2.CategorizationID <> 0

      inner join PriceLists as pl2 on pl2.CategorizationID = cp2.CategorizationID

        and pl2.DealerID = dp2.DealerID

        and pl2.Active = 1

      inner join CustomCategories as cc2 on cc2.CustomCategoryID = cp2.CustomCategoryID

     

     

     

    ...add that view to your EDMX...

     

    ...and change your L2E Linq query to query the view instead:

     

    Code Snippet

     

    var dpids =

      from dpid in nd.DealerPathDPIDS

      where dpid.Path == path

         && dpid.DealerID == dealerID

      select dpid.DealerProductID;

     

    Not perfect, but if you don't want to waste time on the issue, I think it is the most efficient way forward. (I'm assuming you want to stick to EF/L2E, that is...  ...I would of course consider a rollback to L2S if I were in your shoes.)

    Tuesday, September 23, 2008 9:27 AM
  •  

    I'm still stuck on the predicates and the bad SQL. Maybe I should do this in another thread, but E-SQL does have a LIKE operator. Why are you so adamant about using LINQ? L2E is *always* my first choice, but when L2E doesn't satisfy my needs, then I'll use ESQL. You can have more control over the query and how it is composed by the processor.

     

    I'm going to write a blog post about this and even recommend that, if you care about how your db is processing, you might want to always use ESQL for where clauses. You can also use LIKE in the Where query builder method, which means you can write the rest of your query using L2E (operator syntax or method syntax).

     

    Very simplistic example but...

     

    Dim contacts = From c In context.Contacts.Where("it.Lastname like 'S%'") Select c.FirstName, c.LastName

     

    Just be sure to put the Where method directly on the Contacts, because context.Contacts is an ObjectQuery.
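    For C# readers, the same query builder approach might look like this (a sketch; it assumes a Contacts entity set with FirstName/LastName properties, and the string predicate is Entity SQL, so the LIKE passes through to T-SQL as-is):

```csharp
var contacts = from c in context.Contacts.Where("it.LastName LIKE 'S%'")
               select new { c.FirstName, c.LastName };
```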

     

    @Jon...

    And a completely different point - you were asking about MergeOptions on L2E. LINQ queries implement ObjectQuery. You can cast the LINQ query to an ObjectQuery, then use any of the ObjectQuery goodies - MergeOption, ToTraceString, etc.
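    A sketch of that cast (assumed EF v1 API; the Contacts entity set and LastName property are illustrative):

```csharp
var query = from c in context.Contacts
            where c.LastName == "Smith"
            select c;

// Cast the L2E query to the ObjectQuery type to reach its members.
var objectQuery = (ObjectQuery)query;
objectQuery.MergeOption = MergeOption.NoTracking;
string sql = objectQuery.ToTraceString();   // inspect the generated SQL
```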

     

    Kris is also thinking out of the box here. Don't get stuck on the L2E. You have a great host of options available to you in Entity Framework - from using Entity SQL to building really hard queries right into the model.

     

    hth

     

    julie

    Tuesday, September 23, 2008 11:11 AM
  •  

    Jon, as a debug kind of guy, I would also try to reduce your query to the exact minimum needed that shows the issue(s) and remove the rest.  That way it will be a lot easier for humans to find the source, which, in fact, may be required for some other edge case in the EF, etc.  Many times, all the extra complication is just noise.  The other issues are probably the same issue repeated.

     

    In terms of selecting Fields/objects, I have a question on that.  If you do a:

     

    var q = (from u in db.users where u.name=="abc" select u).Any();

     

    The SQL will select each of the User's columns, when all you really need is 1 field to flag the Any.  I was just wondering, in the general case when we use Any - should we be making a projection of 1 field first?  Anyone done perf on this?
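    For comparison, the single-field projection being asked about might be written like this (a sketch; the entity and property names are placeholders from the question, not a real model):

```csharp
// Full-entity form: the generated SQL may list every column of users.
bool existsFull = (from u in db.users
                   where u.name == "abc"
                   select u).Any();

// Projected form: only one column is mentioned, which keeps the
// EXISTS subquery as minimal as possible.
bool existsSlim = (from u in db.users
                   where u.name == "abc"
                   select u.user_id).Any();
```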

    Tuesday, September 23, 2008 12:45 PM
  • @Julie/Kris,

     

    Thanks for the useful input. I have done a fair amount of eSQL in my applications. I am not as familiar with it as I am with LINQ yet, hence the reason most of my queries are in L2E. I don't think that my problem is stemming from .Substring...I have tried a variety of different string methods to get the results I need, and all of them have performed the same except for .StartsWith. Indexes are being used even with .Substring involved. I myself am not afraid of using eSQL rather than LINQ; however, I am the senior guy on our projects, and I've already thrown a ton of new things at the other devs in the last few months (WCF, MVC, SOA, L2S). They are quite familiar and proficient with LINQ now, and one of my main concerns was to keep things consistent (and not throw another ingredient into the new-technology cookie dough). If eSQL works/performs better, however, I'll use it when the time comes.

     

    I may be missing something here, but isn't ObjectQuery a generic? It takes an entity as a type parameter...but since my query isn't returning entities...I couldn't cast.

     

    I have also already gone way outside the box with these queries. The example on this page is a considerable reduction of the full query, which involves joining a couple of other entity sets, joining on the results of the query from this thread, then selecting from the results of that, grouping, then selecting from a couple of subqueries to get the FirstOrDefault row for each output row in a custom projection. To improve performance and reduce the number of subqueries/convert subqueries to joins, I have created a good half-dozen views in the database and added them to the model. I think I have already converted the query originally posted to this page into a view, along with some additional information. It's an ugly, complicated mess, and by no means is that aspect of complexity EF's fault. It's been a massive reduction effort, converting the original LINQ statement primarily into views...however, the bulk of these views cannot be indexed, as they either perform aggregate functions or UNIONs (that's another question I have...why it is not possible to create an indexed view when you use MIN/MAX, AVG, etc...I don't see how they are any different than SUM/COUNT)...so the performance benefit from converting it all to views has been minimal. Considering the benefits gained by using L2S/L2E/eSQL in code, I think I may just keep it all in code, so it's more adaptable if/when the schema changes, and centralize maintenance of the whole blob.

     

    (I have already fallen back on L2S for the project that started this thread...my experiences with EF over the last month have shown me that it's just not ready for use in a production environment. Too many quirks in the designer, and not enough support in the designer for the full range of EDM capabilities. Not enough configurability to let me choose how I want things to be generated, and keep them that way. Continual overwriting of manual edits in the SSDL, which forces a lot of manual maintenance of the model...or forces you to always manage the model manually at the XML level. Too many limitations on what kinds of entity mappings are allowed...preventing me from creating entities that would directly service my needs in this situation (for example, requiring that the full entity key be involved in all associations...a ridiculous requirement, considering that such a thing is not required in any relational database's foreign keys). When performing simple queries, just selecting a few entities from the data store with simple predicates, EF performs fine. However, when the need arises to select entities based on a more complex set of criteria, or when the need to select just information rather than entities arises...EF doesn't compete with L2S. A lot of the promise of EF...like a rich visual editor that provides rapid modeling capabilities...is not yet a reality. It's really disappointing, but I have no choice but to cut off my attempts and move forward with the project to meet deadlines.)

     

    @William

     

    In regard to the query you asked about: I would expect that to turn into the most efficient statement possible. Something along the lines of:

     

    SELECT (EXISTS (SELECT null AS [c2] FROM [users] [t1] WHERE [t1].[name] = @p0)) AS [c1]

     

    Which would simply return a boolean...which is exactly what you asked for. You didn't ask for an entity, so no entity should be returned, and no entity columns should be involved in the query except for those involved in the predicate.
    Tuesday, September 23, 2008 3:48 PM
  • I understand your moving back to L2S given what you are running into.

     

    Just to continue the interesting discussion though... :-)

     

    WRT throwing too much at the devs - totally understood. This is where Kris' suggestion of using a QueryView can be valuable when the time is right. With queries that are too much for L2E for one reason or another (or that you think will be needed frequently, or that you want to keep the devs from having to figure out how to express), you can build ESQL queries right into the model. But it's not quite the same as an object query with an ESQL string, because you are writing the ESQL against the store schema, not the CSDL. You can use QueryView to replace an entityset mapping or to create your own entity & entityset. It's another way to hide some of the complexity from the developers.

     

    Next topic...

     

    There is a non-generic ObjectQuery. :-)

     

    Code Snippet

    //l2e query with no projection

    var lquery = from c in context.Contacts

                 where c.FirstName == "Robert"

                 select c;

    var oq = (ObjectQuery)lquery;

    var oqsql = oq.ToTraceString();

     

    //l2e query with projection

    var lqueryproj = from c in context.Contacts

                     select new { c.FirstName, c.LastName };

    var oq2 = (ObjectQuery)lqueryproj;

    oqsql = oq2.ToTraceString();

     

     

    Tuesday, September 23, 2008 7:25 PM
  • Executing the query with MergeOption.NoTracking does not change the generated SQL query in any way. It is identical to the query from the first post in this thread.

     

    Given the general simplicity of the query...it's just the union of two nearly identical queries that have a couple of joins in them...I am not sure that using eSQL vs. LINQ will change anything. My expectation is to get the SQL generated by L2S.

     

    I am still unable to fathom why the SQL generated by EF is so complicated. It makes zero sense to me...I have yet to figure out exactly what it is trying to do. It would be helpful if a Microsoftee could explain what the query is actually doing.

    Tuesday, September 23, 2008 10:01 PM
    Since I'm in a ranting mood...this forum software sucks. IE hates these forums, and these forums hate IE (and every other browser). I have lost this post 4 times in a row now. First time I got a forum error. The last three times, it acted like it posted fine, but my post was nowhere to be found. >.< I would use Opera, but Microsoft has an innate knack for screwing Opera in every way, on every Microsoft site, despite Opera's STELLAR standards support. Gotta love that crummy IE-only html/css, right? A suggestion for IE8...keep previous page state around for a while. Opera keeps FULL page state in memory, so if something goes wrong during a post back (happens occasionally...for whatever reason), I can just hit the back button, and everything I typed is still there, allowing me to hit post again, which usually submits successfully the second time (except on these forums...they refuse to work with anything but IE, so that option is out the window). Just another great example of the wonderful quality *sarcasm* of Microsoft products...even the forums are riddled with bugs and incompatibility.

     

    </rant>

     

    Regarding ObjectQuery and MergeOption...I tried it, and there was no change. The SQL generated was identical to that in the first post of this thread.
    Tuesday, September 23, 2008 10:09 PM
  • Jon

    Bad boy. (slaps wrist)  Rant about the forum software here: http://forums.microsoft.com/msdn/ShowForum.aspx?ForumID=52&SiteID=1

     

    Thanks for reporting back on the mergeoption. I guess that the process uses the same logic to build the query whether or not it needs to construct the relationship objects.

     

    julie

    Tuesday, September 23, 2008 10:27 PM
  •  Julie Lerman wrote:

    This is where Kris' suggestion of using a QueryView can be valuable when the time is right.

     

    Just a minor clarification: my suggestion was to use a T-SQL view (database side view) as that adds [SQL optimizer] benefits over using model-defined views.

    Wednesday, September 24, 2008 12:49 AM
  • oops

     

    Rephrase

    "in addition to using DefiningQuery in the SSDL as Kris suggested, QueryView in the mapping layer can also help .... "

     

    I still love Mike Pizzo's description of DefiningQuery (and SSDL-defined functions) as the "ultimate escape hatch" :-)
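    For reference, an SSDL DefiningQuery wrapping Kris's view might look roughly like this (a sketch only; the names and the omitted schema/namespace attributes are illustrative, not taken from a real model):

```xml
<!-- Inside the SSDL <EntityContainer>; the DealerPathDPID store
     EntityType must be declared separately with matching properties. -->
<EntitySet Name="DealerPathDPIDs" EntityType="Model.Store.DealerPathDPID">
  <DefiningQuery>
    SELECT DealerProductID, DealerID, Path
    FROM DealerPathDPID
  </DefiningQuery>
</EntitySet>
```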

    Wednesday, September 24, 2008 12:56 AM
  •  Julie Lerman wrote:

    oops

     

    Rephrase

    "in addition to using DefiningQuery in the SSDL as Kris suggested, QueryView in the mapping layer can also help .... "

     

    I still love Mike Pizzo's description of DefiningQuery (and SSDL-defined functions) as the "ultimate escape hatch" :-)

     

    Hi Julie,

     

    Even one layer further down; a view in the database...  (note that my sample starts with the tsql "create view" which creates the view in the db)

    Wednesday, September 24, 2008 1:01 AM
    Views are a useful tool...however, when you need to have EF queries built on DefiningQueries, built on database views, some of which are built on more database views, the only conclusion is that something is wrong. I try to be a practical kind of guy, and from a practical standpoint, having queries in half a dozen places to meet a single need...well, it isn't practical. This single query has turned into a nightmare...partially because the business requirement is just plain dumb and imposes some complexity that isn't necessary. But the true horror is having to surmount wave after wave of technical difficulties from the base platform upon which I am trying to solve this business problem. I am a big fan of Microsoft, particularly of their development tools...and the difficulty EF has posed has been somewhat shocking. Given what I now know, I would have more than gladly waited another few months if that's what it took for Microsoft to get EF completed and properly polished. I shouldn't have to fight with the tools that my profession is based on.

     

    LINQ to SQL was/is a great platform...it offered a simple way to simplify and solve problems. It wasn't too flexible, but it was simple, fast, and stable. It had a lot of polish. EF was supposed to improve on L2S, and bring to the table a rich visual editor, expanded mapping capabilities, and introduce the first phase of THE new data modeling platform from Microsoft. The ultimate goal was the simplification of data access, built upon a rich data modeling framework...

     

    Having spent a month (or, if you count the beta days...many months) just learning how to create a useful model that meets business needs, after which I find that it is practically impossible to query that model for any information more complex than a set of entities, or with criteria more complex than a few equalities and string comparisons, in a rapid manner with good performance... Well, you'll have to forgive me for finding it just a bit impractical to have to bend over backwards creating views and query definitions, adding several more layers of complexity to an already complex project, just to project some data into an object. As an architect, it is my job to bring simplicity and organization to a generally chaotic world. I don't think it's too much to ask that I NOT have to fight with and overcome limitations in the underlying technology upon which I practice my profession: I shouldn't have to build half a dozen views, clutter up my domain model with one-off "utility" entities, manually rebuild my SSDL every time I make a change, and...well ***, base jump off a cliff to dive through flaming rings while incurring the wrath of the all powerful Customer for not delivering last week.

     

    EF IS FRUSTRATING!!

     

    And I'm sick...I need some echinacea and some rest...

    Wednesday, September 24, 2008 1:56 AM
  • I feel your pain. To join the club I also have a rather simple example which produces poor SQL.

    Assume we have the following tables/entities (this is a 1:1 mapped model):

    Region, Country <1 -- *> Address <1 -- *> Person.

    A person can have multiple addresses in a fixed number of roles; each role has its own foreign key to the Address table, i.e. ShippingAddressID, BillingAddressID, etc.

    If I query the model like this:

    Code Snippet

    from p in db.Person.Include("ShippingAddress").Include("BillingAddress")

    where p.ID == 1

    select p

     

     

    Then the provider generates a distinct extent for each foreign key of the address records, i.e. one extent provides the data fields for Address, a different extent provides RegionID for the Address, and another provides CountryID. That's three extents where one would be enough.

    It's not clear to me what logic the EF provider uses to decide whether to generate these extra joins:

    - If I only include one address, then EF doesn't generate the extra extents.

    - If I use the query above, then EF generates the extra extents.

    - If I use a NoTracking query, then EF doesn't generate the extra extents.

    - If I use a NoTracking query AND I also Include() Country/Region data, then EF does generate the extra extents.
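     

    For reference, the NoTracking variant mentioned above can be written like this (a minimal sketch against the EF v1 ObjectContext; the context name `db` and the entity set/property names are assumed from the model described above):

    ```csharp
    using System.Data.Objects;
    using System.Linq;

    // Assumed ObjectContext 'db' with a 'Person' entity set, as in the model above.
    // Setting MergeOption.NoTracking before the query is enumerated makes EF skip
    // identity resolution and change tracking, which (per the observations above)
    // can also change the shape of the SQL that gets generated.
    db.Person.MergeOption = MergeOption.NoTracking;

    var person = (from p in db.Person
                               .Include("ShippingAddress")
                               .Include("BillingAddress")
                  where p.ID == 1
                  select p).FirstOrDefault();
    ```

    This sketch cannot run without a database and a generated model, so treat it as an illustration of the query shape rather than a drop-in snippet.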

     

    Also, if I change the Address-Person relation to <1 -- 0..1> in the model (the business logic enforces this even though there's 1 -- * in the model, because I have no need of the affected navigation property), then the generated SQL is poor with a NoTracking query (6 joins vs. the optimal 2), and supergross with a tracking query, with a total of 24 joins, most of them using the very same join predicate, and lots and lots of duplicate fields. Even a simple query like this could cause a huge performance impact. I even considered using multiple roundtrips, manually loading the relations to bring all the data I need from the server, since even that performs much better.
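
    The multiple-roundtrips alternative mentioned above might look roughly like this (a hedged sketch; the reference property names are assumed from the 1:1-mapped model, and it relies on EF v1's explicit Load() on EntityReference):

    ```csharp
    using System.Linq;

    // Load the root entity first, then issue one small, simple query per
    // relation. Several trivial SELECTs can outperform one enormous
    // auto-generated join, at the cost of extra roundtrips.
    var person = db.Person.First(p => p.ID == 1);

    // Each Load() call triggers its own roundtrip against the store.
    person.ShippingAddressReference.Load();
    person.BillingAddressReference.Load();
    person.ShippingAddress.CountryReference.Load();
    person.ShippingAddress.RegionReference.Load();
    ```

    The trade-off is N+1-style query counts, so this only pays off when the single generated statement is pathological, as in the 24-join case described above.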

    What justifies all this mess? This is a very simple query, yet in the last example (with 24 joins) the generated SQL, dumped to a non-unicode text file, is 22k!

    Wednesday, September 24, 2008 6:22 PM
  • Perhaps an inside look at how EF uses these complex SQL statements is needed. I'm assuming there is a reason for their complexity, but perhaps there is a better way to accomplish the same thing. I'm seeing a lot of queries generated from my models that do the nested sequences of LEFT JOINs on 1=1, stuffed inside an EXISTS check. I don't understand why these are necessary, or why the same set of LEFT JOINs is repeated multiple times (sometimes you will see the same sets of LEFT JOINs in an EXISTS check repeated three, four, sometimes five times).

    Wednesday, September 24, 2008 10:40 PM