none
"Same" LINQ query resulting in two different SQL statements RRS feed

  • Question

  • Hi *,
    I have a simple Books - Authors model (one Author can have many Books, one Book has one Author). From model it results into:
    -- --------------------------------------------------
    -- Creating all tables
    -- --------------------------------------------------
    
    -- Creating table 'Authors'
    CREATE TABLE [dbo].[Authors] (
      [ID] int IDENTITY(1,1) NOT NULL,
      [FirstName] nvarchar(max) NOT NULL,
      [LastName] nvarchar(max) NOT NULL
    );
    GO
    
    -- Creating table 'Books'
    CREATE TABLE [dbo].[Books] (
      [ID] int IDENTITY(1,1) NOT NULL,
      [Title] nvarchar(max) NOT NULL,
      [AuthorID] int NOT NULL
    );
    GO
    
    -- --------------------------------------------------
    -- Creating all PRIMARY KEY constraints
    -- --------------------------------------------------
    
    -- Creating primary key on [ID] in table 'Authors'
    ALTER TABLE [dbo].[Authors]
    ADD CONSTRAINT [PK_Authors]
      PRIMARY KEY CLUSTERED ([ID] ASC);
    GO
    
    -- Creating primary key on [ID] in table 'Books'
    ALTER TABLE [dbo].[Books]
    ADD CONSTRAINT [PK_Books]
      PRIMARY KEY CLUSTERED ([ID] ASC);
    GO
    
    -- --------------------------------------------------
    -- Creating all FOREIGN KEY constraints
    -- --------------------------------------------------
    
    -- Creating foreign key on [AuthorID] in table 'Books'
    ALTER TABLE [dbo].[Books]
    ADD CONSTRAINT [FK_AuthorBook]
      FOREIGN KEY ([AuthorID])
      REFERENCES [dbo].[Authors]
        ([ID])
      ON DELETE NO ACTION ON UPDATE NO ACTION;
    
    -- Creating non-clustered index for FOREIGN KEY 'FK_AuthorBook'
    CREATE INDEX [IX_FK_AuthorBook]
    ON [dbo].[Books]
      ([AuthorID]);
    GO
    
    
    

    So far so good. But the query:
    context.Books.Where(b => b.Author.FirstName.Contains(s) || b.Author.LastName.Contains(s)).Select(b => b.Title)
    

    Creates different query than:
    context.Authors.Where(a => a.FirstName.Contains(s) || a.LastName.Contains(s)).SelectMany(a => a.Books).Select(b => b.Title)
    
    or
    context.Authors.Where(a => a.FirstName.Contains(s) || a.LastName.Contains(s)).SelectMany(a => a.Books.Select(b => b.Title)
    

    .
    As far as I can see, the queries are some. But the first one creates another join for every part of the condition. The second or third one are fine and "correct". Is there a reason for it? Or am I missing some difference?

    Jiri {x2} Cincura
    Tuesday, July 19, 2011 4:47 PM

Answers

  • I believe this because of navigation properties you are using inside the Where method in the first query. When you call b.Author there is a join required to find all authors that for the given book (b). In the second query inside Where method you are not accessing the other table but you are just filtering results from the "current" table - hence no joins are needed here. The join you see comes from a.Books from the SelectMany() method (again because of the navigation property). EF probably could be smarter about the first query but it is hard to be smart about specific cases like this and not break general cases at the same time. In addition I believe that SQL server is able to optimize the query and avoid joining the same tables twice.

    Pawel

    Wednesday, July 20, 2011 5:58 PM

All replies

  • Hi Jiri,

    Welcome!

    You're right. The frist query will create one LEFT OUT JOIN on Authors table again.

    exec sp_executesql N'SELECT 
    [Extent2].[Title] AS [Title]
    FROM [dbo].[Authors] AS [Extent1]
    INNER JOIN [dbo].[Books] AS [Extent2] ON [Extent1].[ID] = [Extent2].[AuthorID]
    WHERE ([Extent1].[FirstName] LIKE @p__linq__0 ESCAPE N''~'') OR ([Extent1].[LastName] LIKE @p__linq__1 ESCAPE N''~'')',N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',@p__linq__0=N'%Alan%',@p__linq__1=N'%Alan%'
    

    The second:

    SELECT 
    [Extent2].[Title] AS [Title]
    FROM [dbo].[Authors] AS [Extent1]
    INNER JOIN [dbo].[Books] AS [Extent2] ON [Extent1].[ID] = [Extent2].[AuthorID]
    WHERE ([Extent1].[FirstName] LIKE @p__linq__0 ESCAPE N''~'') OR ([Extent1].[LastName] LIKE @p__linq__1 ESCAPE N''~'')',N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',@p__linq__0=N'%Alan%',@p__linq__1=N'%Alan%'
    

    I think it relates how the provider to compile the expression tree which create by your query sentence.

    Have a nice day.


    Alan Chen[MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Wednesday, July 20, 2011 9:42 AM
    Moderator
  • I believe this because of navigation properties you are using inside the Where method in the first query. When you call b.Author there is a join required to find all authors that for the given book (b). In the second query inside Where method you are not accessing the other table but you are just filtering results from the "current" table - hence no joins are needed here. The join you see comes from a.Books from the SelectMany() method (again because of the navigation property). EF probably could be smarter about the first query but it is hard to be smart about specific cases like this and not break general cases at the same time. In addition I believe that SQL server is able to optimize the query and avoid joining the same tables twice.

    Pawel

    Wednesday, July 20, 2011 5:58 PM
  • Hi *,

    the problem is that these two queries are really simple and (if I'm not missing something) same. Hence the CCT/CQT should be the same, but somehow it's not. I was just wondering what wrong, because I know from personal experience on similar queries it's able to do the optimization well.

    Sadly the optimizer isn't able to remove the additional join, so the resulting query can be quite slow, especially on a lot of conditions. :(


    Jiri {x2} Cincura
    Wednesday, July 20, 2011 7:28 PM