MSDN > Home page del forum > LINQ to SQL > LINQ + .Contains(s) + sp_executesql = big performance problem
Formula una domandaFormula una domanda
 

Con rispostaLINQ + .Contains(s) + sp_executesql = big performance problem

  • venerdì 7 novembre 2008 15.02Evaldas Jocys Medaglie utenteMedaglie utenteMedaglie utenteMedaglie utenteMedaglie utente
     
    I've discovered that LINQ auto-generated SQL search commands performs 3-4 times slower than manually created. And performance is much worse with bigger tables (2M+ records).

    I moved fast query on top. Cleaned plan cache, updated statistics but LINQ generated query still performs 4*N times slower. I’ve tested this on 5 different SQL servers (i386, x64, v9.0.3282 SP2 CU9) and results always the same.

    It looks like problem lies in SQL server and not in LINQ. But if it affects all Microsoft databases worldwide then its a bummer. At this moment I don't know any workaround for this problem. If you have any info or workaround for this issue then please post it here.

    Note: Table/Index must have at least 2 000 000 records to see huge difference, my guess, it must not fit (be cached) into the RAM). I guess LINQ users with big databases are screwed - no FullText support and ‘LIKE’ is slow as *** Smile.


    Test Code:

    -- Make sure we have up-to-date statistics
    UPDATE STATISTICS dbo.TestTable1 WITH FULLSCAN

    -- Remove all cached query plans and
    -- execution contexts from the plan cache.
    DBCC FREEPROCCACHE

    -- Custom query, no parameters: Takes 6 seconds. Use Index/Table Scan.
    exec sp_executesql N'SELECT Count(*) FROM dbo.TestTable1 WHERE [Name] LIKE ''%pattern%'''

    -- Make sure we have up-to-date statistics
    UPDATE STATISTICS dbo.TestTable1 WITH FULLSCAN
    -- Remove all cached query plans and
    -- execution contexts from the plan cache.
    DBCC FREEPROCCACHE

    -- Query generated by LINQ (row.Name.Contains("pattern")) Takes 20 seconds. Use Index Seek.
    exec sp_executesql N'SELECT Count(*) FROM dbo.TestTable1 WHERE [Name] LIKE @p0', N'@p0 nvarchar(9)', @p0='%pattern%'


    Code to create and fill TestTable1 table:


    -- Drop test table.
    DROP TABLE [TestTable1]
    -- Columns
    CREATE TABLE [dbo].[TestTable1]
    (
          [Name] [nvarchar] (100) COLLATE Latin1_General_CI_AS NOT NULL
    ) ON [PRIMARY]
    -- Constraints and indexes
    CREATE NONCLUSTERED INDEX [IX_TestTable1] ON [dbo].[TestTable1] ([Name]) ON [PRIMARY]

    -- Insert 100 records into temp table.
    DECLARE @Table AS TABLE (Id nvarchar(36) PRIMARY KEY)
    INSERT INTO @Table SELECT newid() UNION SELECT newid() UNION SELECT newid() UNION SELECT newid()
    INSERT INTO @Table SELECT TOP 96 newid() FROM @Table AS t0
    CROSS JOIN @Table AS t1 CROSS JOIN @Table AS t2 CROSS JOIN @Table AS t3
    -- Generate  2 000 000 records (8 min).
    INSERT INTO dbo.TestTable1 SELECT t0.Id+t1.Id FROM @Table AS t0
    CROSS JOIN @Table AS t1 CROSS JOIN @Table AS t2
    INSERT INTO dbo.TestTable1 SELECT t0.Id+t1.Id FROM @Table AS t0
    CROSS JOIN @Table AS t1 CROSS JOIN @Table AS t2
    SELECT Count(*) FROM dbo.TestTable1



    Evaldas Jocys
    http://www.jocys.com/

Risposte

  • lunedì 10 novembre 2008 11.45KristoferA - Huagati SystemsPostatoreMedaglie utenteMedaglie utenteMedaglie utenteMedaglie utenteMedaglie utente
     Con risposta

     Evaldas Jocys wrote:
    At this moment I have two incomplete solutions because both require extension of current LINQ to SQL classes:

    A) Pass option WITH (INDEX = 0) to LINQ query (seems like there is no way to do that):
    (Submited by  Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se)
    This will force SQL server to use index scan so search will be multiple times faster.

    exec sp_executesql N'SELECT Count(*) FROM dbo.TestTable1 WITH (INDEX = 0) WHERE [Name] LIKE @p0', N'@p0 nvarchar(9)', @p0='%pattern%'

     

    If I'm not mistaken, with (index=0) forces a table scan, not an index scan. If "name" was the primary clustered key this would effectively be the same but it is not in the example provided.

     

     

     Evaldas Jocys wrote:

    B) Use SQL Full-Text extension to LINQ to SQL classes which doesn't exist. If extension exist then I can't find it on Google. SQL Full-Text CONTAINS(s) would be enough.

    All I can do now is to extend command execution time by using db.CommandTimeout = 120 when using search. It doesn't solve the performance problem but at least it it won't throw timeout error when using search.

     

    As I mentioned earlier, depending on what you store in the "Name" column this can be optimized using existing functionality. I'm assuming your real world usage of this is not 100x100 concatenated guids. If for example you're storing names, split it up into last/first/middle. If something else, create a child/lookup table with the data broken up into smaller units, e.g. sentence -> words.

     

    If you for some reason can not do any table layout changes to the database and/or if fulltext search is what you really need, encapsulate the fulltext lookup in a stored procedure that return PK values for the table you're searching in and then call that stored proc from L2S.    (But if you're storing/searching something like person names, then fulltext is overkill/unnecessary. Also don't forget that fulltext is not realtime in-sync with the db contents in the same way real db indexes are).

Tutte le risposte

  • lunedì 10 novembre 2008 3.10KristoferA - Huagati SystemsPostatoreMedaglie utenteMedaglie utenteMedaglie utenteMedaglie utenteMedaglie utente
     

    The execution plan for the parameterized (stored proc) version is generated without the optimizer knowing what you're going to pass in as @p0. The same will happen if you create a stored procedure with the same query. Because of this, the optimizer decides to opt for an index-seek based execution plan - this will make most sense unless there is a leading wildcard in the lookup criteria (@p0) so the SQL optimizer makes the assumption that most of the time it will be used for "small" lookups.

     

    The seek-instead-of-scan could be avoided with a SQL optimizer hint but unfortunately there is no built-in way to pass optimizer hints to the db in Linq queries today. (Although that can be worked around with an extension method to IQueryable if you want to go down that route.)

     

    ---

     

    A couple of general observations with regards to query performance on your sample table:

     

    1) Do you really need to use like '%pattern%'  (.Contains) on a large table/large set of data? Or would like 'pattern%' (.StartsWith) do?

     

    If you need to match something inside a string, you should reconsider your table design and maybe have a separate lookup table (or separate columns in your table) where you break up the "name" column so you can do like 'pattern%' matches. Matching with both leading and trailing wild cards means that the entire index need to be scanned (40k reads in your first query), and every index entry need to be scanned to find if the pattern exists anywhere in the column whereas for example like 'pattern%' would only scan or seek only the relevant portion of the index.

     

    The non-sp-parameterized version of the query against the 2M row test table still does 40k reads due to the full index scan. The parameterized one does 55k reads due to the seek so it adds a little bit of I/O overhead but both are still way too expensive I/O wise if this is just a simple name lookup. In contrast, removing the leading wild card brings both queries down to 5/525 reads (5 when there is no match, 525 when there is a match).

     

    If the "Name" column in your table will store a person name, consider breaking it up into separate indexed columns first, last, middle names etc.

     

    2) Since your test table does not have a primary key or any other column than the "Name" column, why not cluster it on the "Name" column? If always doing lookups against this column only you may as well have the data stored in order. Also consider adding a primary key + a clustered index (either on the PK or the "Name" column) to avoid the internal auto-generated heap pk / "funny" storage order.

     

    3) Does the spread/content of values in the generated sample data represent the distribution of values in the real world table you're working with? This particular example won't be affected but other queries can be negatively affected by a low spread of values, so when doing this kind of tests make sure the sample/test data you're using resembles the real world data you will be working with as closely as possible.

  • lunedì 10 novembre 2008 3.15KristoferA - Huagati SystemsPostatoreMedaglie utenteMedaglie utenteMedaglie utenteMedaglie utenteMedaglie utente
     

    I/O stats:

     

    exec sp_executesql N'SELECT Count(*) FROM dbo.TestTable1 WHERE [Name] LIKE ''%pattern%'''

     

    Table 'TestTable1'. Scan count 1, logical reads 39216, physical reads 0, read-ahead reads 39061, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

     

    ---

     

    exec sp_executesql N'SELECT Count(*) FROM dbo.TestTable1 WHERE [Name] LIKE @p0', N'@p0 nvarchar(9)', @p0='%pattern%'

     

    Table 'TestTable1'. Scan count 1, logical reads 54912, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

     

    ---

     

    exec sp_executesql N'SELECT Count(*) FROM dbo.TestTable1 WHERE [Name] LIKE @p0', N'@p0 nvarchar(9)', @p0='pattern%'

     

    Table 'TestTable1'. Scan count 1, logical reads 5, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

  • lunedì 10 novembre 2008 11.26Evaldas Jocys Medaglie utenteMedaglie utenteMedaglie utenteMedaglie utenteMedaglie utente
     
    At this moment I have two incomplete solutions because both require extension of current LINQ to SQL classes:

    A) Pass option WITH (INDEX = 0) to LINQ query (seems like there is no way to do that):
    (Submited by  Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se)
    This will force SQL server to use index scan so search will be multiple times faster.

    exec sp_executesql N'SELECT Count(*) FROM dbo.TestTable1 WITH (INDEX = 0) WHERE [Name] LIKE @p0', N'@p0 nvarchar(9)', @p0='%pattern%'

    B) Use SQL Full-Text extension to LINQ to SQL classes
    which doesn't exist. If extension exist then I can't find it on Google. SQL Full-Text CONTAINS(s) would be enough.

    All I can do now is to extend command execution time by using db.CommandTimeout = 120 when using search. It doesn't solve the performance problem but at least it it won't throw timeout error when using search.



    P.S.: I've also found some interesting examples of
    LLBLGen:
    http://www.llblgen.com/tinyforum/Messages.aspx?ThreadID=13677

    ''' Class which defines the custom mapping between ServiceCenterFunctions.FullTextSearch and CONTAINS
    Public Class MyCustomFunctionMappings Inherits FunctionMappingStore
        Public Sub New()
            Me.Add(New FunctionMapping(GetType(MyCustomFunctions), "FullTextSearch", 2, "CONTAINS({0}, {1})"))
        End Sub
    End Class
    ...
    Dim q1 = From p In GetProjects() Where MyCustomFunctions.FullTextSearch(p.Description, searchTerm) Select p.Id


    But this involves third party library. Simpler Microsoft's LINQ to SQL extension would be much better solution.


    Evaldas Jocys
    http://www.jocys.com/
  • lunedì 10 novembre 2008 11.45KristoferA - Huagati SystemsPostatoreMedaglie utenteMedaglie utenteMedaglie utenteMedaglie utenteMedaglie utente
     Con risposta

     Evaldas Jocys wrote:
    At this moment I have two incomplete solutions because both require extension of current LINQ to SQL classes:

    A) Pass option WITH (INDEX = 0) to LINQ query (seems like there is no way to do that):
    (Submited by  Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se)
    This will force SQL server to use index scan so search will be multiple times faster.

    exec sp_executesql N'SELECT Count(*) FROM dbo.TestTable1 WITH (INDEX = 0) WHERE [Name] LIKE @p0', N'@p0 nvarchar(9)', @p0='%pattern%'

     

    If I'm not mistaken, with (index=0) forces a table scan, not an index scan. If "name" was the primary clustered key this would effectively be the same but it is not in the example provided.

     

     

     Evaldas Jocys wrote:

    B) Use SQL Full-Text extension to LINQ to SQL classes which doesn't exist. If extension exist then I can't find it on Google. SQL Full-Text CONTAINS(s) would be enough.

    All I can do now is to extend command execution time by using db.CommandTimeout = 120 when using search. It doesn't solve the performance problem but at least it it won't throw timeout error when using search.

     

    As I mentioned earlier, depending on what you store in the "Name" column this can be optimized using existing functionality. I'm assuming your real world usage of this is not 100x100 concatenated guids. If for example you're storing names, split it up into last/first/middle. If something else, create a child/lookup table with the data broken up into smaller units, e.g. sentence -> words.

     

    If you for some reason can not do any table layout changes to the database and/or if fulltext search is what you really need, encapsulate the fulltext lookup in a stored procedure that return PK values for the table you're searching in and then call that stored proc from L2S.    (But if you're storing/searching something like person names, then fulltext is overkill/unnecessary. Also don't forget that fulltext is not realtime in-sync with the db contents in the same way real db indexes are).

  • lunedì 10 novembre 2008 13.29Evaldas Jocys Medaglie utenteMedaglie utenteMedaglie utenteMedaglie utenteMedaglie utente
     
     KristoferA wrote:
    If I'm not mistaken, with (index=0) forces a table scan, not an index scan. If "name" was the primary clustered key this would effectively be the same but it is not in the example provided.
    You are right, its table scan (it runs faster than index seek).
     KristoferA wrote:
    If you for some reason can not do any table layout changes to the database and/or if fulltext search is what you really need, encapsulate the fulltext lookup in a stored procedure that return PK values for the table you're searching in and then call that stored proc from L2S.
    This can solve problem. Unfortunately use of stored procedures defeats usefulness of LINQ because any SQL condition (WHERE, ORDER BY, ...) applied on Procedure results will be executed on client (C#) side and not on SQL server. I also want do search (.Where<>(x)) on LINQ queries defined before.
     KristoferA wrote:
    Also don't forget that fulltext is not realtime in-sync with the db contents in the same way real db indexes are
    I could live with that because Full-Text index would be extremely usefull extension. Smile