Microsoft Developer Network > Página Inicial dos Fóruns > Transact-SQL > A strange problem with 'select' performance
Fazer uma PerguntaFazer uma Pergunta
 

RespondidoA strange problem with 'select' performance

  • quinta-feira, 23 de abril de 2009 8:13johny quan Medalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuário
     
    My statement is as followed:

    I have a table with 2 million records and do a query with the SQL
    "select * from TBL where fld_1=@value"
    (fld_1 is the primary key and defined as char(8)).

    1)when @value is N'10000009', it takes about 0.5 second.
    2)when @value is '10000009',it takes about 0.1 second.
    3)when @value is string which ends with 9 and length is 8 like N'10000019', N'10000129',it takes about 0.5 second.
    4)when @value is sting which ends with 9 and length is lessthan 8,such as N'10000008',it takes about 0.1 second.
    5)when @value is string which does not end with 9, just like N'10000001', N'10000003', N'100015'.
    ,it takes 0.1 second.
    6)when fld_1 is defined as nchar(8) and @value is N'10000009',it takes about 0.1 second.
    7)when all of the indexes(the default primary index also included) are droped,it takes the same time whatever the @value is.

    In my project a complexible SQL is used and the difference described above is expanded about 30 times.

    Does anyone know the special '9'?

Respostas

  • terça-feira, 28 de abril de 2009 6:16TiborKMVP, ModeradorMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuário
     Respondido
    The two are both instant on my server. The first time I execute one fo the queries, it is of course slower since the data isn't in cache, but subsequent are instant (non-measurable). This is confirmed bu SET STATISTICS IO ON, both show 3 logical reads. Both return one row, and both show in execution plan both estimated as well as actual number of rows to be 1.

    But, hang on, Profiler show a marginal difference between the two. The "9" gives me consistently duration of 10-12 microseconds, where the "8" version gives me consistently a duration of 0-1 microseconds. Sometimes, the "9" version show a cpu usage of 15-16. other times it show 0, there the "8" version always show cpu usage of 0. Interesting... I saved the actual execution plan for both to xml file and used WinDiff to compare - identical. All I can say is that it is type related, there is some overhead in the execution engine for the "9" version. My guess is that the contant scan is more costly for the "9" versio because the list of possible values is bigger. Just a guess, though.

    This is an excellent example why we should always match types. If you just make the type for the literal to match the column type, you will not see this problem, as in below:
    select * from T_TEST_9 where ID='10000009'
    select * from T_TEST_9 where ID='10000008' 

    Both above are equally fast, and you will also see a simpler execution plan.
    Tibor Karaszi
    • Marcado como Respostajohny quan terça-feira, 28 de abril de 2009 7:14
    •  

Todas as Respostas

  • quinta-feira, 23 de abril de 2009 11:05TiborKMVP, ModeradorMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuário
     Resposta Proposta
    Probably two things causes your differences:

    Uniqcode vs non-unicode. Make sure you match the literal to the datatype for the column in the table. Else a data type conversion has to be performed. Which side will be converted is decided by the "datatype precedence" rules (documented in Books Online). If you are unlucky, the columns side will be converted to the literal side (if the literal side has higher precedence), with more coltly query as result. This will show up in the execution plan, if you compare the two.

    Different selectivity estimated by the optimizer, resulting in different execution plans. This you can also see in the execution plan.
    Tibor Karaszi
    • Sugerido como RespostaMengchew0113 quinta-feira, 23 de abril de 2009 16:18
    •  
  • sexta-feira, 24 de abril de 2009 2:27johny quan Medalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuário
     

    But why the '9' records take much more time than other records when using unicode?

  • sexta-feira, 24 de abril de 2009 6:13TiborKMVP, ModeradorMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuário
     

    Probably because different either actual or estimated selectivity (number of rows returned) - possibly resulting in different execution plans. Did you check the two estimated/actual number of rows and compare execution plans.


    Tibor Karaszi
  • sexta-feira, 24 de abril de 2009 6:23PesoMVPMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuário
     
  • sexta-feira, 24 de abril de 2009 9:47Xinyu Wang - MSFTMSFTMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuário
     
    Can you first verify whether the execution plans are different or not?
    This posting is provided "AS IS" with no warranties, and confers no rights.
  • segunda-feira, 27 de abril de 2009 1:52johny quan Medalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuário
     

    The '9' records and non-9 records have the completely same execution plans:(

  • segunda-feira, 27 de abril de 2009 5:37TiborKMVP, ModeradorMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuário
     
    The column in the table you said is char(8), right?

    I still think the issue is whether you get a type converstion and on what side. Remember ideal is to not have conversion at all. If we do end up with conversion, then we want it on the literal (string value, right) side, not the column side. Type conversion on column side is *bad*. If you show us two values you search against with same type (numbers of letters) and only different values? Also, do these return same number of rows? And do the execution plans estimate same number of rows? The letter 9, vs 7, vs something should not make the difference here unless they make a difference in estimated or actual selecivity.
    Tibor Karaszi
  • terça-feira, 28 de abril de 2009 2:21johny quan Medalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuário
     

    It looks like that attachment is not permitted. You can create a table to do the test using the script below.
    And examine the results in person. Thanks for everyone's help.

    ===========================================================
    CREATE TABLE T_TEST_9 (
    ID CHAR(8) NOT NULL,
    MODDATE DATETIME NULL)

    GO

    ALTER TABLE T_TEST_9
    ADD CONSTRAINT T_TEST_9_PK PRIMARY KEY CLUSTERED (ID)

    Go

    DECLARE @col_value AS CHAR(8)
    DECLARE @int_value AS INT
    SET @int_value = 10000001
    WHILE @int_value < 13000000  --the count of records
    BEGIN
     set @col_value = convert(CHAR(8),@int_value)
     INSERT INTO T_TEST_9
      (ID)
     VALUES
      (@col_value)
     set @int_value = @int_value + 1
    END

    GO
    ============================================================

    "  select * from T_TEST_9 where ID=N'10000009'  "
    takes more time than
    "  select * from T_TEST_9 where ID=N'10000008'  "

  • terça-feira, 28 de abril de 2009 6:16TiborKMVP, ModeradorMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuário
     Respondido
    The two are both instant on my server. The first time I execute one fo the queries, it is of course slower since the data isn't in cache, but subsequent are instant (non-measurable). This is confirmed bu SET STATISTICS IO ON, both show 3 logical reads. Both return one row, and both show in execution plan both estimated as well as actual number of rows to be 1.

    But, hang on, Profiler show a marginal difference between the two. The "9" gives me consistently duration of 10-12 microseconds, where the "8" version gives me consistently a duration of 0-1 microseconds. Sometimes, the "9" version show a cpu usage of 15-16. other times it show 0, there the "8" version always show cpu usage of 0. Interesting... I saved the actual execution plan for both to xml file and used WinDiff to compare - identical. All I can say is that it is type related, there is some overhead in the execution engine for the "9" version. My guess is that the contant scan is more costly for the "9" versio because the list of possible values is bigger. Just a guess, though.

    This is an excellent example why we should always match types. If you just make the type for the literal to match the column type, you will not see this problem, as in below:
    select * from T_TEST_9 where ID='10000009'
    select * from T_TEST_9 where ID='10000008' 

    Both above are equally fast, and you will also see a simpler execution plan.
    Tibor Karaszi
    • Marcado como Respostajohny quan terça-feira, 28 de abril de 2009 7:14
    •  
  • terça-feira, 28 de abril de 2009 7:19johny quan Medalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuário
     
    In addition, the records ending with 'z' take even more time than '9' records.

    It looks like that we can only avoid this problem by correct coding.
  • terça-feira, 28 de abril de 2009 7:34TiborKMVP, ModeradorMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuárioMedalhas de usuário
     
    Aside: I had the scale incorrect for duration. My PRofiler GUI was configured to show duration as milliseconds, not microseconds. When I coinfigured my Profiler app to show as microsends I see about 9000 for the "9" version about 0 or 1000 for the "8" version.
    Tibor Karaszi