A strange problem with 'select' performance

    Question

  • My situation is as follows:

    I have a table with 2 million records and run a query with the SQL
    "select * from TBL where fld_1=@value"
    (fld_1 is the primary key and is defined as char(8)).

    1) When @value is N'10000009', it takes about 0.5 second.
    2) When @value is '10000009', it takes about 0.1 second.
    3) When @value is a string that ends with 9 and has length 8, such as N'10000019' or N'10000129', it takes about 0.5 second.
    4) When @value is a string that ends with 9 and has length less than 8, such as N'10000008', it takes about 0.1 second.
    5) When @value is a string that does not end with 9, such as N'10000001', N'10000003', or N'100015', it takes about 0.1 second.
    6) When fld_1 is defined as nchar(8) and @value is N'10000009', it takes about 0.1 second.
    7) When all of the indexes (including the default primary key index) are dropped, it takes the same time whatever @value is.

    In my project a more complex SQL statement is used, and the difference described above is magnified about 30 times.

    Does anyone know what is special about '9'?
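
    For reference, the two query shapes being compared look like this (TBL and fld_1 as named above; whether the value arrives as a literal or as a client-side parameter is not stated, so this is only a minimal sketch):

    -- Case 1: Unicode value (nvarchar); does not match the char(8) column
    select * from TBL where fld_1 = N'10000009'
    -- Case 2: non-Unicode value; matches the char(8) column
    select * from TBL where fld_1 = '10000009'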

    Thursday, April 23, 2009 8:13 AM

Answers

  • The two are both instant on my server. The first time I execute one of the queries it is of course slower, since the data isn't in cache, but subsequent executions are instant (non-measurable). This is confirmed by SET STATISTICS IO ON: both show 3 logical reads. Both return one row, and both show in the execution plan an estimated as well as actual number of rows of 1.

    But, hang on, Profiler shows a marginal difference between the two. The "9" version consistently gives me a duration of 10-12 microseconds, whereas the "8" version consistently gives me a duration of 0-1 microseconds. Sometimes the "9" version shows a CPU usage of 15-16, other times it shows 0, whereas the "8" version always shows a CPU usage of 0. Interesting... I saved the actual execution plan for both to an XML file and used WinDiff to compare - identical. All I can say is that it is type related; there is some overhead in the execution engine for the "9" version. My guess is that the constant scan is more costly for the "9" version because the list of possible values is bigger. Just a guess, though.

    This is an excellent example of why we should always match types. If you just make the type of the literal match the column type, you will not see this problem, as below:
    select * from T_TEST_9 where ID='10000009'
    select * from T_TEST_9 where ID='10000008' 

    Both of the above are equally fast, and you will also see a simpler execution plan.
    Tibor Karaszi
    • Marked as answer by johny quan Tuesday, April 28, 2009 7:14 AM
    Tuesday, April 28, 2009 6:16 AM

All replies

  • Probably two things cause your differences:

    Unicode vs. non-Unicode. Make sure you match the literal to the data type of the column in the table; otherwise a data type conversion has to be performed. Which side gets converted is decided by the "data type precedence" rules (documented in Books Online). If you are unlucky, the column side will be converted to match the literal side (if the literal side has higher precedence), resulting in a more costly query. This will show up in the execution plan if you compare the two; see the sketch below.

    Different selectivity estimated by the optimizer, resulting in different execution plans. This you can also see in the execution plan.
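
    One way to compare the two plans and spot the conversion, as a minimal sketch (the plan shape depends on version and collation, and the values are illustrative):

    SET SHOWPLAN_TEXT ON;
    GO
    select * from TBL where fld_1 = N'10000009'  -- look for CONVERT/CONVERT_IMPLICIT applied to fld_1
    select * from TBL where fld_1 = '10000009'   -- types match, no conversion on the column
    GO
    SET SHOWPLAN_TEXT OFF;
    GO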
    Tibor Karaszi
    • Proposed as answer by Mengchew0113 Thursday, April 23, 2009 4:18 PM
    Thursday, April 23, 2009 11:05 AM
  • But why do the '9' records take so much more time than the other records when using Unicode?

    Friday, April 24, 2009 2:27 AM
  • Probably because of different actual or estimated selectivity (number of rows returned), possibly resulting in different execution plans. Did you check the estimated/actual number of rows for the two and compare the execution plans?
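
    A quick way to check both row counts, as a sketch (SET STATISTICS XML returns the actual plan, which includes estimated and actual rows; table and column names as in the question):

    SET STATISTICS XML ON;
    select * from TBL where fld_1 = N'10000009';
    select * from TBL where fld_1 = N'10000008';
    SET STATISTICS XML OFF;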


    Tibor Karaszi
    Friday, April 24, 2009 6:13 AM
  • Can you first verify whether the execution plans are different or not?
    This posting is provided "AS IS" with no warranties, and confers no rights.
    Friday, April 24, 2009 9:47 AM
  • The '9' records and the non-9 records have exactly the same execution plans :(

    Monday, April 27, 2009 1:52 AM
  • The column in your table is char(8), right?

    I still think the issue is whether you get a type conversion and on which side. Remember, the ideal is to have no conversion at all. If we do end up with a conversion, then we want it on the literal (string value) side, not on the column side. Type conversion on the column side is *bad*. Can you show us two values you search against that have the same type (and the same number of characters) and differ only in value? Also, do these return the same number of rows? And do the execution plans estimate the same number of rows? The digit 9 vs. 7 vs. something else should not make a difference here unless it makes a difference in estimated or actual selectivity.
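
    To illustrate which side carries the conversion, a minimal sketch (values made up; TBL/fld_1 as in the question):

    -- Conversion pushed onto the value side: done once, and the char(8) column
    -- and its index are used as-is
    select * from TBL where fld_1 = CONVERT(char(8), N'10000009')
    -- Mismatched types: the conversion falls where the precedence rules say,
    -- here onto the column side, which is the costly case described above
    select * from TBL where fld_1 = N'10000009'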
    Tibor Karaszi
    Monday, April 27, 2009 5:37 AM
  • It looks like attachments are not permitted. You can create a table for the test using the script below
    and examine the results yourself. Thanks for everyone's help.

    ===========================================================
    CREATE TABLE T_TEST_9 (
    ID CHAR(8) NOT NULL,
    MODDATE DATETIME NULL)

    GO

    ALTER TABLE T_TEST_9
    ADD CONSTRAINT T_TEST_9_PK PRIMARY KEY CLUSTERED (ID)

    Go

    DECLARE @col_value AS CHAR(8)
    DECLARE @int_value AS INT
    SET @int_value = 10000001
    WHILE @int_value < 13000000  -- upper bound for the generated ID values
    BEGIN
        SET @col_value = CONVERT(CHAR(8), @int_value)
        INSERT INTO T_TEST_9 (ID)
        VALUES (@col_value)
        SET @int_value = @int_value + 1
    END

    GO
    ============================================================

    "  select * from T_TEST_9 where ID=N'10000009'  "
    takes more time than
    "  select * from T_TEST_9 where ID=N'10000008'  "

    Tuesday, April 28, 2009 2:21 AM
  • The two are both instant on my server. The first time I execute one of the queries it is of course slower, since the data isn't in cache, but subsequent executions are instant (non-measurable). This is confirmed by SET STATISTICS IO ON: both show 3 logical reads. Both return one row, and both show in the execution plan an estimated as well as actual number of rows of 1.

    But, hang on, Profiler shows a marginal difference between the two. The "9" version consistently gives me a duration of 10-12 microseconds, whereas the "8" version consistently gives me a duration of 0-1 microseconds. Sometimes the "9" version shows a CPU usage of 15-16, other times it shows 0, whereas the "8" version always shows a CPU usage of 0. Interesting... I saved the actual execution plan for both to an XML file and used WinDiff to compare - identical. All I can say is that it is type related; there is some overhead in the execution engine for the "9" version. My guess is that the constant scan is more costly for the "9" version because the list of possible values is bigger. Just a guess, though.

    This is an excellent example of why we should always match types. If you just make the type of the literal match the column type, you will not see this problem, as below:
    select * from T_TEST_9 where ID='10000009'
    select * from T_TEST_9 where ID='10000008' 

    Both of the above are equally fast, and you will also see a simpler execution plan.
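
    A related sketch: if the value arrives as a parameter rather than a literal, the same principle applies - declare the parameter with the column's type (the parameter name here is illustrative):

    EXEC sp_executesql
        N'select * from T_TEST_9 where ID = @id',
        N'@id char(8)',   -- parameter type matches the column, so nothing is converted on the column side
        @id = '10000009';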
    Tibor Karaszi
    • Marked as answer by johny quan Tuesday, April 28, 2009 7:14 AM
    Tuesday, April 28, 2009 6:16 AM
  • In addition, records ending with 'z' take even more time than the '9' records.

    It looks like we can only avoid this problem by coding it correctly (matching the value's type to the column's type).
    Tuesday, April 28, 2009 7:19 AM
  • Aside: I had the scale incorrect for the duration. My Profiler GUI was configured to show duration in milliseconds, not microseconds. When I configured Profiler to show microseconds, I see about 9000 for the "9" version and about 0 or 1000 for the "8" version.
    Tibor Karaszi
    Tuesday, April 28, 2009 7:34 AM