clustered index not being used

    Question

  • Hi
    I have a very simple yet strange query plan. I am using SQL Server 2008 R2 (no SP1)


    My table index is like this:

    index_name                 index_description                                   index_keys
    IDX_Session_Key_201010     nonclustered located on PRIMARY                     Session_Key
    IDX_U_Tracking_Key_201010  nonclustered, unique located on PRIMARY             Tracking_Key
    PK_Fact_Tracking_201010    clustered, unique, primary key located on PRIMARY   Tracking_Date, Site_Country_Key, Row_ID

    My query is the following:

     
    DECLARE @Date_From_Working date = '17 Jul 2011'
    DECLARE @Date_To_Working   date = '19 Jul 2011'

    SELECT
          'Tracking' AS Fact_Name
        , Tracking_Date
        , Site_Country_Key
        , COUNT(*) AS Record_Count
    FROM Fact_Tracking_201107 td
    WHERE Tracking_Date BETWEEN @Date_From_Working AND @Date_To_Working
    GROUP BY
          Tracking_Date
        , Site_Country_Key

     

    The problem is that the index used in the query is not the one with the date first, but IDX_Session_Key_201010. It is still surprisingly quick, but not as quick as it should be.


    Now if I change the query so that it reads Tracking_date BETWEEN '17 Jul 2011' AND '19 Jul 2011', it uses the clustered index (I am aware that hard-coding values instead of passing variables sometimes helps). I am actually quite surprised by this, as SQL Server normally gets this right. Tracking_date is DATE, but I also tried DATETIME and got the same result. My problem is that Fact_Tracking_201107 is used in a view; 12 tables make up the year, so I can't force the index, because I have to use the view in the real query. But I think that if I figure out why the more basic query doesn't work, I may be able to understand why it doesn't work with the view. Fact_Tracking_201107 has a few million rows; the total for the year is around 500M rows.

    Does anyone know what's wrong? Thanks. Panos
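    For reference, this is the hard-coded version that does pick the clustered index. The SET STATISTICS lines are only there so the logical reads and timings of the two plans can be compared; they are not part of the original query:

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- With literal dates the optimizer can use the statistics histogram on
    -- Tracking_Date at compile time, so it estimates the range accurately
    -- and, as described above, picks the clustered index PK_Fact_Tracking_201010.
    SELECT
          'Tracking' AS Fact_Name
        , Tracking_Date
        , Site_Country_Key
        , COUNT(*) AS Record_Count
    FROM Fact_Tracking_201107 td
    WHERE Tracking_Date BETWEEN '17 Jul 2011' AND '19 Jul 2011'
    GROUP BY
          Tracking_Date
        , Site_Country_Key;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;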

     

    Column_name Type Computed Length Prec Scale Nullable TrimTrailingBlanks FixedLenNullInSource Collation
    Tracking_Date date no 3 10    0     no (n/a) (n/a) NULL
    Tracking_Date_Key int no 4 10    0     no (n/a) (n/a) NULL
    Tracking_Time time no 3 8     0     no (n/a) (n/a) NULL
    Tracking_Time_Key int no 4 10    0     no (n/a) (n/a) NULL
    Session_ID int no 4 10    0     no (n/a) (n/a) NULL
    Row_ID int no 4 10    0     no (n/a) (n/a) NULL
    Next_Page_Ptr int no 4 10    0     yes (n/a) (n/a) NULL
    Cat_ID int no 4 10    0     yes (n/a) (n/a) NULL
    Category_Key int no 4 10    0     no (n/a) (n/a) NULL
    Partner_ID int no 4 10    0     yes (n/a) (n/a) NULL
    Partner_Key int no 4 10    0     no (n/a) (n/a) NULL
    PID int no 4 10    0     yes (n/a) (n/a) NULL
    Site_Country_Key int no 4 10    0     no (n/a) (n/a) NULL
    URL nvarchar no 8000             yes (n/a) (n/a) Latin1_General_CI_AS
    Referrer nvarchar no 8000             yes (n/a) (n/a) Latin1_General_CI_AS
    Domain varchar no 1024             yes no yes Latin1_General_CI_AS
    Page_Type varchar no 1024             yes no yes Latin1_General_CI_AS
    UX_Page_Type varchar no 1024             yes no yes Latin1_General_CI_AS
    Page_Type_Key int no 4 10    0     no (n/a) (n/a) NULL
    Kelkoo_ID varchar no 32             yes no yes Latin1_General_CI_AS
    IP_Address varchar no 1000             yes no yes Latin1_General_CI_AS
    Keyword_Partner_ID varchar no 1024             yes no yes Latin1_General_CI_AS
    KID_Status varchar no 1024             yes no yes Latin1_General_CI_AS
    Bucket_ID tinyint no 1 3     0     yes (n/a) (n/a) NULL
    IP_Address_Numeric bigint no 8 19    0     yes (n/a) (n/a) NULL
    Audit_Key int no 4 10    0     no (n/a) (n/a) NULL
    Session_Key bigint no 8 19    0     yes (n/a) (n/a) NULL
    Tracking_Key bigint no 8 19    0     yes (n/a) (n/a) NULL
    Platform varchar no 50             yes no yes Latin1_General_CI_AS
    Page_Number bigint no 8 19    0     yes (n/a) (n/a) NULL

     

    Monday, December 5, 2011 10:51 AM

Answers

  • If SQL Server thinks that the range in your query covers only a small portion of the rows, it will use the clustered index, since you are doing a range query on Tracking_date, the first column in your clustered index is Tracking_date, and none of your nonclustered indexes have Tracking_date as the first column.

    But if SQL Server believes that the given range will include enough rows, then it is going to use one of the nonclustered indexes. This happens because if SQL Server uses either of the nonclustered indexes, it can satisfy the query without ever going to the clustered index. That's because every column needed by your query is a key of the clustered index, and all of the clustered index keys are also stored in the leaf level of every nonclustered index. So SQL Server's choice is either to scan the whole nonclustered index or a range of the clustered index. If the range of the clustered index to scan is small, the clustered index is the better choice; if it is large, a nonclustered index is the better choice. Even though scanning the whole nonclustered index means reading more rows than scanning a range of the clustered index, each row in the nonclustered index is much smaller (it contains only the columns in that index plus the clustered index key columns), so in that case SQL Server will use the smallest nonclustered index, since that results in the least amount of logical I/Os.
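    If you want to see the size difference the optimizer is weighing, you can compare the page counts of the three indexes with something like the query below (a rough sketch against the standard DMVs; I'm assuming the table lives in the dbo schema):

    -- Size of each index on the fact table: the smallest nonclustered index
    -- is the cheapest one to scan from end to end.
    SELECT  i.name                   AS index_name
          , i.type_desc              AS index_type
          , SUM(ps.used_page_count)  AS used_pages
          , SUM(ps.row_count)        AS row_count
    FROM sys.indexes AS i
    JOIN sys.dm_db_partition_stats AS ps
          ON  ps.object_id = i.object_id
          AND ps.index_id  = i.index_id
    WHERE i.object_id = OBJECT_ID('dbo.Fact_Tracking_201107')
    GROUP BY i.name, i.type_desc
    ORDER BY used_pages;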

    Tom

    Monday, December 12, 2011 7:08 PM

All replies

  • That is because the SQL Server optimizer decides that it will be more efficient to return the data you are asking for using the NCI (based on the date range; probably a small range).
    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Monday, December 5, 2011 10:54 AM
    Answerer
  • Duplicate post: http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/e796129b-a986-4b0c-8c7d-2bc6920e0cc4

    Thanks and regards, Rishabh , Microsoft Community Contributor
    Monday, December 5, 2011 11:10 AM
  • OK, thank you, but the other NCI doesn't even have anything to do with the date field. How is it utilised, and why does it work OK (using the clustered index) when the date field is hard-coded and not passed as a variable? It's fast, but I think it's slightly faster when using the clustered index.

    Really my problem here is that when I run the same query via a view which comprises all 12 tables for the year, it is still using the same index and it's really slow.

    I've read somewhere that if I do a CAST on the date field when I use the view it may work much faster, but it's still very slow. Any ideas how I can improve the performance when running the query via the view for the full year? I can't do a table hint and can't create an index on the view.

    Thank you.

    Monday, December 5, 2011 12:59 PM
  • The following article deals with query optimization:

    http://www.sqlusa.com/articles/query-optimization/

    For stability, make the query into a parameter-sniffing-proof stored procedure.

    http://www.sqlusa.com/bestpractices/parameter-sniffing/
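    A rough sketch of what that could look like for this query (the procedure name is illustrative, and OPTION (RECOMPILE) is just one common way to get a plan compiled for the actual date range on each call):

    CREATE PROCEDURE dbo.usp_Tracking_Record_Count   -- illustrative name
        @Date_From date,
        @Date_To   date
    AS
    BEGIN
        SET NOCOUNT ON;

        SELECT
              'Tracking' AS Fact_Name
            , Tracking_Date
            , Site_Country_Key
            , COUNT(*) AS Record_Count
        FROM dbo.Fact_Tracking_201107
        WHERE Tracking_Date BETWEEN @Date_From AND @Date_To
        GROUP BY
              Tracking_Date
            , Site_Country_Key
        OPTION (RECOMPILE);   -- compile with the actual parameter values each time
    END;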


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    • Proposed as answer by Naomi NModerator Sunday, December 11, 2011 5:24 AM
    • Marked as answer by KJian_ Monday, December 12, 2011 8:48 AM
    • Unmarked as answer by panlondon Monday, December 12, 2011 2:17 PM
    • Unproposed as answer by Naomi NModerator Monday, December 12, 2011 3:14 PM
    Saturday, December 10, 2011 10:30 PM
    Moderator
  • Hi SQLUSA,

     

    I had a look at the articles and it's all things I am aware of. I don't think my query "suffers" from any of the problems you mentioned. I don't have any DATEADD() in the condition, and my variables are declared right in the SQL, as you can see. It's still a mystery to me why, when I hard-code the date instead of using the variable, it selects the correct index. It's also strange that using the incorrect index produces fewer logical reads than using the date index (but takes around the same execution time).

    Is there a specific point from the articles you mentioned that I should pay special attention to? This is a trivial piece of SQL, and the only thing that's different is the few million rows in the table; otherwise this sort of statement works as expected.

    Thank you.

    Panos.

    Monday, December 12, 2011 2:17 PM
  • How about

    DECLARE @Date_From_Working date = '17 Jul 2011'
    DECLARE @Date_To_Working   date = '19 Jul 2011'

    SELECT
          'Tracking' AS Fact_Name
        , Tracking_Date
        , Site_Country_Key
        , COUNT(*) AS Record_Count
    FROM Fact_Tracking_201107 td
    WHERE Tracking_Date BETWEEN @Date_From_Working AND @Date_To_Working
    GROUP BY
          Tracking_Date
        , Site_Country_Key
    OPTION (RECOMPILE)


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Monday, December 12, 2011 2:22 PM
    Answerer
  • Uri, it still uses the same IDX_U_Tracking_Key_201010 index.

    How is it possible for the query to generate a similar number of logical reads and be as fast when it clearly uses the wrong index? I know that internally an NCI will use the clustered index (as opposed to a heap), but it doesn't make sense that it can be that fast using an index whose key doesn't appear to have anything to do with the query.

    Thanks,

    Panos.

    Monday, December 12, 2011 2:55 PM
  • You have a clustered index on Tracking_Date? I think you mean non-clustered, right?

    In any event, replace the BETWEEN operator with standard comparison operators and see if that helps.
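    That is, something along these lines (logically equivalent for an inclusive range):

    WHERE Tracking_Date >= @Date_From_Working
      AND Tracking_Date <= @Date_To_Working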


    Monday, December 12, 2011 3:04 PM
  • You have a composite clustered index with Tracking_Date as the leading column. You would have a much better chance of it being used with a single-column index. How selective is Tracking_Date within that composite index?

    Make sure that statistics are updated (always a good idea).

    If this is a business-critical query, try a covering index:

    http://www.sqlusa.com/bestpractices/coveringindex/

    Another option: a new NC index on Tracking_Date, for example:
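    A sketch of both suggestions (the index name and dbo schema are mine for illustration; whether the extra index is worth the write and storage cost on a table this size is for you to judge):

    -- Refresh statistics on the fact table.
    UPDATE STATISTICS dbo.Fact_Tracking_201107;

    -- A nonclustered index that covers this query: the range predicate seeks
    -- on Tracking_Date, and Site_Country_Key is carried in the key, so the
    -- GROUP BY needs no lookups.
    CREATE NONCLUSTERED INDEX IDX_Tracking_Date_Covering
        ON dbo.Fact_Tracking_201107 (Tracking_Date, Site_Country_Key);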

    Unless you have an overwhelming reason not to, an INT IDENTITY column is the preferred PRIMARY KEY (surrogate PK).


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM




    Monday, December 12, 2011 4:58 PM
    Moderator
  • If SQL Server thinks that the range in your query covers only a small portion of the rows, it will use the clustered index, since you are doing a range query on Tracking_date, the first column in your clustered index is Tracking_date, and none of your nonclustered indexes have Tracking_date as the first column.

    But if SQL Server believes that the given range will include enough rows, then it is going to use one of the nonclustered indexes. This happens because if SQL Server uses either of the nonclustered indexes, it can satisfy the query without ever going to the clustered index. That's because every column needed by your query is a key of the clustered index, and all of the clustered index keys are also stored in the leaf level of every nonclustered index. So SQL Server's choice is either to scan the whole nonclustered index or a range of the clustered index. If the range of the clustered index to scan is small, the clustered index is the better choice; if it is large, a nonclustered index is the better choice. Even though scanning the whole nonclustered index means reading more rows than scanning a range of the clustered index, each row in the nonclustered index is much smaller (it contains only the columns in that index plus the clustered index key columns), so in that case SQL Server will use the smallest nonclustered index, since that results in the least amount of logical I/Os.

    Tom


    Ok Tom,

    I've read this 3 times now and all I got was "the least amount of logical I/Os", but that's one hell of an answer!


    Monday, December 12, 2011 7:15 PM
  • With your stats updated, remove the COUNT(*) from your query and see whether it uses the index PK_Fact_Tracking_201010 or not. Maybe Session_Key is the smallest column in its index definition; that could be why the query optimizer prefers it: probably less I/O for some date parameters.

    Thanks and regards, Rishabh , Microsoft Community Contributor
    • Edited by Rishabh K Tuesday, December 13, 2011 6:47 AM
    Tuesday, December 13, 2011 4:42 AM
  • "But if SQL Server believes that the range given will result in enough rows being included in the range, then it is going to use one of the nonclustered indexes. This will happen because if SQL uses either of the nonclustered indexes, it can do that query without ever going to the clustered index. That's because every column needed to satisfy your query is a key of the clustered index. And all of the keys in the clustered index are also in the leaf level of every nonclustered index. So SQL's choice is either to scan the whole nonclustered index or a range of the clustered index. So if the range to scan of the clustered index is small, then the clustered index is a better choice. But if it is large, the nonclustered index index is a better choice. This is because even though scanning the whole nonclustered index means SQL reads more rows in the nonclustered index than it would in scanning a range of the clustered index, each row in the nonclustered index is much smaller (because it contains only the columns in that index + the columns in the clustered index key), so in that case SQL will use the smallest nonclustered index since that will result in the least amount of logical I/O's."

     

    This seems the more likely answer. Thanks everyone for answering.

     

    Panos


    • Edited by panlondon Wednesday, December 14, 2011 3:35 PM
    Wednesday, December 14, 2011 3:34 PM
  • Panos,

    Your index is not a simple clustered index! It is a composite clustered index!

    There is a big difference.

    Do you really use the composite clustered index in any shape or form? Is there any query that really takes advantage of it?

    Why don't you use a simple INT IDENTITY column as a surrogate PRIMARY KEY?

    Tom,

    Can you restate your nice treatise above for a composite clustered index? Thanks.

     


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM



    Wednesday, December 14, 2011 3:52 PM
    Moderator
  • I suggested that Tom's reply may be the answer. But since last week I have removed the view (and made the table partitioned), so for these situations I am now forcing the index on the underlying table; it doesn't matter whether statistics are updated or not, because I know it should use the index with the date first. Thanks everyone for answering.
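    For the record, the forced version looks roughly like this (a sketch using the table and index names from this thread; after the partitioning change the real names on my side differ slightly):

    DECLARE @Date_From_Working date = '17 Jul 2011'
    DECLARE @Date_To_Working   date = '19 Jul 2011'

    SELECT
          'Tracking' AS Fact_Name
        , Tracking_Date
        , Site_Country_Key
        , COUNT(*) AS Record_Count
    FROM Fact_Tracking_201107 td WITH (INDEX (PK_Fact_Tracking_201010))   -- force the date-first index
    WHERE Tracking_Date BETWEEN @Date_From_Working AND @Date_To_Working
    GROUP BY
          Tracking_Date
        , Site_Country_Key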

     

    Panos.

    Friday, December 23, 2011 3:23 PM
  • so for these situations I am now forcing the index on the underlying table; it doesn't matter whether statistics are updated or not, because I know it should use the index with the date first.

    Panos,

    That is fine. Make sure, though, that this index (all indexes, in fact) is rebuilt on a regular schedule, such as each weekend, to prevent performance degradation due to fragmentation.
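    For example (a bare-bones sketch; a real maintenance job would check sys.dm_db_index_physical_stats and only rebuild or reorganize the indexes that are actually fragmented):

    -- Rebuild every index on the fact table, e.g. from a weekend maintenance job.
    ALTER INDEX ALL ON dbo.Fact_Tracking_201107 REBUILD;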

     


    Kalman Toth, SQL Server & Business Intelligence Training; SQL SERVER GRAND SLAM

    Friday, December 23, 2011 3:44 PM
    Moderator
  • Tom,

    Can you restate your nice treatise above for a composite clustered index? Thanks.

     


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM



    Hi Kalman,

    I didn't see the request addressed to me earlier. Sorry, I didn't mean to ignore it.

    When I wrote my reply I knew it was a composite index (which is why I referred to Tracking_date as the "first column in the index"). Indeed, the only reason all of the OP's nonclustered indexes cover his query is that both Tracking_date and Site_Country_Key are keys in the clustered index. So it is the fact that it is a composite clustered index whose keys include every column needed by the query that made the Query Optimizer choose one of the nonclustered indexes to satisfy this query.
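    You can verify this by forcing one of the nonclustered indexes and looking at the actual plan: there is no Key Lookup, because the clustered key columns travel in the nonclustered leaf rows. A quick check only, using the names from this thread; it is not a hint to keep in production code:

    DECLARE @Date_From_Working date = '17 Jul 2011';
    DECLARE @Date_To_Working   date = '19 Jul 2011';

    -- The "unrelated" nonclustered index still covers the query, because the
    -- clustered keys (Tracking_Date, Site_Country_Key, Row_ID) are stored in
    -- every one of its leaf rows.
    SELECT Tracking_Date, Site_Country_Key, COUNT(*) AS Record_Count
    FROM Fact_Tracking_201107 WITH (INDEX (IDX_Session_Key_201010))
    WHERE Tracking_Date BETWEEN @Date_From_Working AND @Date_To_Working
    GROUP BY Tracking_Date, Site_Country_Key;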

    Tom

    Friday, December 23, 2011 4:12 PM