none
How does SQL Server 2008 use multiple indices in a query?

    Question

  • For simplicity, let's say I have a table with 2 integer columns and a char column.  None of the columns contains unique data (e.g., 125 rows can have an I1 value of 111, etc.).  The I1 column has a clustered index and the I2 column has a  non-clustered index.  C1, the char column, is not indexed.

    Let's say my search criteria is I1=111 AND C1='asdf'.  How does the engine do the search?  Will it use I1 to get a subset of rows that meet the I1=111 criteria and then check JUST those rows for C1='asdf'?

    What if my search is I1=111 AND I2=222 AND C1='asdf'?  Will the engine use I1 to get a subset of the rows, then I2 to further subset that subset, and finally check the second subset for C1='asdf'?  More specifically, will the I2 "selection" process ONLY look at rows that have already passed the I1 "selection" process?

    Does whether an index is clustered or not have any impact on the process?

     

    • Moved by Tom PhillipsModerator Friday, August 20, 2010 5:23 PM Possibly better answer from TSQL forum (From:SQL Server Database Engine)
    Friday, August 20, 2010 4:54 PM

Answers

  • Well, it all depends on the optimizer which relies upon statistics, specificity, index definitions, estimated I/O (row and index widths), row counts, et cetera, et cetera, et cetera.  The optimizer will cost out the strategies and will choose base on these and other data points.   So, there is no single answer to your question, since depending on these factors the optimizer will choose differently. 

    However, a few useful insights are provided by Brad McGehee here, including a discussion of "index intersection":
    http://www.sql-server-performance.com/tips/covering_indexes_p1.aspx

    Other "index intersection" articles include:

    http://www.sql-server-performance.com/articles/per/index_intersection_p1.aspx 
    http://blogs.msdn.com/b/windchillonsql/archive/2010/06/18/sql-server-index-intersection-union.aspx
    http://sqlserverpedia.com/blog/sql-server-bloggers/optimizing-sql-server-joins/

    RLF

     

    • Proposed as answer by Naomi NModerator Friday, August 20, 2010 5:38 PM
    • Marked as answer by Jim Cutler Friday, August 20, 2010 6:18 PM
    Friday, August 20, 2010 5:31 PM
  • It all depends on statistics.

    In your first example, since you don't have an index on C1, the optimizer will do a SEEK into the clustered index (on I1) to get the I1=111 rows and will filter those to get the C1='asdf' rows.  However, if you DID have an (nonclustered) index on C1, then it would really depend on how selective each of the predicates is.  If C1='asdf' is more selective than I1=111 (which the optimizer knows by statistics), then it would SEEK into the C1 nonclustered index to get the very few rows, otherwise it would SEEK into the clustered index on I1 to get the I1=111 rows.

    Same with your second example.  It depends on how selective each of the predicates is.

    In answer to your last question, a clustered index may have an impact, because the clustered index IS the data itself.

    Let's say that you do a SELECT * FROM MyTable WHERE I1=111 and I2=222 and C1='asdf'.  There's no index on C1, so the only way it's going to be able to evaluate the value of C1 is to look inside the clustered index somehow.  If the optimizer knows (via statistics) that I2=222 is very selective and only would produce 1 row, then it would choose to SEEK into the nonclustered index of I2, retrieve that row, then it would SEEK into the clustered index for that one row to check the rest of the data to see if it satisfied the rest of the WHERE clause.  However, if I2=222 produces a hundred rows or so, then the optimizer might decide that it's too expensive to constantly SEEK into the nonclustered index for each of those 100 rows to evaluate C1, so it would just ignore I2's clustered index and instead just evaluate everything by processing the clustered index only.

    Hope this helps.

     


    --Brad (My Blog)
    • Proposed as answer by Naomi NModerator Friday, August 20, 2010 5:38 PM
    • Marked as answer by Jim Cutler Thursday, August 26, 2010 5:01 PM
    Friday, August 20, 2010 5:37 PM
    Moderator
  • The answer is still "it depends."  This answer depends on a lot of factors most of which Russel has addressed.  Another factor you have to consider is the columns in your select list.  If your nonclustered index does not include C1 and you select C1 then the optimizer will have to use a key lookup to find the value for column c1.  This concept increases the likelyhood that the optimizer will find the clustered index faster; however, if your nonclustered index covers the entire query, it is more likely that the optimizer will find the nonclustered index less expensive because the rows are typically less wide and therefor less pages are used, making the index less costly to use.

    You can look at dbcc show_statistics('tablename','indexname').  The second window of the procedure shows what predicates can be used to seek

    Note: A clustered index never has to lookup values because it contains all the data of the table, so anytime you seek on the clustered key, you automatically cover the query.


    http://jahaines.blogspot.com/

    Friday, August 20, 2010 5:43 PM
    Moderator
  • > What if my search is I1=111 AND I2=222 AND C1='asdf'?  Will the engine use I1 to get a subset of the rows, then I2 to further subset that subset, and finally check the second subset for C1='asdf'?  More specifically, will the I2 "selection" process ONLY look at rows that have already passed the I1 "selection" process?

    As Russell, Adam and Brad have said, there is a lot of "it depends". With exception of trivial queries, you can never say "the engine will...". However, you can say things like "the engine will never...". And, no, the engine will never do the above. It will not first seek the clustered index to find all rows that matches I1=111, and then seek I2. Because, as it has located all rows where I1=111, it has also seen all values of I2 and C1.

    On the other hand, if the optimizer thinks that I2 is selective, it may seek that index. Since the clustered index key is the row locator for a non-clustered index, it will be able to filter on I1=111 and then seek the clustered index to filter on C1.

    What if both indexes are non-clustered, could the engine use a plan like you describe? No, that would not be possible, because there is no information about I1 in I2. It could do index intersection and seek both rows, and then do a UNION on the result. It could also scan the index on I2 and do a hash join on the row locator, but those plans are not very common.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online:
    SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
    SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
    SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
    Friday, August 20, 2010 10:20 PM

All replies

  • Well, it all depends on the optimizer which relies upon statistics, specificity, index definitions, estimated I/O (row and index widths), row counts, et cetera, et cetera, et cetera.  The optimizer will cost out the strategies and will choose base on these and other data points.   So, there is no single answer to your question, since depending on these factors the optimizer will choose differently. 

    However, a few useful insights are provided by Brad McGehee here, including a discussion of "index intersection":
    http://www.sql-server-performance.com/tips/covering_indexes_p1.aspx

    Other "index intersection" articles include:

    http://www.sql-server-performance.com/articles/per/index_intersection_p1.aspx 
    http://blogs.msdn.com/b/windchillonsql/archive/2010/06/18/sql-server-index-intersection-union.aspx
    http://sqlserverpedia.com/blog/sql-server-bloggers/optimizing-sql-server-joins/

    RLF

     

    • Proposed as answer by Naomi NModerator Friday, August 20, 2010 5:38 PM
    • Marked as answer by Jim Cutler Friday, August 20, 2010 6:18 PM
    Friday, August 20, 2010 5:31 PM
  • It all depends on statistics.

    In your first example, since you don't have an index on C1, the optimizer will do a SEEK into the clustered index (on I1) to get the I1=111 rows and will filter those to get the C1='asdf' rows.  However, if you DID have an (nonclustered) index on C1, then it would really depend on how selective each of the predicates is.  If C1='asdf' is more selective than I1=111 (which the optimizer knows by statistics), then it would SEEK into the C1 nonclustered index to get the very few rows, otherwise it would SEEK into the clustered index on I1 to get the I1=111 rows.

    Same with your second example.  It depends on how selective each of the predicates is.

    In answer to your last question, a clustered index may have an impact, because the clustered index IS the data itself.

    Let's say that you do a SELECT * FROM MyTable WHERE I1=111 and I2=222 and C1='asdf'.  There's no index on C1, so the only way it's going to be able to evaluate the value of C1 is to look inside the clustered index somehow.  If the optimizer knows (via statistics) that I2=222 is very selective and only would produce 1 row, then it would choose to SEEK into the nonclustered index of I2, retrieve that row, then it would SEEK into the clustered index for that one row to check the rest of the data to see if it satisfied the rest of the WHERE clause.  However, if I2=222 produces a hundred rows or so, then the optimizer might decide that it's too expensive to constantly SEEK into the nonclustered index for each of those 100 rows to evaluate C1, so it would just ignore I2's clustered index and instead just evaluate everything by processing the clustered index only.

    Hope this helps.

     


    --Brad (My Blog)
    • Proposed as answer by Naomi NModerator Friday, August 20, 2010 5:38 PM
    • Marked as answer by Jim Cutler Thursday, August 26, 2010 5:01 PM
    Friday, August 20, 2010 5:37 PM
    Moderator
  • The answer is still "it depends."  This answer depends on a lot of factors most of which Russel has addressed.  Another factor you have to consider is the columns in your select list.  If your nonclustered index does not include C1 and you select C1 then the optimizer will have to use a key lookup to find the value for column c1.  This concept increases the likelyhood that the optimizer will find the clustered index faster; however, if your nonclustered index covers the entire query, it is more likely that the optimizer will find the nonclustered index less expensive because the rows are typically less wide and therefor less pages are used, making the index less costly to use.

    You can look at dbcc show_statistics('tablename','indexname').  The second window of the procedure shows what predicates can be used to seek

    Note: A clustered index never has to lookup values because it contains all the data of the table, so anytime you seek on the clustered key, you automatically cover the query.


    http://jahaines.blogspot.com/

    Friday, August 20, 2010 5:43 PM
    Moderator
  • > What if my search is I1=111 AND I2=222 AND C1='asdf'?  Will the engine use I1 to get a subset of the rows, then I2 to further subset that subset, and finally check the second subset for C1='asdf'?  More specifically, will the I2 "selection" process ONLY look at rows that have already passed the I1 "selection" process?

    As Russell, Adam and Brad have said, there is a lot of "it depends". With exception of trivial queries, you can never say "the engine will...". However, you can say things like "the engine will never...". And, no, the engine will never do the above. It will not first seek the clustered index to find all rows that matches I1=111, and then seek I2. Because, as it has located all rows where I1=111, it has also seen all values of I2 and C1.

    On the other hand, if the optimizer thinks that I2 is selective, it may seek that index. Since the clustered index key is the row locator for a non-clustered index, it will be able to filter on I1=111 and then seek the clustered index to filter on C1.

    What if both indexes are non-clustered, could the engine use a plan like you describe? No, that would not be possible, because there is no information about I1 in I2. It could do index intersection and seek both rows, and then do a UNION on the result. It could also scan the index on I2 and do a hash join on the row locator, but those plans are not very common.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online:
    SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
    SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
    SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
    Friday, August 20, 2010 10:20 PM