none
Nonclustered index scan instead of seek on a heap table

    Question

  • Ok guys,

    I'm trying to understand why SQL Server is doing this, but I've had no luck. Here's how the table is structured:
    This table has 6 columns, and around 4 million rows. It is using around 950MB for data storage and 400MB for index storage. It is a heap table with one non unique, nonclustered index.

    The non clustered index contains two columns in its index key, and no included columns. I run a query similar to the following:

    select column1 from heaptable where column1='value'

    Column1 is included in the index key. This returns two rows. When I examine the execution plan, an Index scan is being performed, instead of a seek. The index is less than 1% fragmented, and the columns being used have updated statistics. The query runs very quickly.

    My questions are as follows:
    -Why is SQL Server choosing to do an index scan instead of a seek with a query that has such low selectivity? I was under the impression that seeks could be performed against heap tables if the non clustered index has the search values in the queries predicates.
    -Could the index scan be a scan of a small range of index pages in the index? If so, how can I check this?

    Thank you!
    Justin
    • Moved by Tom PhillipsModerator Monday, July 06, 2009 4:51 PM TSQL Question (From:SQL Server Database Engine)
    Monday, July 06, 2009 4:41 PM

Answers

  • If column1 is the second column in the index then SQL Server has no choice but to use Index Scan. Here the index key order is very very important.

    If the column1 is indeed the first column then update the statistics on this table. If the stats are out of date then SQL Server will not know whether this value has 1 row or 1 million rows and SQL Server will use the Index scan. Here specifying the unique is very very important (if the index/key is unique then SQL Server will know exactly that there is only one row)



    | Sankar Reddy | http://sankarreddy.spaces.live.com/ |
    • Marked as answer by JustinM_ Monday, July 06, 2009 5:30 PM
    Monday, July 06, 2009 4:56 PM

All replies

  • If column1 is the second column in the index then SQL Server has no choice but to use Index Scan. Here the index key order is very very important.

    If the column1 is indeed the first column then update the statistics on this table. If the stats are out of date then SQL Server will not know whether this value has 1 row or 1 million rows and SQL Server will use the Index scan. Here specifying the unique is very very important (if the index/key is unique then SQL Server will know exactly that there is only one row)



    | Sankar Reddy | http://sankarreddy.spaces.live.com/ |
    • Marked as answer by JustinM_ Monday, July 06, 2009 5:30 PM
    Monday, July 06, 2009 4:56 PM
  • //
    Column1 is included in the index key. This returns two rows. When I examine the execution plan, an Index scan is being performed, instead of a seek. The index is less than 1% fragmented, and the columns being used have updated statistics. The query runs very quickly.

    My questions are as follows:
    -Why is SQL Server choosing to do an index scan instead of a seek with a query that has such low selectivity? I was under the impression that seeks could be performed against heap tables if the non clustered index has the search values in the queries predicates.
    -Could the index scan be a scan of a small range of index pages in the index? If so, how can I check this?

    Thank you!
    Justin

    //

    The query run quickly because it even though it is a scan, but it needs to traverse/scan  through less data pages (less than half) than when it do a table scan.  As Sankar mentioned the column sequence must be the issue.  For testing purpose, create a index with only column1 and see, it must be a seek.


    Madhu

    MCITP, MCTS, MCDBA,MCP-- Blog : http://experiencing-sql-server-2008.blogspot.com/
    Monday, July 06, 2009 5:11 PM
  • Wow, thanks a lot Sankar. You're absolutely right. Column1 was indeed the second column in the index key. I switched the query to column2, and a seek was performed. I read the books online documentation regarding nonclustered/clustered indexes however there is no mention of this behavior. Do you have an article or a books online document I can refer to regarding this?
    Monday, July 06, 2009 5:12 PM
  • SQL Server does not have an equivalent to Oracle's "Index Skip Scan".  If your index is constructed

    create index whatever on yourTable(anotherColumn, column1)

    there is no direct way to engage this index in SQL Server.  If the "leading edge" column of the index -- "anotherColumn" is a low cardinality column you are potentially in a scenario in which a "skip scan" might be useful.  Again, while Oracle has this optimization, SQL Server does not.  There are two things that can be done:

    (1) create an index that has "column1" as the "leading edge" column of the index
    (2) if the first column in the index is low cardinality, there may be a workaround to coerce the use of this index

    I would normally not recommend the second option; if you are wanting to run a "quick and dirty" query we can talk about a way to simulate an "index skip scan".


    Kent Waldrop

    EDIT:

    I just saw your post after I posted my response.  A couple of things:

    (1) Is this going to be used as a production query?
    (2) Do you have a domain table associated with the first column of the index?
    (3) is the first column of the index low cardinality?
    Monday, July 06, 2009 5:18 PM
  • Wow, thanks a lot Sankar. You're absolutely right. Column1 was indeed the second column in the index key. I switched the query to column2, and a seek was performed. I read the books online documentation regarding nonclustered/clustered indexes however there is no mention of this behavior. Do you have an article or a books online document I can refer to regarding this?

    General Index Design Guidelines

    From the topic:


    Consider the order of the columns if the index will contain multiple columns. The column that is used in the WHERE clause in an equal to (=), greater than (>), less than (<), or BETWEEN search condition, or participates in a join, should be placed first. Additional columns should be ordered based on their level of distinctness, that is, from the most distinct to the least distinct.
    For example, if the index is defined as LastName, FirstName the index will be useful when the search criterion is WHERE LastName = 'Smith' or WHERE LastName = Smith AND FirstName LIKE 'J%'. However, the query optimizer would not use the index for a query that searched only on FirstName (WHERE FirstName = 'Jane').


    This is also must read.


    Statistics Used by the Query Optimizer in Microsoft SQL Server 2005


    AMB
    Monday, July 06, 2009 5:21 PM
  • Wow, thanks a lot Sankar. You're absolutely right. Column1 was indeed the second column in the index key. I switched the query to column2, and a seek was performed. I read the books online documentation regarding nonclustered/clustered indexes however there is no mention of this behavior. Do you have an article or a books online document I can refer to regarding this?

    I could not find any MS doc related to this , but  general practice is that, the column order in an index should be in the order of selectivity. High selective column should come first and so on. And also sql server keeps statistics on  first column (this was there in 2000 but not sure about the latest versions)

    Madhu
    MCITP, MCTS, MCDBA,MCP-- Blog : http://experiencing-sql-server-2008.blogspot.com/
    Monday, July 06, 2009 5:26 PM
  • Great answers. Thank you for the help everyone.
    Monday, July 06, 2009 5:30 PM
  • Just to add to what everyone else said, I wanted to expand on something that I find interesting.  (I "stole" this from Kalen Delaney's book "Inside SQL Server 2005: Query Tuning and Optimization").

    It is true that if you have a 2-column index, and you have a WHERE clause on the 2nd column, you will NEVER see an Index Seek in your query plan.

    For example, let's say you use the AdventureWorks database and you create an index on LastName,Firstname in the Person.Contact table:

    CREATE INDEX IX_Contact_LastFirst on Person.Contact (LastName,FirstName)

    If you do a query with only FirstName in the WHERE clause, it will not (simply because it CAN NOT) do an index seek in that index.  It could SEEK on LastName, but it can NOT seek on FirstName.

    However, and I find this very interesting, consider the following queries:

    SELECT * FROM Person.Contact WHERE FirstName LIKE 'K%'
    SELECT * FROM Person.Contact WHERE FirstName LIKE 'Y%'

    The first query will do a clustered index scan... and that's because we did a SELECT *... we want all columns.  So it does a scan through the clustered index.  Fine.  That makes perfect sense.

    However the second query will NOT do the clustered index scan... it will instead do a scan on the index that we just created on LastName,FirstName, and for each item it finds in the index with FirstName LIKE 'Y%', it will do a key lookup into the clustered index to get all the columns.

    This is because of statistics.  SQL knows, because of statistics on the non-clustered index we created, that there are very few rows with FirstName LIKE 'Y%' and so it figures that it is worth it to scan the index and do a lookup into the clustered index.  It can't SEEK into the index, but it figures it's less cost to SCAN the index and do key lookups than it is to do a SCAN of the entire clustered index like it did with the predicate of FirstName LIKE 'K%'.

    This may not answer your question, but it's a little piece of interesting knowledge in terms of figuring out why SQL takes a certain approach to queries.
    --Brad
    Monday, July 06, 2009 5:45 PM
  • Just to add to what everyone else said, I wanted to expand on something that I find interesting.  (I "stole" this from Kalen Delaney's book "Inside SQL Server 2005: Query Tuning and Optimization").

    It is true that if you have a 2-column index, and you have a WHERE clause on the 2nd column, you will NEVER see an Index Seek in your query plan.

    For example, let's say you use the AdventureWorks database and you create an index on LastName,Firstname in the Person.Contact table:

    CREATE INDEX IX_Contact_LastFirst on Person.Contact (LastName,FirstName)

    If you do a query with only FirstName in the WHERE clause, it will not (simply because it CAN NOT) do an index seek in that index.  It could SEEK on LastName, but it can NOT seek on FirstName.

    However, and I find this very interesting, consider the following queries:

    SELECT * FROM Person.Contact WHERE FirstName LIKE 'K%'
    SELECT * FROM Person.Contact WHERE FirstName LIKE 'Y%'

    The first query will do a clustered index scan... and that's because we did a SELECT *... we want all columns.  So it does a scan through the clustered index.  Fine.  That makes perfect sense.

    However the second query will NOT do the clustered index scan... it will instead do a scan on the index that we just created on LastName,FirstName, and for each item it finds in the index with FirstName LIKE 'Y%', it will do a key lookup into the clustered index to get all the columns.

    This is because of statistics.  SQL knows, because of statistics on the non-clustered index we created, that there are very few rows with FirstName LIKE 'Y%' and so it figures that it is worth it to scan the index and do a lookup into the clustered index.  It can't SEEK into the index, but it figures it's less cost to SCAN the index and do key lookups than it is to do a SCAN of the entire clustered index like it did with the predicate of FirstName LIKE 'K%'.

    This may not answer your question, but it's a little piece of interesting knowledge in terms of figuring out why SQL takes a certain approach to queries.
    --Brad

    This is all true; however, it doesn't mean that you can't utilize that index and get better performance than with the scan.  No, I don't consider a fake skip scan to be good practice, but every 2 or 3 years I run into a situation in which it is helpful for a more efficient -- sometimes dramatically more efficient -- "quick and dirty" query.  I posted something  related back here

        http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/ace21b18-c856-4def-ab3b-54758c553baf/

    Again, I don't consider this good practice -- just comes in handy once every couple of blue moons.
    Monday, July 06, 2009 6:06 PM
  • LOL!  As I was reading that thread, I was thinking to myself, "Gee this looks familiar... I remember when Kent posted this"... and then I scrolled down further and it looks like the thread was a discussion between you and me, and I brought up the same AdventureWorks example as I did in this thread.

    Again, I think we agree... it's not something to rely on, but it comes in handy.

    --Brad
    Monday, July 06, 2009 6:20 PM
  • Here is the corresponding Microsoft Connect item: https://connect.microsoft.com/SQLServer/feedback/details/695044/implement-index-skip-scan Please vote for it.
    Saturday, October 15, 2011 2:30 PM