locked
Table scan vs index scan RRS feed

  • Question

  • Can someone clearly mention the difference between table scan vs Index scan (An example will be worth). After googling, I am totally confused. Some people say that it is the same thing but no difference, but there are other bloggers who say that table scan and index scan are two different techniques. My questions are

    1. Are they same?
    2. What is the clearcut difference between these two?


    Thanks in adcance
    KDV
    Tuesday, June 9, 2009 3:16 PM

Answers

  • > ", an INDEX scan is _usually_ a range scan, which is usually a much cheaper operation (by comparison), because it reads so much less, even though it still needs to go to the TABLE afterwards to get the actual data."


    I'm sorry, but this isn't correct.  If you're using the index to retrieve an ordered range, that's an Index SEEK.  An Index SCAN is scanning all the rows of the table, just like a table scan -- it's just doing so by reading index leaves, rather than the base table data itself.  There's a performance advantage to an index scan, but not because of the number of rows examined (they're identical), but rather simply because an index tends to be much narrower than the full table column.

    Furthermore, after the index scan, the QO rarely needs to go read the table...the typical reason an index scan is chosen is when the columns required for the join is available entirely in the index leaves.

    Michael Asher
    Tuesday, June 9, 2009 4:04 PM
  • KDV,

    SQL Server reads data in pages.  There are actually 3 different page types (for in-row data, row-overflow data, and for LOBs).  So when SQL attempts to "read" a row, its going to request an entire page, even when the row is smaller.  You can actually look at those data pages if you want, by using the DBCC PAGE command.

    Creating an index on a single column will certainly speed queries that query based off just that colum, even when it has to scan the entire index.  Because so many more keys can fit on a single page, you'll have to do far less i/o's to read the actual data.   That of course assumes your table has more than one row in it, of course.
    Michael Asher
    Tuesday, June 9, 2009 5:42 PM

All replies

  • 1. No

    2. TABLE Scan means the DB engines reads the data in the TABLE directly.
        INDEX scan means the DB reads the data in an INDEX directly, when it finds what it wants in the INDEX, it uses the record addresses in the INDEX to go to the TABLE and read only what it requires.

    Generally, a TABLE is not ordered. Therefore, a TABLE scan is usually a full TABLE scan, which can be a costly operation.
    An INDEX is by definition ordered, therefore, an INDEX scan is _usually_ a range scan, which is usually a much cheaper operation (by comparison), because it reads so much less, even though it still needs to go to the TABLE afterwards to get the actual data.

    Note. That if all the data happens to be on the INDEX, there is no need to go to the TABLE afterwards.
    Tuesday, June 9, 2009 3:21 PM
  • 1. no
    2. an index usually only include the columns you'll need to search on (doesn't include all the data from the original table so the look up is faster)

    I suggest you read the article from wikipedia website (first 2 paragraphs)

    If you have any further question after reader the article feel free to ask
    Tuesday, June 9, 2009 3:25 PM

  • I think I understood the followng. I may be wrong.
    Let us say that I have a table of 100,000 entries with index on one column.
    Where clause makes the matching rows count to be approx 10,000, so index scan happens.
    That means in index scan (because it is the range of rows), it is possible to scan more than 10,000 rows (until the upper boundary is reached in sorted values of the keys), but it will not be in any case to scan all the rows in the index. Correct?

    In table scan it has to scan all the rows because there is chance of finding a match row at the last data page. Correct?

    Am I correctly understanding?

    Thanks


    KDV
    Tuesday, June 9, 2009 3:34 PM
  • >Where clause makes the matching rows count to be approx 10,000, so index scan happens.

    Likely. No guarantees as total costing must be used before it determines anything.

    The rest of your statement is enough to start on. There's a lot written on this subject in books and blogs, easily found through bing.
    Tuesday, June 9, 2009 3:43 PM
  • > ", an INDEX scan is _usually_ a range scan, which is usually a much cheaper operation (by comparison), because it reads so much less, even though it still needs to go to the TABLE afterwards to get the actual data."


    I'm sorry, but this isn't correct.  If you're using the index to retrieve an ordered range, that's an Index SEEK.  An Index SCAN is scanning all the rows of the table, just like a table scan -- it's just doing so by reading index leaves, rather than the base table data itself.  There's a performance advantage to an index scan, but not because of the number of rows examined (they're identical), but rather simply because an index tends to be much narrower than the full table column.

    Furthermore, after the index scan, the QO rarely needs to go read the table...the typical reason an index scan is chosen is when the columns required for the join is available entirely in the index leaves.

    Michael Asher
    Tuesday, June 9, 2009 4:04 PM
  • To add one more point, an index scan on a clustered index (which is simply the table, sorted) is essentially identical to a table scan.
    Michael Asher
    Tuesday, June 9, 2009 4:12 PM
  • Michael, no doubt you are correct. I did not mean my answer to be exact, as the question was not exact.

    As for rarely needing to go to the TABLE, i admit i know little about SQL Server specifically, but that just doesn't sound right. The average INDEX should be to get TABLE data, probably based on the PK.
    Tuesday, June 9, 2009 4:13 PM
  • To add one more point, an index scan on a clustered index (which is simply the table, sorted) is essentially identical to a table scan.
    Michael Asher

    ... and you never see a "table scan" on a clustered table (a table with a clustered index).
    Tibor Karaszi, SQL Server MVP http://www.karaszi.com/sqlserver/default.asp http://sqlblog.com/blogs/tibor_karaszi
    Tuesday, June 9, 2009 4:14 PM
  • > "As for rarely needing to go to the TABLE, i admit i know little about SQL Server specifically, but that just doesn't sound right. The average INDEX should be to get TABLE data"

    Let me illustrate with a concrete example.  Let's say we have the following table:

    Col_1 int
    Col_2 int
    Col_3 DateTime
    Col_4 Varchar(50)
    Col_5 Varchar(500)

    Let's assume that the PK is Col_1, Col_2 built on a clustered index.  Let's also assume there's a nonclustered index on Col_3, Col_4.

    Now, say you write the following query:

    SELECT Col_1, Col_2
    FROM MyTable
    WHERE Col_3 BETWEEN Date1 and Date2 AND Col_4 <> 'xyz'

    Your nonclustered index will almost certainly be used.  It contains not only all the values for Col_3 and Col_4, but it also contains the values for Col_1 and Col_2 (since every nonclustered index contains the clustered key).  That means, simply by reading the appropriate leaves from the index, there's no need to actually visit the table itself.  You have all the data you need already. 

    Now, if you include Col_5 into that query, portions of the table will need to be read.  But in all other cases, it's not required.

    Michael Asher
    Tuesday, June 9, 2009 4:31 PM
  • >
    >Now, if you include Col_5 into that query, portions of the table will need to be read.  But in all other cases, it's not required.
    >

    Also, if your WHERE clause was specific enough to only choose a few rows, it would probably use the nonclustered index and then do a lookup (for each of those rows) into the clustered index to get the Col_5 values.  But there will be a threshhold under which it will determine that all those lookups can be costly and therefore a clustered index scan may incur less cost and therefore attack it that way.

    As with everything else...  "It depends".
    --Brad
    Tuesday, June 9, 2009 4:44 PM
  • >since every nonclustered index contains the clustered key

    I did not know that, thanx!

    >Now, if you include Col_5 into that query, portions of the table will need to be read.

    And i would guess that is the majority of INDEXes. I have no basis other than feeling and experience for that conjecture, however.
    Tuesday, June 9, 2009 4:57 PM
  • If you mean the "majority of queries", then you might be surprised just how many queries -- especially complex ones joining 5, 10, or even 20 tables -- can have entire branches satisfied simply by reading the appropriate index, without ever having the visit the underlying table(s).
    Michael Asher
    Tuesday, June 9, 2009 5:24 PM
  • Things are quite clear now.

    When doing a table scan (To match the value in a given column), does the server reads entire row before doing a match on the given column?
    Say I have a table with column col1,col2,....col10 and the where clause is only on col2 and somehow table scan performed (due to missing index), then will it read all the columns in each row before matching col2 and then decide if to pick that row or not or it will read only col2 first in each row and match it and read entire row if match is successful. If entire row is read anyway then definitely index creation will benefit on col2 even if col2 has low cardinality (means less selective) because scanning index will perform less IO and only those rows in full will be read which pass through the index.

    Am I correct?

    Thanks

    KDV
    Tuesday, June 9, 2009 5:35 PM
  • KDV,

    SQL Server reads data in pages.  There are actually 3 different page types (for in-row data, row-overflow data, and for LOBs).  So when SQL attempts to "read" a row, its going to request an entire page, even when the row is smaller.  You can actually look at those data pages if you want, by using the DBCC PAGE command.

    Creating an index on a single column will certainly speed queries that query based off just that colum, even when it has to scan the entire index.  Because so many more keys can fit on a single page, you'll have to do far less i/o's to read the actual data.   That of course assumes your table has more than one row in it, of course.
    Michael Asher
    Tuesday, June 9, 2009 5:42 PM
  • I think IO reads are the advantage of having an index even on low selective column (I think so). In my case if col5 stores data (say 4 KB for every row) then creating an index on col2 (Even if it is low selective and it is BIT field and so many index rows can fit in one page) should boost the performance. Is it true?

    Thanks



    KDV
    Tuesday, June 9, 2009 5:59 PM
  • Such a narrow index would be read in very few i/o's, yes, and thus be very fast.  So it would tend to boost performance.

    However, consider the case above where columns 1 through 5 all appear in the SELECT clause, meaning the entire row has to be read anyway.  In this case, if the cardinality is extremely low, the QO might choose to just do a table scan, rather than an index seek followed by reading the table.


    Michael Asher
    Tuesday, June 9, 2009 6:43 PM
  • Guys,

     

       I still need some basic example to understand Index.Scan Vs Table.Scan. As far as I have understaood , Table.Scan scans the entire table. The table is treated as a heap and is checked row by row. Where as If I have a table which contains indexes on certain columns and I write a query to fetch a range of values which uses those indexes (clustered or non-clustered) then I can say that my scan is a Index.Scan. But then what is the difference between Index.Scan & Index.Seek?


    zeeshan jan
    Monday, May 2, 2011 3:15 AM
  • First, you should realize that terminology is not used in such a consistent and precise manner as we might wish...

    A table (the data) can be structured as a heap or as a clustered table (if you have created a clustered index on the table). For a clustered table, the data pages is actual the leaf of the clustered index. You *never* see a table scan of such table, since the operation of "looking at all pages" is an index scan (remember the feal level of the index *is* the data).

    However, sometimes, we loosely say "table scan" for clustered tables as well, when we in more general terms want to say "look at every data page". But, you never see such operation on the execution plane (table scan for a clustered table).

    SQL Server can also look at every page for a *non-clustered* index. That is an index scan over that non-clustered index. If all columns needed for the query is in the index and no index can be used to seek, then it is better to scan the non-clustered index than the entire table (because the non-clustered index typically has fewer pages than the whole table).

     

    "Where as If I have a table which contains indexes on certain columns and I write a query to fetch a range of values which uses those indexes (clustered or non-clustered) then I can say that my scan is a Index.Scan."

    No, what you describe above is an index seek. If SQL Server navigates the index tree to find the rows it wants, then it is an index seek, regardless of how many rows meet your criteria (the selectivity of the seek).


    Tibor Karaszi, SQL Server MVP | web | blog
    Monday, May 2, 2011 6:01 PM