none
Clustered Index vs NonClustered Index

Answers

  • It is all about the level of abstraction you decide to use. Can we at all talk about physical order, when a disk platter is cylindrical, for instance (or what about SSD?)? Or, seen from the file system's perspective, a file can be fragmented. Inside a database file, we have pages (starting with address zero ang going up). A clustered index is a linked list, but as you move from first page in that list to the next page, you might end up going backwards in the page addresing scheme because index fragmentation. So, would you consider the leaf of an index be physically sorted? It depends on what you mean by the term "physically". All about terminology and abstration level in the end...
    Tibor Karaszi, SQL Server MVP | web | blog
    Thursday, February 24, 2011 8:07 AM
  • Hmm comments. You know, if you look at a sufficient detail, there is no such thing as free will. Yet I still believe there is nothing holding my back from having my own opinion.

    If you have a database consisting of one unfragmented file of sufficient size, stored on a single dedicated empty hard drive, and you have created and populated one table with many rows, and you then create a clustered index on that table, then the pages holding the rows are physically ordered.

    For all intents and purposes the best mental model of a clustered index is to assume that scanning the (leaf level of the) clustered index allows sequential reads to be effective. A clustered index allows you to cheaply scan (or return rows) in logical key order.

    Of course the biggest difference between a clustered and nonclustered index is not its (nonexisting) "sorted" property, but the fact that the table data is stored in the index, which makes it a very important and efficient index.

     


    Gert-Jan
    Wednesday, February 23, 2011 8:11 PM
  • Hi Manu,

    The only difference between them as i understand is in their leaf nodes.

    In case of Clustered index the leaf nodes contains data as well as the addresses of previous and next node just in case of telephone directory.

    As we scan through the nodes of clustered index we are getting data as well.

     

    In case of Non clustered indexes we have a pointer that contains clustered index just as the back index of the book.

     

    However, Physical order of the data is not guaranteed in any case


    Thanks and regards, Rishabh
    Thursday, February 24, 2011 5:00 AM
  • Also as mentioned on this link: http://sqlserverdownanddirty.blogspot.com/2010/12/logical-order-not-physical-order.html discuss about the logical ordering on rows instead of physical.

    This article is about the ordering (or absence of -) of the rows inside a data page.  Usually, a data page will contains many rows and these rows can be or not ordered.  SQL-Server doesn't care and from a performance point of view, the fact they are ordered or not inside a data page is practically totally pointless.  Only people dealing with the inner working of SQL-Server at an intimate level need to know that.

    What is interesting to know is how the records are regrouped from page to page and how this grouping is kept when you insert, edit or delete rows. With a clustered index, the rows are kept sequentially on the same page and when they cannot because their total size is to big, they are split into adjacent pages.  As for the doubly linked lists, it's the data pages (containing one or many rows) that are linked this way and not the rows themselves.

    Now, for calling this a physical ordering or not, I really don't care and I think that most people don't care as well.  Some complex concepts cannot be simplified down to a single word and we cannot start writing page after page of explicit text each time we need to say something when there will be no real benefit of doing so.

    This discussion is like the answer of Bill Clinton when he denied to have lied when he had previously said that he didn't have had sex with her (Monica L.) because it was her who have had sex with him.  Splitting hair is just that, hair splitting.


    Sylvain Lafontaine, ing.
    MVP - Access
    Blog/web site: http://coding-paparazzi.sylvainlafontaine.com
    Independent consultant and remote programming for Access and SQL-Server (French)

    Thursday, February 24, 2011 5:18 AM
  • So, when we create a Clustered Index on a table with records populated, it physically sorts the data pages, but not the rows.

    No, the data pages are not sorted physically and neither are the rows inside a page.

    What happends with a clustered index is that the rows are sequentially grouped by page.  For example, if the rows #3 and #5 are stored on a page then the row #4 is also on that page and cannot be on another page. Furthermore, the row #6 is either on the same page or on the following page (which is doubled linked btw) but not on any other page and the same for the row #2: it is on the same page or on the page directly preceding it.

    If you edit the row #4 so that it will be to big to be stored on the same page with all the other rows already on the same page, the page will be split in two (or in three if the new row is big enough) so that the rows #3 and #4 will remains on the same page and the rows #5 and following will be stored on a brand new page which will become the new following page for the double linked list.

    In the case of a heap (when there is no clustered index); there is no guaranty about the location of a row amongst the others: even if the rows #3 and #5 are on the same page, the row #4 can be anywhere else; on any other data page which is part of the table.  If the rows #3, #4 and #5 are on the same page but the row #4 is changed so that it's now to big to remain on the same page with all the other rows, only the row #4 will be moved to a new location.  This can be a brand new page or an already existing page where there is sufficient place to receive the updated row.

    Furthermore, in the case of a clustered index, as you can define the boundaries (min and max values for any page) of a page and garanty that anything between these boundaries are also located on the same page, then you can use the data pages as the last level of the index.  If the last leaf node tell you that a page begin with the row #3 and finish with the row #5 then you know that the row #4 is also on that same page.

    This is not the case with a heap: there is no relation between the min and max values of a page and the other rows located on that same page.   Each row can be anywhere, on any page with no relation with the other sequentially adjacent rows.  This means that for the unclustered index, the last level of the index must contains a pointer toward the correct data page for each row.

    It the table is a heap (ie., it doesn't contains a clustered index), then the pointer stored at the leaf level for the unclustered index is the identifiant of the data page.  If a row change its location - it is moved to a new page - then all the indexes on that table must be updated for the new location of that row.

    In the case that a table contains a cluster index, the situation is a little different: it is not the identifiant of the page that is stored in the non-clustered indexes at the leaf level but simply the value of the key used for the clustered index if the index is unique.  If the clustered index is not unique, a special, hidden identity field is added to each row to make the combination of the key + this identity field unique and this combination is what gets stored in the non-clustered indexes.

    Also, as the identifiant for the pages are not stored in the non-clustered indexes when the table contains a clustered index, these indexes don't need to be updated when a row is transfered to a new page.  The non-clustered indexes are never changed when a row of a table with a clustered index is changed.  They will get changed only when there are insertions or deletions that will change the boundaries of a leaf page and/or node page.

    This is important because the simplified update for a clustered index can solve some deadlock problems.


    Sylvain Lafontaine, ing.
    MVP - Access
    Blog/web site: http://coding-paparazzi.sylvainlafontaine.com
    Independent consultant and remote programming for Access and SQL-Server (French)

    Thursday, February 24, 2011 9:45 AM

All replies

  • Hmm comments. You know, if you look at a sufficient detail, there is no such thing as free will. Yet I still believe there is nothing holding my back from having my own opinion.

    If you have a database consisting of one unfragmented file of sufficient size, stored on a single dedicated empty hard drive, and you have created and populated one table with many rows, and you then create a clustered index on that table, then the pages holding the rows are physically ordered.

    For all intents and purposes the best mental model of a clustered index is to assume that scanning the (leaf level of the) clustered index allows sequential reads to be effective. A clustered index allows you to cheaply scan (or return rows) in logical key order.

    Of course the biggest difference between a clustered and nonclustered index is not its (nonexisting) "sorted" property, but the fact that the table data is stored in the index, which makes it a very important and efficient index.

     


    Gert-Jan
    Wednesday, February 23, 2011 8:11 PM
  • If you have a database consisting of one unfragmented file of sufficient size, stored on a single dedicated empty hard drive, and you have created and populated one table with many rows, and you then create a clustered index on that table, then the pages holding the rows are physically ordered.

     

    Gert-Jan

    Thanks Gert,

    But I see a confliction here b/w your statement and the above by MS.

    Also as mentioned on this link: http://sqlserverdownanddirty.blogspot.com/2010/12/logical-order-not-physical-order.html discuss about the logical ordering on rows instead of physical.


    ~Manu
    http://sqlwithmanoj.wordpress.com
    MCCA 2011
    Thursday, February 24, 2011 3:48 AM
  • Hi Manu,

    The only difference between them as i understand is in their leaf nodes.

    In case of Clustered index the leaf nodes contains data as well as the addresses of previous and next node just in case of telephone directory.

    As we scan through the nodes of clustered index we are getting data as well.

     

    In case of Non clustered indexes we have a pointer that contains clustered index just as the back index of the book.

     

    However, Physical order of the data is not guaranteed in any case


    Thanks and regards, Rishabh
    Thursday, February 24, 2011 5:00 AM
  • Also as mentioned on this link: http://sqlserverdownanddirty.blogspot.com/2010/12/logical-order-not-physical-order.html discuss about the logical ordering on rows instead of physical.

    This article is about the ordering (or absence of -) of the rows inside a data page.  Usually, a data page will contains many rows and these rows can be or not ordered.  SQL-Server doesn't care and from a performance point of view, the fact they are ordered or not inside a data page is practically totally pointless.  Only people dealing with the inner working of SQL-Server at an intimate level need to know that.

    What is interesting to know is how the records are regrouped from page to page and how this grouping is kept when you insert, edit or delete rows. With a clustered index, the rows are kept sequentially on the same page and when they cannot because their total size is to big, they are split into adjacent pages.  As for the doubly linked lists, it's the data pages (containing one or many rows) that are linked this way and not the rows themselves.

    Now, for calling this a physical ordering or not, I really don't care and I think that most people don't care as well.  Some complex concepts cannot be simplified down to a single word and we cannot start writing page after page of explicit text each time we need to say something when there will be no real benefit of doing so.

    This discussion is like the answer of Bill Clinton when he denied to have lied when he had previously said that he didn't have had sex with her (Monica L.) because it was her who have had sex with him.  Splitting hair is just that, hair splitting.


    Sylvain Lafontaine, ing.
    MVP - Access
    Blog/web site: http://coding-paparazzi.sylvainlafontaine.com
    Independent consultant and remote programming for Access and SQL-Server (French)

    Thursday, February 24, 2011 5:18 AM
  • But I see a confliction here b/w your statement and the above by MS.


    I don't see any conflict between my statement and the blog article you linked to.

    And that is the reason I wrote my answer. Because the blog article is technically true, but of no consequence to a common developer. The stuff I mentioned about a clustered index is.


    Gert-Jan
    Thursday, February 24, 2011 7:55 AM
  • It is all about the level of abstraction you decide to use. Can we at all talk about physical order, when a disk platter is cylindrical, for instance (or what about SSD?)? Or, seen from the file system's perspective, a file can be fragmented. Inside a database file, we have pages (starting with address zero ang going up). A clustered index is a linked list, but as you move from first page in that list to the next page, you might end up going backwards in the page addresing scheme because index fragmentation. So, would you consider the leaf of an index be physically sorted? It depends on what you mean by the term "physically". All about terminology and abstration level in the end...
    Tibor Karaszi, SQL Server MVP | web | blog
    Thursday, February 24, 2011 8:07 AM
  • Thanks Tibor & Gert,

    Looks like I'm coming out of the confusion as I was diving deep below the abstraction.


    ~Manu
    http://sqlwithmanoj.wordpress.com
    MCCA 2011
    Thursday, February 24, 2011 8:10 AM
  • Gert,

    I didn't meant to say that you were wrong.

    Got your point, I'm just co-relating all the statements put from different perspective.

     

    Don't know why the anxiety is taking me that deep about the physical architecture.

     

    So, when we create a Clustered Index on a table with records populated, it physically sorts the data pages, but not the rows.

    Now when we insert random records, how do they fit in? taking 50% fill-factor.

    Until page split its fine, but on page split does data pages gets sorted again? Or its just the logical linking between them by doubly linklists?


    ~Manu
    http://sqlwithmanoj.wordpress.com
    MCCA 2011
    Thursday, February 24, 2011 8:21 AM
  • So, when we create a Clustered Index on a table with records populated, it physically sorts the data pages, but not the rows.

    No, the data pages are not sorted physically and neither are the rows inside a page.

    What happends with a clustered index is that the rows are sequentially grouped by page.  For example, if the rows #3 and #5 are stored on a page then the row #4 is also on that page and cannot be on another page. Furthermore, the row #6 is either on the same page or on the following page (which is doubled linked btw) but not on any other page and the same for the row #2: it is on the same page or on the page directly preceding it.

    If you edit the row #4 so that it will be to big to be stored on the same page with all the other rows already on the same page, the page will be split in two (or in three if the new row is big enough) so that the rows #3 and #4 will remains on the same page and the rows #5 and following will be stored on a brand new page which will become the new following page for the double linked list.

    In the case of a heap (when there is no clustered index); there is no guaranty about the location of a row amongst the others: even if the rows #3 and #5 are on the same page, the row #4 can be anywhere else; on any other data page which is part of the table.  If the rows #3, #4 and #5 are on the same page but the row #4 is changed so that it's now to big to remain on the same page with all the other rows, only the row #4 will be moved to a new location.  This can be a brand new page or an already existing page where there is sufficient place to receive the updated row.

    Furthermore, in the case of a clustered index, as you can define the boundaries (min and max values for any page) of a page and garanty that anything between these boundaries are also located on the same page, then you can use the data pages as the last level of the index.  If the last leaf node tell you that a page begin with the row #3 and finish with the row #5 then you know that the row #4 is also on that same page.

    This is not the case with a heap: there is no relation between the min and max values of a page and the other rows located on that same page.   Each row can be anywhere, on any page with no relation with the other sequentially adjacent rows.  This means that for the unclustered index, the last level of the index must contains a pointer toward the correct data page for each row.

    It the table is a heap (ie., it doesn't contains a clustered index), then the pointer stored at the leaf level for the unclustered index is the identifiant of the data page.  If a row change its location - it is moved to a new page - then all the indexes on that table must be updated for the new location of that row.

    In the case that a table contains a cluster index, the situation is a little different: it is not the identifiant of the page that is stored in the non-clustered indexes at the leaf level but simply the value of the key used for the clustered index if the index is unique.  If the clustered index is not unique, a special, hidden identity field is added to each row to make the combination of the key + this identity field unique and this combination is what gets stored in the non-clustered indexes.

    Also, as the identifiant for the pages are not stored in the non-clustered indexes when the table contains a clustered index, these indexes don't need to be updated when a row is transfered to a new page.  The non-clustered indexes are never changed when a row of a table with a clustered index is changed.  They will get changed only when there are insertions or deletions that will change the boundaries of a leaf page and/or node page.

    This is important because the simplified update for a clustered index can solve some deadlock problems.


    Sylvain Lafontaine, ing.
    MVP - Access
    Blog/web site: http://coding-paparazzi.sylvainlafontaine.com
    Independent consultant and remote programming for Access and SQL-Server (French)

    Thursday, February 24, 2011 9:45 AM
  • Thanks Sylvian, for the detailed clarification and clearing the doubt I had.


    ~Manu
    http://sqlwithmanoj.wordpress.com
    MCCA 2011
    Thursday, February 24, 2011 4:36 PM