none
Choosing a wide clustered index key and reasons why not to use it

    Question

  • I have been discussing with my manager on the choice of a clustered index key, he favours wide index keys and I prefer narrow unique columns, this is obviously based on reading best practice guides. In order to convince, are there any peculiar or obvious reasons why a wide clsutered index key shouldnt be used, I am referring to say 5-6 columns although all int data type. His argument has always been that the composite keys are what is used within joins, and having them as part a clustered index means fast data retrieval. Does it make any difference and would a non clustered index suffice also.
    Tuesday, December 06, 2011 2:12 PM

Answers

  • Choosing the correct clustered index for a table can be complex.  This is because there are many things to take into consideration when choosing the columns for the index.

    The first thing to remember is that the choice of the clustered index should be driven by performance considerations.  Unlike things like primary key or foreign key constraints, the choice of the clustered index does not affect the integrity of your database only performance.  But there are many performance considerations, some of which often conflict with each other.  Some of the things to take into consideration are (these are in no particular order, which consideration(s) are most important will depend on the usage (insert, delete, update, and select) done on your table:

    1) Clustered indexes are very good for range queries, that is queries where you select a number of rows which either have the same value in a column (like all rows WHERE CREATE_DATE >= '20110901' AND < '20111101' or all orderlines where PRODUCT_CODE = 'Shoe').  If a significant amount of work done in your system is range queries on a column, that column is often the best choice for the clustered index (or a composite index where that column is the first column in the index).

    2) Your clustered index should be increasing.  That is every time you insert a new row, it's more efficient if the new row comes after all the other rows.  This helps prevent page splitting.

    3) Your clustered index should be unique.  If your clustered index allows duplicates, SQL needs a duplicate resolver for every value that has duplicates.  This duplicate resolver (8 bytes) is in not only your clustered index, but also every nonclustered index on that table.  When you make index entries bigger, they not only take more disk space, they may push your index to more levels, which will cause more I/O, and they are likely to either take more memory, or if the more memory is not available, index pages in memory are more likely to be pushed to disk, which means the next time thee index page is needed, it won't be in memory, which causes more I/O.  This disk, memory, I/O consideration applies to not only this item, but every item below where I mention it increases the size of an index.

    4) Your clustered index key should be narrow (have a small number of bytes).  Every column in your clustered index is also included in the leaf level of of every clustered index.  Which means all your nonclustered indexes will be larger.

    5) The values of the keys in your clustered index should not change.  Since the clustered index determines the page location of the row, if you change the value of a column in the clustered index, that row is likely to be deleted from the page it's on and inserted into a different page.  In addition, since the key is part of all the nonclustered indexes, all of the nonclustered indexes also have to be updated.

    6) The keys of the clustered index should not be nullable.  If they are then the leaf level of every nonclustered index can be made (usually 3, could be more) bytes larger.  This is because the leaf entry will need a null bitmap.

    7) The keys of the clustered index should be fixed length (like int or datetime or char), not variable length (like varchar).  This is because if there are any variable keys in the clustered index, then the leaf level entry size is increased for the variable column offset array (which is 2 bytes for every variable length column in the key).

    As you can see there are lots of considerations.  However, it would certainly be unusual (IMO) for the best choice for a clustered index on a table to be 6 integer columns.

    Tom

    • Proposed as answer by Hasham NiazEditor Tuesday, December 06, 2011 11:23 PM
    • Marked as answer by Stephanie Lv Tuesday, December 13, 2011 11:41 AM
    Tuesday, December 06, 2011 11:16 PM

All replies

  • Unique CI or None ? It all depends on the queries you run  against the tables and how big is your output and etc.. There is no such thumb of rule at all
    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Tuesday, December 06, 2011 2:17 PM
  • sometimes the columns are used within a join, and other times they are used within extracts, there is also a high degree of updates as well, in some cases directly on the columns.

     

    Tuesday, December 06, 2011 4:13 PM
  • If you are going to have few nonclustered indexes also (probably you will) , then that's going to cause a problem having a wide clustering key because with each non clustered index the clustering key is also stored. Moreover creating a wide clustering key also wastes the size of buffer pool and will increase the index depth too.


    Thanks and regards, Rishabh , Microsoft Community Contributor
    Tuesday, December 06, 2011 4:35 PM
  • The first thing i would see is if my new clustering key is always in the increasing order like an identity.

    IF it the new values are not at the either end of the index structure, it will cause the page splits in the leaf nodes of the index and this will impact the performance.

    And a wide non clustered index is used as part of every non clustered index which increases the size of non clustered indexes 


    Vinay Valeti| If you think my suggestion is useful, please rate it as helpful. If it has helped you to resolve the problem, please Mark it as Answer
    Tuesday, December 06, 2011 5:02 PM
  • Choosing the correct clustered index for a table can be complex.  This is because there are many things to take into consideration when choosing the columns for the index.

    The first thing to remember is that the choice of the clustered index should be driven by performance considerations.  Unlike things like primary key or foreign key constraints, the choice of the clustered index does not affect the integrity of your database only performance.  But there are many performance considerations, some of which often conflict with each other.  Some of the things to take into consideration are (these are in no particular order, which consideration(s) are most important will depend on the usage (insert, delete, update, and select) done on your table:

    1) Clustered indexes are very good for range queries, that is queries where you select a number of rows which either have the same value in a column (like all rows WHERE CREATE_DATE >= '20110901' AND < '20111101' or all orderlines where PRODUCT_CODE = 'Shoe').  If a significant amount of work done in your system is range queries on a column, that column is often the best choice for the clustered index (or a composite index where that column is the first column in the index).

    2) Your clustered index should be increasing.  That is every time you insert a new row, it's more efficient if the new row comes after all the other rows.  This helps prevent page splitting.

    3) Your clustered index should be unique.  If your clustered index allows duplicates, SQL needs a duplicate resolver for every value that has duplicates.  This duplicate resolver (8 bytes) is in not only your clustered index, but also every nonclustered index on that table.  When you make index entries bigger, they not only take more disk space, they may push your index to more levels, which will cause more I/O, and they are likely to either take more memory, or if the more memory is not available, index pages in memory are more likely to be pushed to disk, which means the next time thee index page is needed, it won't be in memory, which causes more I/O.  This disk, memory, I/O consideration applies to not only this item, but every item below where I mention it increases the size of an index.

    4) Your clustered index key should be narrow (have a small number of bytes).  Every column in your clustered index is also included in the leaf level of of every clustered index.  Which means all your nonclustered indexes will be larger.

    5) The values of the keys in your clustered index should not change.  Since the clustered index determines the page location of the row, if you change the value of a column in the clustered index, that row is likely to be deleted from the page it's on and inserted into a different page.  In addition, since the key is part of all the nonclustered indexes, all of the nonclustered indexes also have to be updated.

    6) The keys of the clustered index should not be nullable.  If they are then the leaf level of every nonclustered index can be made (usually 3, could be more) bytes larger.  This is because the leaf entry will need a null bitmap.

    7) The keys of the clustered index should be fixed length (like int or datetime or char), not variable length (like varchar).  This is because if there are any variable keys in the clustered index, then the leaf level entry size is increased for the variable column offset array (which is 2 bytes for every variable length column in the key).

    As you can see there are lots of considerations.  However, it would certainly be unusual (IMO) for the best choice for a clustered index on a table to be 6 integer columns.

    Tom

    • Proposed as answer by Hasham NiazEditor Tuesday, December 06, 2011 11:23 PM
    • Marked as answer by Stephanie Lv Tuesday, December 13, 2011 11:41 AM
    Tuesday, December 06, 2011 11:16 PM