locked
non-unique clustered index performance RRS feed

  • Question

  • I'm thinking of using GUIDs instead of int identity from now on. I've read that a GUID column is very bad to use as a clustered index though. I was wondering if, generally speaking, it would be an acceptable design to make the clustered index another, non-unique column, that will group the rows according to how they're most often accessed. Say, if I have an ItemDetail table that FK's on Item by ItemID, I would make ItemID the clustered index on itemDetail. Any gotchas there?

    Wednesday, September 29, 2010 2:06 PM

Answers

  • First of all, the clustered index needs to be unique. If you don't define it as unique, SQL Server will add another hidden 4 bytes "uniquefier" in order to achieve that.

    Second, clustered index needs to be as narrow as possible. SQL Server keeps the values from the clustered index in the rows of non-clustered indexes so bigger clustered index values will make non-clustered index rows wider which leads to: less rows per data page -> more data pages -> more IO operations -> degrade performance. This is basically the main argument of using identity instead of GUID due the sizes (16 bytes vs 4/8 bytes). On other hand, bigger row size does not necessary means that it's always the bad choice. There are some cases when identity can simplify quite a few different tasks You need to make sure that you create the new values with NEWSEQUENTIALID() in the case if you want to emulate identity behavior and make the values increase monotonously.

    On other hand, it always benefits the system if clustered index covers most frequent queries. In your case I would probably make the composite clustered index on non-unique ItemId as the left-most column and either identity or guid as the right-most column. It will make the index unique.


    Thank you!

    My blog: http://aboutsqlserver.com

    • Marked as answer by Tom Li - MSFT Thursday, October 7, 2010 5:43 AM
    Wednesday, September 29, 2010 4:58 PM

All replies

  • First of all, the clustered index needs to be unique. If you don't define it as unique, SQL Server will add another hidden 4 bytes "uniquefier" in order to achieve that.

    Second, clustered index needs to be as narrow as possible. SQL Server keeps the values from the clustered index in the rows of non-clustered indexes so bigger clustered index values will make non-clustered index rows wider which leads to: less rows per data page -> more data pages -> more IO operations -> degrade performance. This is basically the main argument of using identity instead of GUID due the sizes (16 bytes vs 4/8 bytes). On other hand, bigger row size does not necessary means that it's always the bad choice. There are some cases when identity can simplify quite a few different tasks You need to make sure that you create the new values with NEWSEQUENTIALID() in the case if you want to emulate identity behavior and make the values increase monotonously.

    On other hand, it always benefits the system if clustered index covers most frequent queries. In your case I would probably make the composite clustered index on non-unique ItemId as the left-most column and either identity or guid as the right-most column. It will make the index unique.


    Thank you!

    My blog: http://aboutsqlserver.com

    • Marked as answer by Tom Li - MSFT Thursday, October 7, 2010 5:43 AM
    Wednesday, September 29, 2010 4:58 PM
  • It is acceptable to use non-unique fields as the CK.  In fact it can be a best design.  I worked on a system where this was exactly the case, a 500gb table where a typical access fetched about fifty adjacent rows.  Note that the CK need not be the PK, the PK can (must!) still be unique.

    Josh

     

    Monday, October 4, 2010 2:43 AM
  • In addition, you  have to test  your system before choosing CI to be created on specific column.. there is no right answer, as it depends on queries running against the table
    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Monday, October 4, 2010 6:24 AM
  • It is acceptable to use non-unique fields as the CK.  In fact it can be a best design.  I worked on a system where this was exactly the case, a 500gb table where a typical access fetched about fifty adjacent rows.  Note that the CK need not be the PK, the PK can (must!) still be unique.

    Josh

     


    Josh, you for sure can define non-unique clustered index, but SQL Server adds 4 bytes to the row anyway. Here is from the books online :

    If the clustered index is not created with the UNIQUE property, the Database Engine automatically adds a 4-byte uniqueifier column to the table. When it is required, the Database Engine automatically adds a uniqueifier value to a row to make each key unique. This column and its values are used internally and cannot be seen or accessed by users.

    I would rather create integer identity field and add it to the index. I will have better control in such case - for example I can achieve better performance querying/updating rows by CI values instead of using non-clustered PK. You obviously cannot do that with uniqueifier you don't have any access to. 


    Thank you!

    My blog: http://aboutsqlserver.com

    Monday, October 4, 2010 5:06 PM
  • I would rather create integer identity field and add it to the index. I will have better control in such case - for example I can achieve better performance querying/updating rows by CI values instead of using non-clustered PK. You obviously cannot do that with uniqueifier you don't have any access to. 

    Well, but I simply didn't have any use for a visible uniquifier.

    You can only have a single identity field for a table, right, and if I want that for the (nonclustered) PK, then, ... hmm, can the same field *also* be the low-order field on a three-key CK?  That would be creative!

    Josh

    Wednesday, October 6, 2010 5:16 AM
  • I used uniqueidentifier keys for a long time; guess the only reason I stopped was moving to a place where they use identities for everything.

    My thoughts on clustered index selection is that it is generally a wasted opportunity to place the clustered index on the uniqueidentifier PK; better to use the clustered index where it will be useful in joining or sorting (often on a name or a date) and use a non-clustered PK.

    Wednesday, October 6, 2010 2:42 PM
  • I would rather create integer identity field and add it to the index. I will have better control in such case - for example I can achieve better performance querying/updating rows by CI values instead of using non-clustered PK. You obviously cannot do that with uniqueifier you don't have any access to. 

    Well, but I simply didn't have any use for a visible uniquifier.

    You can only have a single identity field for a table, right, and if I want that for the (nonclustered) PK, then, ... hmm, can the same field *also* be the low-order field on a three-key CK?  That would be creative!

    Josh


    Josh,

    Think about update of the row. With unique clustered index and visible uniquifier you can update it based on the clustered index value. For non-unique clustered index, you must use either non-clustered key or the combination of clustered index and primary key columns in the where. In both cases the plan would be less efficient.


    Thank you!

    My blog: http://aboutsqlserver.com

    Wednesday, October 6, 2010 3:51 PM
  • Think about update of the row. With unique clustered index and visible uniquifier you can update it based on the clustered index value. For non-unique clustered index, you must use either non-clustered key or the combination of clustered index and primary key columns in the where. In both cases the plan would be less efficient.

    In the app I worked on, the important thing was to get those 50 rows efficiently.  Once you're down to fifty, getting any one of them is trivial, but if we had to scan all over a 500gb table to find the 50 rows that would have been inefficient.

    It doesn't hurt to make the uniquifier visible, it just doesn't help much in cases like this, and maybe the design looks simpler if you do without it.

    Josh

    Thursday, October 7, 2010 6:32 AM