none
can Sequential_GUID be the PK?

    Question

  • Sequential_GUID is a new feature in sql 2005, an extension to GUID.
    By getting GUID's in sequence, makes it a possible candidate for clustered Index for optimization.

    I have a situation, where the Sequential_GUID generated in a distributed env, ie multiple machines/sql-instances, and it is stored in a central DB.
    Can I use it as a Primary Key?

    Thanks, Krishna.
    Krishna
    Sunday, February 15, 2009 6:16 AM

Answers

  • From storage perspective , GUID PK should be avoided. If you have child table refering this table  or many indexes on the table then GUID will be consuming lot of storage space. You may want to consider identity as an another option (you can keep different range in different servers). Check this article for more info
     http://databases.aspfaq.com/database/what-should-i-choose-for-my-primary-key.html

    Madhu
    MCITP, MCTS, MCDBA,MCP-- Blog : http://experiencing-sql-server-2008.blogspot.com/
    Sunday, February 15, 2009 4:01 PM
    Moderator
  • gvee said:

    P.S. I appreciate that a GUID won't be as efficient as an integer column for a key, but I feel it is a bit of a myth that the performance implications will be as bad as most think.


    George



    Perhaps ... it is not ... i would say it depends :) ....  i had a project in which USERS table PK (USER_ID) was a GUID and almost 60-70 table had audit columns like Created_BY/Modified_BY/Deleted_By columns. The database was not properly sized and we had to change the data type.

    When i do data modeling i choose GUID as only the last option. Perhaps i have not explored that area much.

    Madhu
    MCITP, MCTS, MCDBA,MCP-- Blog : http://experiencing-sql-server-2008.blogspot.com/
    Monday, February 16, 2009 1:45 PM
    Moderator
  • That is what i said it depends... When you have above said scenario, where PK is referred by almost all the tables in the db and you have size constraint, you would not go for GUID kind of datatype. Yes, you may be correct that in today world the way storage devices are evolving you may not mind 12 byte.  I still feel that as a DB architect you need to select the appropriate datatype  evenif it is saving 1 byte. Yes , you may compromise if you have some specific requirement like Replication     in the project.

    Madhu


    MCITP, MCTS, MCDBA,MCP-- Blog : http://experiencing-sql-server-2008.blogspot.com/
    Monday, February 16, 2009 2:12 PM
    Moderator
  • Uniqueidentifier is 16 bytes, integer is 4.
    It means for a nonclustered index you can cram in 4 times more records in a page with an int, than uniqueidentifier.
    25% of storage space and ram usage.
    4 times faster.

    These numbers are not 100% accurate but you get the idea.

    Monday, February 16, 2009 3:34 PM
  • gvee said:

     The benefit of a unique identifier is that you are never going to get duplicates.


    George


    That is a myth! 

    uniqueidentifier is not based on some magical witchcraft which pulls a non-repeating 16 byte (only 8 byte is generated randomly, the other 8 byte is the network card unique address), rather a random number generator mathematical algorithm which can generate repeating numbers. It does not have a "memory". It would not know that it generated a repeat. Repeats actually happened as discussed in other threads.

    You still have to place unique index or unique key constraint on the uniqueidentifier column to ensure uniqueness.

    Kalman Toth, SQL Server 2008 Training, SSAS, SSIS, SSRS, BI: www.SQLUSA.com
    Monday, February 23, 2009 4:26 PM
    Moderator
  • gvee said:

    it takes both time and environmental factors so in theory is non-repeatable.


    George



    8 byte of the 16 byte uniqueidentifier is tied to the network card, fixed for a computer.  The random number generator deals with the remailing 8 byte on the same computer ( so we can discount the other 8 bytes).

    If you generate 2^64 (8x8 bits) random numbers on a computer, you will surely start repeating. More likely, it will repeat way before that. Since it is random, we can only make statistical prediction about repeats.

    It does not matter if you use the computer clock or one's godfather's real time geo coordinates as he drives down Highway 66 in a random number generator algorithm, it will repeat sooner or later.  In fact it is better not to use tricks, because the random number generator algorithm becomes unpredictable as far as repeats concerned.

    It is not easy to produce a uniqueidenfier repeat because 2^64 is a very large pool, namely 18,446,744,073,709,551,616 (eighteen quintillion).

    Kalman Toth SQL SERVER 2012 & BI TRAINING
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012



    Monday, February 23, 2009 6:26 PM
    Moderator
  • I have no doubt that duplicates are possible (I studied statistics for a short time ;) but I have never seen nor heard of them in a production environment. Unique constraint FTW!
    George
    Monday, February 23, 2009 6:41 PM
    Answerer
  • George,

    I did not find the thread in which a DBA claims he/she detected duplicate guids.  Here is a related quote, followed by the link to the thread which has more links on the topic:

    "I think you'll find that most articles (especially those written by people
    with training in Computer Science or Software Engineering) will skirt the
    issue because I don't think there's an algorithm that can formally guarantee
    universal unqiueness."

    http://social.msdn.microsoft.com/forums/en-US/transactsql/thread/af52661f-7eb5-4c73-87e8-2d9ad195e112/


    Kalman Toth, SQL Server 2008 Training, SSAS, SSIS, SSRS, BI: www.SQLUSA.com
    Monday, February 23, 2009 7:51 PM
    Moderator

All replies

  • Kris,

    One of the considerations for merge replication, is having a uniqueidentifier column with the ROWGUIDCOL property and a unique index, to identify each row during the replication process.

    I guess you are looking for something similar, right?


    AMB
    Sunday, February 15, 2009 3:29 PM
    Moderator
  • From storage perspective , GUID PK should be avoided. If you have child table refering this table  or many indexes on the table then GUID will be consuming lot of storage space. You may want to consider identity as an another option (you can keep different range in different servers). Check this article for more info
     http://databases.aspfaq.com/database/what-should-i-choose-for-my-primary-key.html

    Madhu
    MCITP, MCTS, MCDBA,MCP-- Blog : http://experiencing-sql-server-2008.blogspot.com/
    Sunday, February 15, 2009 4:01 PM
    Moderator
  • Kris,
     
    Using a column with identity property and different ranges, as Madhu K Nair suggested, could be another solution, but if you really want more freedom, then you could have another column to differentiate the site where the number was generated and the primary key would be (SiteID, SequentialID), no matter which range you are using in each site.


    AMB
    Sunday, February 15, 2009 4:13 PM
    Moderator
  • Kris,

    You can do it, but if you look at it, it looks pretty ugly as primary key. You would have significant performance issue for large tables.

    Sequential GUID as primary key demo:

    use tempdb;  
    go  
     
    create table Product(  
    ProductID uniqueidentifier primary key default (newsequentialid()),  
    ProductName varchar(50),  
    ListPrice money  
    )  
    go  
     
    insert Product (ProductName,ListPrice)  
    select Name, ListPrice  
    from AdventureWorks.Production.Product  
    go  
     
    select * from Product  
    go  
     
    /* Partial results  
     
    B8446644-A0FB-DD11-8EBB-00192148042A    Adjustable Race 0.00  
    B9446644-A0FB-DD11-8EBB-00192148042A    Bearing Ball    0.00  
    BA446644-A0FB-DD11-8EBB-00192148042A    BB Ball Bearing 0.00  
    BB446644-A0FB-DD11-8EBB-00192148042A    Headset Ball Bearings   0.00  
    BC446644-A0FB-DD11-8EBB-00192148042A    Blade   0.00  
    BD446644-A0FB-DD11-8EBB-00192148042A    LL Crankarm 0.00  
    BE446644-A0FB-DD11-8EBB-00192148042A    ML Crankarm 0.00  
    BF446644-A0FB-DD11-8EBB-00192148042A    HL Crankarm 0.00  
    C0446644-A0FB-DD11-8EBB-00192148042A    Chainring Bolts 0.00  
     
    */ 

    Kalman Toth, SQL Server 2008 Training, SSAS, SSIS, SSRS, BI: www.SQLUSA.com
    Sunday, February 15, 2009 8:55 PM
    Moderator
  • Madhu K Nair said:

    From storage perspective , GUID PK should be avoided. If you have child table refering this table  or many indexes on the table then GUID will be consuming lot of storage space. You may want to consider identity as an another option (you can keep different range in different servers).



    Personally, I see no problem with having a GUID for a primary key if it suits the purpose - both an identity and a GUID are surrogate keys so can be deemed equivalent in relational terms. A primary key is used to uniquely identify rows in a table, and we use a surrogate key when no natural key is available.

    As for storage considerations, yes a GUID is larger (16 bytes) than an int (4 bytes) or bigint (8 bytes), so it is definately something you need to consider. But I don't believe that having a different range of integers is the key because there is potential for these ranges to overlap as the volume of data increases, which is another consideration you need to think about. The benefit of a unique identifier is that you are never going to get duplicates.

    Finally, as you've rightly identified, the sequential GUID is better suited for indexing than a "regular" uniqueidentifier because it will ensure that the nodes in the b-tree encounters [far] less fragmentation which makes it a better candidate for a clustered index than a non-sequential GUID.
    George
    Monday, February 16, 2009 9:06 AM
    Answerer
  • P.S. I appreciate that a GUID won't be as efficient as an integer column for a key, but I feel it is a bit of a myth that the performance implications will be as bad as most think.


    George
    Monday, February 16, 2009 9:08 AM
    Answerer
  • gvee said:

    P.S. I appreciate that a GUID won't be as efficient as an integer column for a key, but I feel it is a bit of a myth that the performance implications will be as bad as most think.


    George



    Perhaps ... it is not ... i would say it depends :) ....  i had a project in which USERS table PK (USER_ID) was a GUID and almost 60-70 table had audit columns like Created_BY/Modified_BY/Deleted_By columns. The database was not properly sized and we had to change the data type.

    When i do data modeling i choose GUID as only the last option. Perhaps i have not explored that area much.

    Madhu
    MCITP, MCTS, MCDBA,MCP-- Blog : http://experiencing-sql-server-2008.blogspot.com/
    Monday, February 16, 2009 1:45 PM
    Moderator
  • Would you take the same view if you had a natural key that was char(16), nchar(8) or decimal (8,2)?

    If you solved your storage issues by trimming a mere 12 bytes off one column, then I'm suprised.


    George
    Monday, February 16, 2009 1:55 PM
    Answerer
  • That is what i said it depends... When you have above said scenario, where PK is referred by almost all the tables in the db and you have size constraint, you would not go for GUID kind of datatype. Yes, you may be correct that in today world the way storage devices are evolving you may not mind 12 byte.  I still feel that as a DB architect you need to select the appropriate datatype  evenif it is saving 1 byte. Yes , you may compromise if you have some specific requirement like Replication     in the project.

    Madhu


    MCITP, MCTS, MCDBA,MCP-- Blog : http://experiencing-sql-server-2008.blogspot.com/
    Monday, February 16, 2009 2:12 PM
    Moderator
  • Uniqueidentifier is 16 bytes, integer is 4.
    It means for a nonclustered index you can cram in 4 times more records in a page with an int, than uniqueidentifier.
    25% of storage space and ram usage.
    4 times faster.

    These numbers are not 100% accurate but you get the idea.

    Monday, February 16, 2009 3:34 PM
  • gvee said:

     The benefit of a unique identifier is that you are never going to get duplicates.


    George


    That is a myth! 

    uniqueidentifier is not based on some magical witchcraft which pulls a non-repeating 16 byte (only 8 byte is generated randomly, the other 8 byte is the network card unique address), rather a random number generator mathematical algorithm which can generate repeating numbers. It does not have a "memory". It would not know that it generated a repeat. Repeats actually happened as discussed in other threads.

    You still have to place unique index or unique key constraint on the uniqueidentifier column to ensure uniqueness.

    Kalman Toth, SQL Server 2008 Training, SSAS, SSIS, SSRS, BI: www.SQLUSA.com
    Monday, February 23, 2009 4:26 PM
    Moderator
  • The algorithm used to generate a GUID is not as random as you think either - it takes both time and environmental factors so in theory is non-repeatable.

    And I agree 100% that you still have to have a unique constraint on this column, regardless :)

    I have yet to come across any example of a repeating GUID, but if you have any links to the contrary please share as I love to learn (and be proved wrong!).


    George
    Monday, February 23, 2009 4:33 PM
    Answerer
  • gvee said:

    it takes both time and environmental factors so in theory is non-repeatable.


    George



    8 byte of the 16 byte uniqueidentifier is tied to the network card, fixed for a computer.  The random number generator deals with the remailing 8 byte on the same computer ( so we can discount the other 8 bytes).

    If you generate 2^64 (8x8 bits) random numbers on a computer, you will surely start repeating. More likely, it will repeat way before that. Since it is random, we can only make statistical prediction about repeats.

    It does not matter if you use the computer clock or one's godfather's real time geo coordinates as he drives down Highway 66 in a random number generator algorithm, it will repeat sooner or later.  In fact it is better not to use tricks, because the random number generator algorithm becomes unpredictable as far as repeats concerned.

    It is not easy to produce a uniqueidenfier repeat because 2^64 is a very large pool, namely 18,446,744,073,709,551,616 (eighteen quintillion).

    Kalman Toth SQL SERVER 2012 & BI TRAINING
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012



    Monday, February 23, 2009 6:26 PM
    Moderator
  • I have no doubt that duplicates are possible (I studied statistics for a short time ;) but I have never seen nor heard of them in a production environment. Unique constraint FTW!
    George
    Monday, February 23, 2009 6:41 PM
    Answerer
  • George,

    I did not find the thread in which a DBA claims he/she detected duplicate guids.  Here is a related quote, followed by the link to the thread which has more links on the topic:

    "I think you'll find that most articles (especially those written by people
    with training in Computer Science or Software Engineering) will skirt the
    issue because I don't think there's an algorithm that can formally guarantee
    universal unqiueness."

    http://social.msdn.microsoft.com/forums/en-US/transactsql/thread/af52661f-7eb5-4c73-87e8-2d9ad195e112/


    Kalman Toth, SQL Server 2008 Training, SSAS, SSIS, SSRS, BI: www.SQLUSA.com
    Monday, February 23, 2009 7:51 PM
    Moderator