locked
Should I be concerned about using INT in ID fields? RRS feed

  • Question

  • Hi,

    I'm in the process of reviewing the design of a DB that will end up supporting a high volume website. I know that one of the advantages of using UniqueIdentifier is that it's always unique but it's costlier to use it as opposed to Int or BigInt.

    When our app scales out and we end up with a SQL Server farm to support the site, should I be concerned about using BigInt in ID fields? If the ID is set by the database in an incremental fashion in a table, are there strategies to prevent duplicate IDs in SQL Server farm scenarios?

    Or should I just bite the bullet and use UniqueIdentifiers in my high volume ID fields?


    Thanks, Sam

    Friday, September 27, 2013 7:45 PM

Answers

  • Uniqueidentifier has one advantage and two problems.

    Advantage: it is probably globally unique.

    Problem1: It takes 16 bytes. If it is part of a clustered index key (and it probably is), this 16 bytes will be added not just to your CL index, but to each and every non-clustered index. So, if you have e.g. 5 indexes on a table + CL index, you add 16*6 = almost 100 bytes to each and every row of that table. For a 100 milion rows table you add 10 GB! If you choose int (4 bytes), you spend 2.5 GB, saving 7.5 GB.

    Problem2: NEWID() doesn't constantly grow. That results in frequent page splits and poor "insert" performance. You need to use NEWSEQUENTIALID() to have increasing values. But even that function sometimes produces values less than the one before.

    I'd suggest you to use INT datatype in combination with SEQUENCE that starts with server number (e.g. 1 for server 1, 2 for server 2,...) and increments in step that is larger than you'll ever have servers, eg 1000. So, on server 1 sequence would give you 1, 1001, 2001, 3001, 4001, 5001,....  On server 2 you have: 2, 1002, 2002, ...

    On the tables that will have more rows than INT could handle, use bigint with the same pattern. It takes 8 bytes, but still only half of the space uniqueidentifier would take, saving you 5 GB on 100 milion rows table.

    Uniqueidentifier would be ok for smaller tables, using NEWSEQUENTIALID for new values.


    • Edited by Vedran Kesegic Friday, September 27, 2013 8:57 PM
    • Marked as answer by imsam67 Friday, September 27, 2013 10:34 PM
    Friday, September 27, 2013 8:48 PM

All replies

  • Uniqueidentifier has one advantage and two problems.

    Advantage: it is probably globally unique.

    Problem1: It takes 16 bytes. If it is part of a clustered index key (and it probably is), this 16 bytes will be added not just to your CL index, but to each and every non-clustered index. So, if you have e.g. 5 indexes on a table + CL index, you add 16*6 = almost 100 bytes to each and every row of that table. For a 100 milion rows table you add 10 GB! If you choose int (4 bytes), you spend 2.5 GB, saving 7.5 GB.

    Problem2: NEWID() doesn't constantly grow. That results in frequent page splits and poor "insert" performance. You need to use NEWSEQUENTIALID() to have increasing values. But even that function sometimes produces values less than the one before.

    I'd suggest you to use INT datatype in combination with SEQUENCE that starts with server number (e.g. 1 for server 1, 2 for server 2,...) and increments in step that is larger than you'll ever have servers, eg 1000. So, on server 1 sequence would give you 1, 1001, 2001, 3001, 4001, 5001,....  On server 2 you have: 2, 1002, 2002, ...

    On the tables that will have more rows than INT could handle, use bigint with the same pattern. It takes 8 bytes, but still only half of the space uniqueidentifier would take, saving you 5 GB on 100 milion rows table.

    Uniqueidentifier would be ok for smaller tables, using NEWSEQUENTIALID for new values.


    • Edited by Vedran Kesegic Friday, September 27, 2013 8:57 PM
    • Marked as answer by imsam67 Friday, September 27, 2013 10:34 PM
    Friday, September 27, 2013 8:48 PM
  • Vedran,

    Thank you for your response. I really appreciate it. I particularly appreciate your suggestion about using using different sequences on different servers. Thanks again.


    Thanks, Sam

    Friday, September 27, 2013 10:34 PM