locked
Confusion on Natural Key Primary, vs Surrogate Key Primary RRS feed

  • Question

  • Hello everyone. I am pretty confused on whether I should be adding an autoIncremeted Primary key (surrogate key) to my table, or simply defining a primary Key on my 6 column Natural key in my specific instance.   I want to give some background as to why the confusion and some information on my data-base table and I am hoping people can comment.

    Background on Table Purpose:

    My company receives files from clients that have ~200 columns.  I created one very large data-table (I know I could theoreticallysplit it out into multiple tables, let's ignore that for now). The table now has 14 million + records.  The Natural Key on the table are the following columns.

    LoanNumber, Module, Data-month, LoanServicer,ClientName,LoadedDate.   (Note LoadedDate IS party of the key, since I can get the same record on multiple days, and I want to store all records so the LoadedDate is included to make the row unique).

    Every single query that is going to be run against the database will use 3-5 of the Natural Key columns.  They will be queries such as "select * from population  where Datamonth='8/31/2015' and module='ABC' and ClientName='Client #1' and LoanServicer='Servicer1".    

    The table gets updated with ~1 million rows a month, which happens in 100-200k records submitted per transaction.  Meaning I have 5-10 transactions a month generating new records.  

    Because of this fact I thought I should define a PRIMARY KEY (NON clustered)  on these columns.  It is my understanding that a Primary Key constraint will require the columns to be unique and Not Null (which I need) and also add indexes to the columns.  I then added an Auto-incremented (Identity) Column to the database and made that the clustered index. (Since it is an IDENTITY  column it is always increasing and a good candidate for the Clustered Index).    However from reading online, I have gotten the impression that your Clustered index should also Be your Primary key.... Is this true?  I don't see why they have to relate to each-other at all?  

    My Questions are....

     #1 Should I make my Identity column my primary key and remove it from my Natural Keys?    (Aka should I have a surrogate key by definition)

    #2 If I do this I still want to maintain Uniqueness on my natural keys... How should I do this?  Should I simply make a Not Null Unique constraint?  

    #3  Zero queries will access this database through the now "surrogate" key (the auto-incremented natural key)....  They will all use the natural key columns... So do I need to add regular indexes back onto the Natural Key columns?   If so what type of indexes should I include on them?

    #4 What's the point of a clustered index on an auto-incremented column if that column will Never be used in queries?   I know it is "not good practice..." but what if I put a clustered index on the 6 natural key columns (5 of which are large varchars() )....  Since a clustered index helps reads from the table, wouldn't this be the best option?  It would slow down inserts some (and cause index fragmentation), but if i'm only doing inserts 5-10 times a month is this really a problem?  Couldn't I just re-build the index nightly during system off hours?

    I appreciate all of your help and guidance!




    • Edited by Bloodofgods Thursday, September 10, 2015 4:27 AM
    Thursday, September 10, 2015 4:26 AM

Answers

  • 1. Read those articles

    http://sqlblogcasts.com/blogs/tonyrogerson/archive/2010/03/13/database-design-surrogate-keys-part-1-of-many-rules-for-surrogate-keys-e-f-codd-and-c-j-date-research-and-problems-they-solve.aspx

    http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/07/08/427.aspx

    2. Yes, just create an UNIQUE Constaint

    3. People create a CI in an IDENTITY to prevent fragmentation, and having an unique index on the natural keys to cover/speed up the queries


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, September 10, 2015 5:15 AM
  • The choice of "primary" key makes little or no difference. For most purposes a PRIMARY KEY constraint behaves the same as a UNIQUE constraint on the same columns.

    What does matter is what you choose as a clustered index. Typically you want a clustered index to be relatively static, highly selective and compact.

    Thursday, September 10, 2015 3:52 PM
  • Yes, if business requirements are that a combination of those columns are unique so yes, create a unique constraint on all those columns 

    Primary Key is a just a logical implementation, as you know it creates unique clustered or non clustered index so that what you need,

    You add the columns to your NCI , and make a CI as small as possible to prevent a fragmentation ..

    In my opinion  there is nothing wrong in the example you posted above even with addition columns in PK.


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, September 17, 2015 5:46 AM

All replies

  • 1. Read those articles

    http://sqlblogcasts.com/blogs/tonyrogerson/archive/2010/03/13/database-design-surrogate-keys-part-1-of-many-rules-for-surrogate-keys-e-f-codd-and-c-j-date-research-and-problems-they-solve.aspx

    http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/07/08/427.aspx

    2. Yes, just create an UNIQUE Constaint

    3. People create a CI in an IDENTITY to prevent fragmentation, and having an unique index on the natural keys to cover/speed up the queries


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, September 10, 2015 5:15 AM
  • Hi,

    Refer these links,

    http://decipherinfosys.wordpress.com/2007/02/01/surrogate-keys-vs-natural-keys-for-primary-key/

    http://www.agiledata.org/essays/keys.html


    Please Dont forget to mark as answer and Helpful Post. It helps others to find relevant posts to the same question. Milan Das

    Thursday, September 10, 2015 5:24 AM
  • The choice of "primary" key makes little or no difference. For most purposes a PRIMARY KEY constraint behaves the same as a UNIQUE constraint on the same columns.

    What does matter is what you choose as a clustered index. Typically you want a clustered index to be relatively static, highly selective and compact.

    Thursday, September 10, 2015 3:52 PM
  • Uri.  I was able to read the blogs, but they didn't really answer my question unfortunately. In the second blog you linked, Tony gives THIS exact example....

    create table client (
       id int not null IDENTITY constraint sk_client unique clustered
    ,
       client_name nvarchar(100) not null constraint pk_client primary key
    nonclustered
    )

    He has just made a unique clustered identity column and added a primary Key Nonclustered to his Natural key... Lets say his natural key was 6 columns in this table... Would he still do the same thing, and name those 6 columns as the Primary Key columns?

    This is what I did in my database design. (Exactly this).  However online a LOT of people seem to think this is wrong, and that the ID column is the column that should be both your Clustered AND your PRIMARY key....  They not that having a multiple column primary key is 100% wrong since all of your other indexes use the Primary Key Index, and having a "wide" one causes large query performance issues.  

    People who are in favor of adding the primary key constraint to your Natural Keys (even if you put the clustered on a different identity column) say that this key is MUCH faster to search compared to regular indexes, so why waste it on the Surrogate Key...   

    How much does it matter? Who is "wrong"? That's really my question.



    • Edited by Bloodofgods Wednesday, September 16, 2015 11:17 PM
    Wednesday, September 16, 2015 10:56 PM
  • Hello navogel......

    Isn't it true tho that a primary key is the main key on your table that all other indexes run off of?  So in theory wouldn't it be better to make a clustered Identity field your Primary Key, and then simply put a regular Index on your natural key columns (if there are 5+).

    Otherwise... Let's say we add an auto incremented identity column and make it our clustered index, and then we add a 6 column PRIMARY KEY NOTCLUSTERED to our natural keys..  If we add 3 more regular indexes to this table, won't they have to use the 6 column primary key instead of the 1 column Clustered key?

    This is really the heart of my question and what I am struggling with mightly. I hear people say what you have said, and others say "no that's wrong, your primary key HAS to be small and auto increasing so make it your Surrogate Key... Having a 3+ columns primary key is always wrong".  

    Wednesday, September 16, 2015 11:02 PM
  • Yes, if business requirements are that a combination of those columns are unique so yes, create a unique constraint on all those columns 

    Primary Key is a just a logical implementation, as you know it creates unique clustered or non clustered index so that what you need,

    You add the columns to your NCI , and make a CI as small as possible to prevent a fragmentation ..

    In my opinion  there is nothing wrong in the example you posted above even with addition columns in PK.


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, September 17, 2015 5:46 AM
  • What are your typical queries look like? If you are using LoanNumber then that should be your first column of your clustered index otherwise if you are using Datamonth a lot then it should be first column in your clustered index. Create a clustered index using most searched columns within first 3 columns and so forth. You said you do very little data insert per month so do not rebuild index every night as it will take time and not necessary. As for surrogate key, I don’t see any usefulness. Good luck.
    Thursday, September 17, 2015 6:13 AM
  • Uri + Soumen. I truly appreciate your continued support and I'm getting closer.  I still have a couple of questions tho based upon your responses if you can hopefully help more :-)

    As a reminder, my table is set up as below

    Clustered Primary Key on Natural Keys of Module, Data-month, LoanServicer,ClientName,LoanNumber,LoadedDate

    Table has No surrogate key at all.....

    First some more info for Soumen + URi as to why I have to re-build the indexes every night:

    I actually never re-built the indexse on this table at all until recently when I was forced to start doing so.  I have one query in particular that usually takes ~11-15 seconds (which i'm fine with) but if I continue to insert data into the table over a month or two the query suddently jumps up to 5+ hours.    The exectuion plan for the query changes drastically.   However if I re-build the Index the time goes back down to 11-15 seconds.

    WHen I look at the index before the re-build it is only .3% fragmented, so I don't think fragmentation is the cause of the problem...  Maybe the statistics aren't being updated correctly? i'm honestly not sure.

    Soumen:  Every single query uses the 3 fields: Module, Datamonth, LoanServicer in the where clause.  That's actually why I never put a surrogate key on the table, because I never join to any other tables, and every query uses these three natural keys.   Because of this I thought a clustered Primary index on them would be best.  However others in my company have said this is a no-no.  They note you ALWAYS need an AutoIncremented Clustered Index (and they say it SHOULD be your primary Key) and I should just include my "natural keys" in a regular index...

    However I have read everywhere that a clustered Index stores data at the leaf node. So to me it seems "better" to put the clustered index on your natural keys in MY example, beaause i'm always selecting from this table by those keys, and it would drastically increase read speed (I thought).

    Uri: To confirm:  You are pretty much agreeing with others in my company in saying I should make a Clustered Auto incremented key, and then simply add indexes (Primary Key NOT CLUSTERED or just a regular Unique index, it doesn't matter you say) to all of my natural keys that queries will use to run against the table?   I appreciate your information! :-)





    Friday, September 18, 2015 2:49 PM
  • You should NEVER have clustered index on auto incremented columns. Basically you are right and they are 100% wrong. The most important question is, how is your data coverage? Your main three columns are module, dataMonth and LoanService. You need to decide first what is the first column of your clustered index. For example if you have like 10 modules, and 12 Loan Service then dataMonth is your obvious choice for first column of clustered index because it is a date and will give you better coverage. Then you decide on next column  and so forth. If you can create a unique (preferably) clustered index then you do not need any primary key etc as the index will take care of it for you. Even if it is non- unique still you will get the performance boost from clustered index.

    As for performance and index rebuild, if your table pages are fragment after many inserts then you should rebuild index as your experience already gave you the answer. Good luck.
    Friday, September 18, 2015 3:07 PM