locked
Natural vs Surogate key please advise RRS feed

  • Question

  • Hello, I am trying to decide what is the best key for table i am going to call Locations.

    Consider having a table named Product. This table will have a key and one more column LocationID.

    What i do know is that my products have Serial Number. This number is 100% unique and will never change. This would make it a perfect candidate for a natural key.

    Next i have a Locations table. What i do know is that these locations are not something like names of countries, cities etc, but some internal descriptions of some locations. It is not guarantied they will never change, it is not known what size they can be. This table will be in relation to Product table. My question is, should i pick Location key to be these internal names, or should i pick surrogare key, and ID of autonumber, and also make Description column with unique property?

    Here are my pros of making surrogate ID key:

    1. Location is not quarantied to never change making cascading update scenarios possible (so what?)

    2. Location is not known in size, so i will have to put it like nvarchar(100) That can make impact on JOIN operations and who   knows what more.

    3. Description of Location is still unique preventing multiple inserts of same data.

    Pros for making Location as natural key:

    1. Well to be honest, it is easier to develop apps without having to implement lookup fields, comboboxs etc. and i dont need to implement JOINS all the time.

    Is it a big deal that changing natural column value will cause cascading updates, why would this be an issue considering table doesnt have like 100000 rows...

    I would go for ID autonumber here, but i am just not sure. Can someone please give an advice


    • Edited by Misha78 Thursday, October 24, 2013 5:38 PM not done
    Thursday, October 24, 2013 5:33 PM

Answers

  • Rule #1:  Never create a PK on a field which is entered by the user.

    Rule #2:  See Rule #1

    You can always make a clustered index on your serial number if that is how you search for products, but that is not a key between tables.


    • Edited by Tom PhillipsEditor Thursday, October 24, 2013 6:59 PM
    • Marked as answer by Misha78 Thursday, October 24, 2013 7:33 PM
    Thursday, October 24, 2013 6:59 PM
    Answerer

All replies

  • P.S one more thing.

    I read that you should ask questions youself prior deciding should you take natural or surrogate key

    Is it unique, not null, and unchangeable and not too big in size (like nvarchar(500)) ?

    I can understand unique and not null part, but in most cases we can't guarantie it is unchangeable. I mean if we have a table of states, cities, colors, genders etc, here we can guarantie. But almost everywhere else we really cant.

    Bussiness logic may change, and what earlier seems to be unchanged, may become changeable. This goes to favor of surrogate key in my scenario i wrote above.

    Then again, if only thing that can happen is changing value of column, and not type (like nvarchar size increase), why is it a big deal to let cascades do their work once in a while?

    Thursday, October 24, 2013 5:55 PM
  • Rule #1:  Never create a PK on a field which is entered by the user.

    Rule #2:  See Rule #1

    You can always make a clustered index on your serial number if that is how you search for products, but that is not a key between tables.


    • Edited by Tom PhillipsEditor Thursday, October 24, 2013 6:59 PM
    • Marked as answer by Misha78 Thursday, October 24, 2013 7:33 PM
    Thursday, October 24, 2013 6:59 PM
    Answerer
  • So basicly you are saying i should always make a surogate key, Id autonumber, for primary keys, since that field is maintained by SQL increment and not entered by user?
    Thursday, October 24, 2013 7:04 PM
  • Yes.  If you don't you will be sorry in the long run.

    Eventually, in your example, someone will come and say "I accidentally entered the serial number wrong".  Now it is a major fix to update all the links to that value.

    Also, seeking for a number is much faster than character comparison.

    Again, I would suggest you look at the AdventureWorks database structure.  It is done very well and takes many, many things into account in its design.

    Thursday, October 24, 2013 7:12 PM
    Answerer
  • Thank you very much.

    Best regards

    Milos

    Thursday, October 24, 2013 7:33 PM
  • To add to this - using a surrogate key does not relieve you of the responsibility of enforcing uniqueness of the natural key (or keys).  And before you go adding identity columns to every table - because you mis-interpreted the remarks above - I suggest you refrain from doing that until your analysis is complete and verified.  When your description includes anything like "... Location is not known in size, so i will have to put ...", it seems to me that your analysis is still far from complete and it is premature to be making decisions about surrogate vs. natural primary keys.  It IS important to know what the natural keys are, but the choice for a primary constraint can be deferred until you actually start planning the physical implementation. 

    In addition, you express a concern based on 100,000 rows.  That number is not considered large for a table these days.  Try not to "solve" problems do not (and may never) exist.  FWIW, there are costs and benefits to every choice.  An identity column complicates your insert logic and consumes more space.  It can simplify updates to the natural key.  It can complicate "moving" data between databases.  The values in an identity column can have gaps - is that a concern (and be certain of your answer).  The choice of a clustered index is important - something you have not yet touched on. 

    Lastly, this is a topic that comes up frequently.  Do a little searching and read all of the past discussions.  In fact, that is something you should generally do first whenever you have a question.  Many times these discussions will lead you to re-examine your question by raising issues that you have not yet considered.

    Thursday, October 24, 2013 7:39 PM
  • Thanks for your answer. I know this topic comes up frequently. I read alot of topics, they even consider this the biggest discussion meantioning two sides, surrogate all the way and natuaral keys all the way.

    Sadly users tend to describe poor meanings of location. when i ask them, they answer is like "Well it is a description, there isnt any logic pattern or something". Think of it like tagging different places in one building complex. So i have to put more then enough size nvarchar(100)

    I know surrogate keys complicate insert logic, i am aware of gaps, that doesnt trouble me. Also i dont plan to move data between databases. I just wanted somehow to draw a line when to consider which key.

    From my point of view, surogate keys are good choice whenever something is not for sure, because like Tom said you never know when it can hit you long term

    As I said in cases where you have table of country codes, colors etc, i dont see why you shouldnt use that as natural key.

    I am not concerned about cascading updates on 100,000 rows, i failed to make a point here. What i wanted to say is that I dont find it cool to cascade update all child tables just because someone spelled wrong some locations and noticed few months later. What i dont understand is is there any reason to make myself alarmed cause cascade will occur.

    Anyway, i saw in Adventure Works example, for similar thing surogate is also used, and ProductNumber is made an index.

    Thank you again for taking time to reply to my post. I have long road ahead of me

    Thursday, October 24, 2013 7:57 PM
  • >As I said in cases where you have table of country codes, colors etc, i dont see why you shouldnt use that as natural key.

    You cannot declare a NATURAL KEY in SQL since that is only a logical design concept.

    What you can declare is PRIMARY KEY and UNIQUE KEY.

    Basically in practical design INT surrogate PRIMARY KEY is declared and UNIQUE KEY or unique index on the NATURAL KEY(s).

    You can populate the INT PK with IDENTITY or with SEQUENCE object starting with SQL Server 2012. You can do your own INT surrogate key population using the MAX function.

    As far as using the country name (varchar(100)?) as PK and FK, it is not as efficient as 4-byte INT space wise and performance wise. Country names are changing also: where are Czechoslovakia & Yugoslavia? Soviet Union?


    Kalman Toth Database & OLAP Architect IPAD SELECT Query Video Tutorial 3.5 Hours
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012


    Sunday, October 27, 2013 8:08 AM