none
Newbies PRIMARY KEY Choice: IDENTITY Surrogate Key or Natural Key RRS feed

  • General discussion

  • My recommendation is to follow AdventureWorksxxxx sample databases which use identity surrogates for PRIMARY KEY  and place Unique Index or Unique Key constraint on Natural Key(s).  Example follows from AdventureWorks2008.

    CREATE TABLE Production.Product(
    	ProductID int IDENTITY(1,1)		PRIMARY KEY,
    	Name dbo.Name NOT NULL,		-- UNIQUE INDEX
    	ProductNumber nvarchar(25) NOT NULL,	-- UNIQUE INDEX
    	MakeFlag dbo.Flag NOT NULL,
    	FinishedGoodsFlag dbo.Flag NOT NULL,
    	Color nvarchar(15) NULL,
    	SafetyStockLevel smallint NOT NULL,
    	ReorderPoint smallint NOT NULL,
    	StandardCost money NOT NULL,
    	ListPrice money NOT NULL,
    	Size nvarchar(5) NULL,
    	SizeUnitMeasureCode nchar(3) NULL,
    	WeightUnitMeasureCode nchar(3) NULL,
    	Weight decimal(8, 2) NULL,
    	DaysToManufacture int NOT NULL,
    	ProductLine nchar(2) NULL,
    	Class nchar(2) NULL,
    	Style nchar(2) NULL,
    	ProductSubcategoryID int NULL,
    	ProductModelID int NULL,
    	SellStartDate datetime NOT NULL,
    	SellEndDate datetime NULL,
    	DiscontinuedDate datetime NULL,
    	rowguid uniqueidentifier ROWGUIDCOL NOT NULL, -- UNIQUE INDEX
    	ModifiedDate datetime NOT NULL);
    
    Kalman Toth
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2016

    Saturday, January 22, 2011 2:57 PM
    Moderator

All replies

  • My recommendation is to follow AdventureWorksxxxx sample databases which use natural keys for PRIMARY KEY aligned with ISO standard codes and values when possible.  Example follows from AdventureWorks2008, which uses ISO 3166 codes (http://www.iso.org/iso/english_country_names_and_code_elements).

    CREATE TABLE [Person].[CountryRegion](
    	[CountryRegionCode] [nvarchar](3) NOT NULL,
    	[Name] [dbo].[Name] NOT NULL,
    	[ModifiedDate] [datetime] NOT NULL,
     CONSTRAINT [PK_CountryRegion_CountryRegionCode] PRIMARY KEY CLUSTERED 
    (
    	[CountryRegionCode] ASC
    );
    

     


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/
    Saturday, January 22, 2011 3:39 PM
  • The following query lists all the PRIMARY KEY columns in AdventureWorks2008. The "ID" suffix convention indicates IDENTITY Surrogate Primary Key at the beginning of the listing. Starting with RowID 88, Code Tables, Tables with composite PK,  & Natural Key PRIMARY KEYs are listed.

    SELECT RowID = ROW_NUMBER() 
     OVER(ORDER BY REVERSE(COLUMN_NAME), TABLE_SCHEMA, TABLE_NAME), 
     * 
    FROM (SELECT TC.TABLE_SCHEMA, 
      TC.TABLE_NAME, 
      COLUMN_NAME 
     FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS TC 
      INNER JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE CCU 
      ON TC.CONSTRAINT_NAME = CCU.CONSTRAINT_NAME 
     WHERE CONSTRAINT_TYPE = 'PRIMARY KEY') X 
    ORDER BY RowID 
     
    
    RowID TABLE_SCHEMA TABLE_NAME COLUMN_NAME
    1 Purchasing ShipMethod ShipMethodID
    2 Sales CreditCard CreditCardID
    3 Sales PersonCreditCard CreditCardID
    4 Person StateProvince StateProvinceID
    5 Person PersonPhone PhoneNumberTypeID
    6 Person PhoneNumberType PhoneNumberTypeID
    7 Person AddressType AddressTypeID
    8 Person BusinessEntityAddress AddressTypeID
    9 Person BusinessEntityContact ContactTypeID
    10 Person ContactType ContactTypeID
    11 Production Culture CultureID
    12 Production ProductModelProductDescriptionCulture CultureID
    13 HumanResources JobCandidate JobCandidateID
    14 Sales SalesTaxRate SalesTaxRateID
    15 Sales CurrencyRate CurrencyRateID
    16 dbo DatabaseLog DatabaseLogID
    17 dbo ErrorLog ErrorLogID
    18 Production ProductModel ProductModelID
    19 Production ProductModelIllustration ProductModelID
    20 Production ProductModelProductDescriptionCulture ProductModelID
    21 Purchasing PurchaseOrderDetail PurchaseOrderDetailID
    22 Sales SalesOrderDetail SalesOrderDetailID
    23 Sales ShoppingCartItem ShoppingCartItemID
    24 Production Location LocationID
    25 Production ProductInventory LocationID
    26 dbo AWBuildVersion SystemInformationID
    27 Production Illustration IllustrationID
    28 Production ProductModelIllustration IllustrationID
    29 Production TransactionHistory TransactionID
    30 Production TransactionHistoryArchive TransactionID
    31 Production ProductDescription ProductDescriptionID
    32 Production ProductModelProductDescriptionCulture ProductDescriptionID
    33 Production ScrapReason ScrapReasonID
    34 Sales SalesOrderHeaderSalesReason SalesReasonID
    35 Sales SalesReason SalesReasonID
    36 Person BusinessEntityContact PersonID
    37 Production ProductPhoto ProductPhotoID
    38 Production ProductProductPhoto ProductPhotoID
    39 Purchasing PurchaseOrderDetail PurchaseOrderID
    40 Purchasing PurchaseOrderHeader PurchaseOrderID
    41 Production WorkOrder WorkOrderID
    42 Production WorkOrderRouting WorkOrderID
    43 Sales SalesOrderDetail SalesOrderID
    44 Sales SalesOrderHeader SalesOrderID
    45 Sales SalesOrderHeaderSalesReason SalesOrderID
    46 Sales SpecialOffer SpecialOfferID
    47 Sales SpecialOfferProduct SpecialOfferID
    48 Sales Customer CustomerID
    49 Production BillOfMaterials BillOfMaterialsID
    50 Person Address AddressID
    51 Person BusinessEntityAddress AddressID
    52 Person EmailAddress EmailAddressID
    53 Production Product ProductID
    54 Production ProductCostHistory ProductID
    55 Production ProductDocument ProductID
    56 Production ProductInventory ProductID
    57 Production ProductListPriceHistory ProductID
    58 Production ProductProductPhoto ProductID
    59 Production WorkOrderRouting ProductID
    60 Purchasing ProductVendor ProductID
    61 Sales SpecialOfferProduct ProductID
    62 HumanResources EmployeeDepartmentHistory ShiftID
    63 HumanResources Shift ShiftID
    64 HumanResources Department DepartmentID
    65 HumanResources EmployeeDepartmentHistory DepartmentID
    66 Production ProductReview ProductReviewID
    67 Production ProductSubcategory ProductSubcategoryID
    68 Production ProductCategory ProductCategoryID
    69 Sales SalesTerritory TerritoryID
    70 Sales SalesTerritoryHistory TerritoryID
    71 HumanResources Employee BusinessEntityID
    72 HumanResources EmployeeDepartmentHistory BusinessEntityID
    73 HumanResources EmployeePayHistory BusinessEntityID
    74 Person BusinessEntity BusinessEntityID
    75 Person BusinessEntityAddress BusinessEntityID
    76 Person BusinessEntityContact BusinessEntityID
    77 Person EmailAddress BusinessEntityID
    78 Person Password BusinessEntityID
    79 Person Person BusinessEntityID
    80 Person PersonPhone BusinessEntityID
    81 Purchasing ProductVendor BusinessEntityID
    82 Purchasing Vendor BusinessEntityID
    83 Sales PersonCreditCard BusinessEntityID
    84 Sales SalesPerson BusinessEntityID
    85 Sales SalesPersonQuotaHistory BusinessEntityID
    86 Sales SalesTerritoryHistory BusinessEntityID
    87 Sales Store BusinessEntityID
    88 Production WorkOrderRouting OperationSequence
    89 Production UnitMeasure UnitMeasureCode
    90 Person CountryRegion CountryRegionCode
    91 Sales CountryRegionCurrency CountryRegionCode
    92 Sales CountryRegionCurrency CurrencyCode
    93 Sales Currency CurrencyCode
    94 Production Document DocumentNode
    95 Production ProductDocument DocumentNode
    96 Sales SalesPersonQuotaHistory QuotaDate
    97 HumanResources EmployeePayHistory RateChangeDate
    98 HumanResources EmployeeDepartmentHistory StartDate
    99 Production ProductCostHistory StartDate
    100 Production ProductListPriceHistory StartDate
    101 Sales SalesTerritoryHistory StartDate
    102 Person PersonPhone PhoneNumber

    Some of the non-identity surrogate PK-s are on code (lookup ) tables:

    SELECT CurrencyCode
     ,Name
     FROM AdventureWorks2008.Sales.Currency
    /* CurrencyCode	Name
    ....
    TND	Tunisian Dinar
    TRL	Turkish Lira
    TTD	Trinidad and Tobago Dollar
    TWD	New Taiwan Dollar
    USD	US Dollar
    UYU	Uruguayan Peso
    VEB	Bolivar
    .....*/

    For a newbie using SMALLINT IDENTITY surrogate PK will do no harm. Similar code table: Person.CountryRegion.

    Some of the non-surrogate PK tables have composite PK(Production.WorkOrderRouting), outside the scope of this thread.

    Production.Document is one of the few tables with Natural Key PK. Not exactly sure why, perhaps the few rows.

    SELECT DocumentNode
      ,DocumentLevel
      ,Title
     FROM AdventureWorks2008.Production.Document
    /*
    DocumentNode	DocumentLevel	Title
    0x	0	Documents
    0x58	1	Overview
    0x5AC0	2	Introduction 1
    0x5B40	2	Repair and Service Guidelines
    0x68	1	Maintenance
    0x6AC0	2	Crank Arm and Tire Maintenance
    0x6B40	2	Lubrication Maintenance
    0x78	1	Assembly
    0x7AC0	2	Front Reflector Bracket and Reflector Assembly 3
    0x7B40	2	Front Reflector Bracket Installation
    0x7BC0	2	Installing Replacement Pedals
    0x7C20	2	Seat Assembly
    0x7C60	2	Training Wheels 2 */


    Kalman Toth
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2016

    Saturday, January 22, 2011 4:19 PM
    Moderator
  • The following query lists all the PRIMARY KEY columns in AdventureWorks2008. The "ID" suffix convention indicates IDENTITY Surrogate Primary Key at the beginning of the listing. Starting with RowID 88, Natural Key PRIMARY KEYs are listed.


    That query shows all the primary key columns.  But the ID suffix only indicates that the column is an identity in some table, not necessarily the one referenced.  For example, BusinessEntityID is listed as being in the primary key of 12 tables, but it is an identity in only 1 table (Person.BusinessEntity).

    If the question is what percentage of the tables in AdventureWorks2008 have an identity column and the identity column is the primary key of that table, the answer is almost exactly 1/2.  There are 36 tables which have an identity column which is the primary key, 35 tables where the primary key is not an identity column in that table, and 2 heaps.

    With cte As
    (Select s.name As SchemaName, 
     t.Name As TableName,
     i.Name As PrimaryKeyName,
     Case When Count(*) = 1 And Min(Cast(c.is_identity As tinyint)) = 1 Then 'Yes'
       When Min(i.type_desc) = 'HEAP' Then 'Heap'
       Else 'No' End As HasIdentityColumnAsPK
    From sys.indexes i
    Inner Join sys.index_columns ic On i.object_id = ic.object_id And i.index_id = ic.index_id
    Inner Join sys.columns c On ic.object_id = c.object_id And ic.column_id = c.column_id
    Inner Join sys.tables t On i.object_id = t.object_id
    Inner Join sys.schemas s On t.schema_id = s.schema_id
    Where i.is_primary_key = 1
     And i.type_desc <> 'HEAP'
    Group By s.name, t.name, i.name
    Union All
    Select s.name As SchemaName, 
     t.Name As TableName,
     i.Name As PrimaryKeyName,
     'Heap'
    From sys.indexes i
    Inner Join sys.tables t On i.object_id = t.object_id
    Inner Join sys.schemas s On t.schema_id = s.schema_id
    Where i.type_desc = 'HEAP')
    Select Row_Number() Over (Order By HasIdentityColumnAsPK Desc, SchemaName, TableName, PrimaryKeyName) As RowNbr,
     SchemaName, TableName, PrimaryKeyName, HasIdentityColumnAsPK
    From cte
    Order By RowNbr;
    
    /*
    1	dbo	AWBuildVersion	PK_AWBuildVersion_SystemInformationID	Yes
    2	dbo	DatabaseLog	PK_DatabaseLog_DatabaseLogID	Yes
    3	dbo	ErrorLog	PK_ErrorLog_ErrorLogID	Yes
    4	HumanResources	Department	PK_Department_DepartmentID	Yes
    5	HumanResources	JobCandidate	PK_JobCandidate_JobCandidateID	Yes
    6	HumanResources	Shift	PK_Shift_ShiftID	Yes
    7	Person	Address	PK_Address_AddressID	Yes
    8	Person	AddressType	PK_AddressType_AddressTypeID	Yes
    9	Person	BusinessEntity	PK_BusinessEntity_BusinessEntityID	Yes
    10	Person	ContactType	PK_ContactType_ContactTypeID	Yes
    11	Person	PhoneNumberType	PK_PhoneNumberType_PhoneNumberTypeID	Yes
    12	Person	StateProvince	PK_StateProvince_StateProvinceID	Yes
    13	Production	BillOfMaterials	PK_BillOfMaterials_BillOfMaterialsID	Yes
    14	Production	Illustration	PK_Illustration_IllustrationID	Yes
    15	Production	Location	PK_Location_LocationID	Yes
    16	Production	Product	PK_Product_ProductID	Yes
    17	Production	ProductCategory	PK_ProductCategory_ProductCategoryID	Yes
    18	Production	ProductDescription	PK_ProductDescription_ProductDescriptionID	Yes
    19	Production	ProductModel	PK_ProductModel_ProductModelID	Yes
    20	Production	ProductPhoto	PK_ProductPhoto_ProductPhotoID	Yes
    21	Production	ProductReview	PK_ProductReview_ProductReviewID	Yes
    22	Production	ProductSubcategory	PK_ProductSubcategory_ProductSubcategoryID	Yes
    23	Production	ScrapReason	PK_ScrapReason_ScrapReasonID	Yes
    24	Production	TransactionHistory	PK_TransactionHistory_TransactionID	Yes
    25	Production	WorkOrder	PK_WorkOrder_WorkOrderID	Yes
    26	Purchasing	PurchaseOrderHeader	PK_PurchaseOrderHeader_PurchaseOrderID	Yes
    27	Purchasing	ShipMethod	PK_ShipMethod_ShipMethodID	Yes
    28	Sales	CreditCard	PK_CreditCard_CreditCardID	Yes
    29	Sales	CurrencyRate	PK_CurrencyRate_CurrencyRateID	Yes
    30	Sales	Customer	PK_Customer_CustomerID	Yes
    31	Sales	SalesOrderHeader	PK_SalesOrderHeader_SalesOrderID	Yes
    32	Sales	SalesReason	PK_SalesReason_SalesReasonID	Yes
    33	Sales	SalesTaxRate	PK_SalesTaxRate_SalesTaxRateID	Yes
    34	Sales	SalesTerritory	PK_SalesTerritory_TerritoryID	Yes
    35	Sales	ShoppingCartItem	PK_ShoppingCartItem_ShoppingCartItemID	Yes
    36	Sales	SpecialOffer	PK_SpecialOffer_SpecialOfferID	Yes
    37	HumanResources	Employee	PK_Employee_BusinessEntityID	No
    38	HumanResources	EmployeeDepartmentHistory	PK_EmployeeDepartmentHistory_BusinessEntityID_StartDate_DepartmentID	No
    39	HumanResources	EmployeePayHistory	PK_EmployeePayHistory_BusinessEntityID_RateChangeDate	No
    40	Person	BusinessEntityAddress	PK_BusinessEntityAddress_BusinessEntityID_AddressID_AddressTypeID	No
    41	Person	BusinessEntityContact	PK_BusinessEntityContact_BusinessEntityID_PersonID_ContactTypeID	No
    42	Person	CountryRegion	PK_CountryRegion_CountryRegionCode	No
    43	Person	EmailAddress	PK_EmailAddress_BusinessEntityID_EmailAddressID	No
    44	Person	Password	PK_Password_BusinessEntityID	No
    45	Person	Person	PK_Person_BusinessEntityID	No
    46	Person	PersonPhone	PK_PersonPhone_BusinessEntityID_PhoneNumber_PhoneNumberTypeID	No
    47	Production	Culture	PK_Culture_CultureID	No
    48	Production	Document	PK_Document_DocumentNode	No
    49	Production	ProductCostHistory	PK_ProductCostHistory_ProductID_StartDate	No
    50	Production	ProductDocument	PK_ProductDocument_ProductID_DocumentNode	No
    51	Production	ProductInventory	PK_ProductInventory_ProductID_LocationID	No
    52	Production	ProductListPriceHistory	PK_ProductListPriceHistory_ProductID_StartDate	No
    53	Production	ProductModelIllustration	PK_ProductModelIllustration_ProductModelID_IllustrationID	No
    54	Production	ProductModelProductDescriptionCulture	PK_ProductModelProductDescriptionCulture_ProductModelID_ProductDescriptionID_CultureID	No
    55	Production	ProductProductPhoto	PK_ProductProductPhoto_ProductID_ProductPhotoID	No
    56	Production	TransactionHistoryArchive	PK_TransactionHistoryArchive_TransactionID	No
    57	Production	UnitMeasure	PK_UnitMeasure_UnitMeasureCode	No
    58	Production	WorkOrderRouting	PK_WorkOrderRouting_WorkOrderID_ProductID_OperationSequence	No
    59	Purchasing	ProductVendor	PK_ProductVendor_ProductID_BusinessEntityID	No
    60	Purchasing	PurchaseOrderDetail	PK_PurchaseOrderDetail_PurchaseOrderID_PurchaseOrderDetailID	No
    61	Purchasing	Vendor	PK_Vendor_BusinessEntityID	No
    62	Sales	CountryRegionCurrency	PK_CountryRegionCurrency_CountryRegionCode_CurrencyCode	No
    63	Sales	Currency	PK_Currency_CurrencyCode	No
    64	Sales	PersonCreditCard	PK_PersonCreditCard_BusinessEntityID_CreditCardID	No
    65	Sales	SalesOrderDetail	PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID	No
    66	Sales	SalesOrderHeaderSalesReason	PK_SalesOrderHeaderSalesReason_SalesOrderID_SalesReasonID	No
    67	Sales	SalesPerson	PK_SalesPerson_BusinessEntityID	No
    68	Sales	SalesPersonQuotaHistory	PK_SalesPersonQuotaHistory_BusinessEntityID_QuotaDate	No
    69	Sales	SalesTerritoryHistory	PK_SalesTerritoryHistory_BusinessEntityID_StartDate_TerritoryID	No
    70	Sales	SpecialOfferProduct	PK_SpecialOfferProduct_SpecialOfferID_ProductID	No
    71	Sales	Store	PK_Store_BusinessEntityID	No
    72	dbo	DatabaseLog	NULL	Heap
    73	Production	ProductProductPhoto	NULL	Heap
    */
    
    Tom

    Saturday, January 22, 2011 6:53 PM
  • Thanks Tom. Do you have a query to show how many Natural Key PK tables? Let's not count the Code tables (like currency) into this category. Thanks.


    Kalman Toth, Business Intelligence Developer; SQL 2008 GRAND SLAM
    Saturday, January 22, 2011 7:00 PM
    Moderator
  • > Thanks Tom. Do you have a query to show how many Natural Key PK tables? Let's not count the Code tables (like currency) into this category. Thanks.

    If the critera is that we should not count tables with natural keys, there are of course zero tables with natural keys.

    A currency code is a prime example of a natural key and a prime example when it would be a bad idea to have a surrogate key.

    I don't have AW2008 installed, but I guess that if there are 35 tables without an IDENTITY column, there are 35 tables with natural PKs, although it is possible that some have guids or roll-your-own surrogate keys.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online: SQL 2008, SQL 2005 and  SQL 2000.
    (Just click the link you need.)
    Saturday, January 22, 2011 10:30 PM
  • I don't have AW2008 installed, but I guess that if there are 35 tables without an IDENTITY column, there are 35 tables with natural PKs, although it is possible that some have guids or roll-your-own surrogate keys.

    Well, some have composite Primary Keys.

    I found one which Natural Key PK: Production.Document. The PK is hierarchyid data type.

    I am fine with using CurrencyCode (USD, EUR) as PK for Natural Key Currency (US Dollar, EURO). Some may argue though for numeric surrogate(http://aspdotnetcode.source-of-humor.com/Articles/Oracle/HowToConvertRowDataToColumnsPivotingTranspose.aspx).

    Nonetheless note that all the sizable (excluding code/lookup)  transactional tables like Sales.SalesOrderHeader have surrogate PK-s.


     


    Kalman Toth, Business Intelligence Developer; SQL 2008 GRAND SLAM
    Saturday, January 22, 2011 11:00 PM
    Moderator
  • I would be hesitant to use AdventureWorks as an example for best
    practices. Don't forget that this database has been primarily designed
    (and changed with each new version of SQL Server) to demonstrate the
    various features SQL Server has.

    The term surrogate key implies that it is a surrogate for something.
    That something would then be the natural key. So that implies that you
    suggest that newbies design all tables with a natural key plus a
    surrogate key instead of with a natural key only. I fail to see how
    adding an extra candidate key makes life easier for newbies.

    Judging by the many posts I see here and in other online venues where
    people have tables with only an IDENTITY primary key and no other
    candidate key, I'm afraid your advise would be taken to endorse that
    notion. I know you don't mean it that way - but do they?

    What about: "Newbies: Take a design class before you create a
    database"?


    Hugo Kornelis, SQL Server MVP
    My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
    Sunday, January 23, 2011 12:57 AM
  • Kalman,

    Check 3 blogs by Kimberly Tripp on this topic referenced in answers here

    In addition, you may check last 2 answers here with a link to Tony Rogerson blog. 

     


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Sunday, January 23, 2011 1:58 AM
    Moderator
  • Thanks Tom. Do you have a query to show how many Natural Key PK tables? Let's not count the Code tables (like currency) into this category. Thanks.


    Kalman Toth, Business Intelligence Developer; SQL 2008 GRAND SLAM


    I'm not sure how you would do that because that would require a definition of a Code table. 

    Also, in a reply to Erland, you pointed out that in AdventureWorks2008 only small tables have natural keys as the primary key.  But that wouldn't be true in general, for example, a 300 year Calendar table will have nearly 100,000 rows, but I would argue the primary key of a calendar table should be the date, and definitely not a surrogate identity column. 

    Tom

    Sunday, January 23, 2011 2:10 AM
  • Kalman,

    Check 3 blogs by Kimberly Tripp on this topic referenced in answers here

    Thanks Naomi. I agree with the following quote from the link:

    "Yes, using a INT (or BIGINT) IDENTITY is very good practice for SQL Server.

    SQL Server uses the primary key as its default clustering key, and the clustering key should always have these properties:

    • narrow
    • static
    • unique
    • ever-increasing

    INT IDENTITY fits the bill perfectly!"

    I always design with INT IDENTITY(1,1) PRIMARY KEY. I understand though that some database experts prefer Natural Key as PK in particular applications.

     In the following example, Natural Key as PK would be a DISASTER:

    CREATE TABLE Email ( 
     EmailID   INT  IDENTITY ( 1 , 1 )  PRIMARY KEY, -- Surrogate key 
     Email    VARCHAR(70)  UNIQUE, -- Natural Key 
     SourcePageID INT, 
     CaptureDate DATETIME  DEFAULT getdate()); 

    Kalman Toth, Business Intelligence Developer; SQL 2008 GRAND SLAM
    Sunday, January 23, 2011 12:00 PM
    Moderator
  • > But that wouldn't be true in general, for example, a 300 year Calendar table will have nearly 100,000 rows, but I would argue the primary key of a calendar table should be the date, and definitely not a surrogate identity column. 

    That could be fun. Particulary if the surrogate id is not correlated with the dates.

    What can be useful in a calendar table, though, is to assign numbers to the dates. Here is a calendars table that I have:

    CREATE TABLE calendars (
       coucode     aba_coucode  NOT NULL,
       thedate     aba_date     NOT NULL,
       isholiday   smallint     NOT NULL,
       fwddayno    int          NOT NULL,
       backdayno   int          NOT NULL,
       CONSTRAINT pk_cal PRIMARY KEY NONCLUSTERED (coucode, thedate),
       CONSTRAINT ak_cal UNIQUE NONCLUSTERED (coucode, fwddayno, isholiday),
       CONSTRAINT ak2_cal UNIQUE NONCLUSTERED (coucode, backdayno, isholiday),
       CONSTRAINT ckt_cal_backandfwd CHECK
           (isholiday = 0 AND fwddayno = backdayno OR
            isholiday <> 0 AND fwddayno + 1 = backdayno)
    )

    In the system I work with we often need to compute the settledate given the tradedate or vice versa, and you can do this by taking one date, get the day number add that many days, and then find the date you landed on.

    The triggers to maintain these numbers when a new non-business day is entered are fairly ugly.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online: SQL 2008, SQL 2005 and  SQL 2000.
    (Just click the link you need.)
    Sunday, January 23, 2011 12:04 PM
  • Kalman,

    ....

    In addition, you may check last 2 answers here with a link to Tony Rogerson blog. 

    Naomi, I agree with the quote from the link:

    "Identity Column Surrogate Primary Keys

    A surrogate key is assigned by computer and typically has no meaning to humans. Within SQL Server, surrogate keys are identity columns or global unique identifiers.

    By far, the most popular method for building primary keys involves using an identity column. Like an auto-number column or sequence column in other databases, the identity column generates consecutive integers as new rows are inserted into the database. Optionally, you can specify the initial seed number and interval.

    Identity columns offer two advantages:

    Integers are easier to manually recognize and edit than GUIDs.

    Integers are small and fast."

    Why should not the SSNo be used in the Employee table as Primary Key?  After all you cannot find more Natural Key than the SSNo? Here is one reason: data security. You may even have to encrypt the SSNo column!

    In a normalized database, you aim to reduce data redundancy to a minimum, possibly none. Using Natural Key as PK increases data redundancy by introducing FK with repeating data in other tables.

     


    Kalman Toth, Business Intelligence Developer; SQL 2008 GRAND SLAM
    Sunday, January 23, 2011 12:32 PM
    Moderator
  • Why should not the SSNo be used in the Employee table as Primary Key?  After all you cannot find more Natural Key than the SSNo? Here is one reason: data security. You may even have to encrypt the SSNo column!

    Since I don't live in the US, I don't know how your SSN works, but as I understand as US SSN is a nine-digit number which appears to me a tad short to identify a person in a country with a nine-digit population. And judging from the name, it cannot be used to identify juridical persons, which further limits the usability.

    Then again, SSN might work for an Employee tables in a decently sized company. You probably don't want to accept multiple SSN - which reportedly are commonplace - and you may also require everyone to have one.

    What's the moral of this? That data modelling by no means is a simple business. You have to know the data domain. Even more so a reason to work with natural keys from the beginning, so that you learn the trade properly.

    In a normalized database, you aim to reduce data redundancy to a minimum, possibly none. Using Natural Key as PK increases data redundancy by introducing FK with repeating data in other tables.


    Putting the same SSN in umpteen tables is no breach of any form of normalisation. However, if you want to support someone changing is SSN, this is a good reason to use a surrogate, since it's a good idea that keys should be immutable. 


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online: SQL 2008, SQL 2005 and  SQL 2000.
    (Just click the link you need.)
    Sunday, January 23, 2011 4:06 PM
  • My recommendation is to follow AdventureWorksxxxx sample databases which use natural keys for PRIMARY KEY aligned with ISO standard codes and values when possible.  Example follows from AdventureWorks2008, which uses ISO 3166 codes (http://www.iso.org/iso/english_country_names_and_code_elements).

    CREATE TABLE [Person].[CountryRegion](	[CountryRegionCode] [nvarchar](3) NOT NULL,
    	[Name] [dbo].[Name] NOT NULL,
    	[ModifiedDate] [datetime] NOT NULL,
     CONSTRAINT [PK_CountryRegion_CountryRegionCode] PRIMARY KEY CLUSTERED 
    
    (
    
    	[CountryRegionCode] ASC
    
    );
    
    

     


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/


    What is the downside if the newbie designs with IDENTITY surrogate? Thanks.

    CREATE TABLE CountryRegion(
      CountryID INT IDENTITY(1,1)				PRIMARY KEY, 
    	CountryRegionCode nvarchar(3) NOT NULL	UNIQUE,
    	Country dbo.Name NOT NULL				UNIQUE,
    	ModifiedDate datetime NOT NULL );

     


    Kalman Toth, Business Intelligence Developer; SQL 2008 GRAND SLAM

    Monday, January 24, 2011 10:04 PM
    Moderator
  • What is the downside if the newbie designs with IDENTITY surrogate?


    CREATE TABLE CountryRegion( ID INT IDENTITY(1,1) PRIMARY KEY, CountryRegionCode nvarchar(3) NOT NULL UNIQUE, Name dbo.Name NOT NULL UNIQUE, ModifiedDate datetime NOT NULL );

    That is not a newbie design. The newbie makes all three columns after ID nullable, and he does not design them as unique. If you come this far, you have come a long way.

    However, there are two reasons why IDENTITY is not suitable here. First, IDENTITY is really only called for in tables with insert frequency. The power of IDETNITY is that it permits concurrent high inserts without lock on a serialising resource. For a table like this, I think "roll-your-own" is better.

    However, in this specific case, ISO 8601 does not only define two-letter and three-letter codes for all countries, but also a number. So if you want use a numeric id for countries to avoid character keys, this is the number to use.

    Admittedly, both these deliberations goes beyond the newbie stage, not the least the second. However, it is not uncommon to see unexperienced posters running into problems just because they used IDENTITY. They want to have contiguous numbers (not always for well-founded reasons), or they want to control the numbers explicitly, for instance when they copy a table.

    It is also worth mentioning that the game is changing. The next version of SQL Server brings us sequences, a feature available in Oracle, and also in the ANSI standard. Sequences are probably a little more difficult to use, at least initially, but they don't have many of the quirks that come with IDENTITY.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online: SQL 2008, SQL 2005 and  SQL 2000.
    (Just click the link you need.)
    Monday, January 24, 2011 10:15 PM
  • The country codes are an ISO Standard which is embedded in many other ISO Standards. It is maintained by an external trusted source. It is universal. It is a logical construct.  It is easy to read since it is an abbreviation code.  IDENTITY is the physical count of insertion attempts on ONE installation of ONE product on ONE machine in the universe. And meta-data about the updating has no place in a correct schema design. 

    CREATE TABLE Country_Codes -- iso-3166 alpha-3
    (country_code CHAR(3) NOT NULL PRIMARY KEY,
    country_name VARCHAR(15) NOT NULL); 


    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL
    Monday, January 24, 2011 11:17 PM
  • The SSN is a tax identification number that has to be used to report to the Federal Government. It stinks as an identifier on a national scale. There is no check digit for a start, so honest errors are easy to make. We have a huge illegal worker population that make up their own SSNs or steal them. You can do some simple validation on them (they break into three parts, with some rules).

    You have to contact the Feds to actually be sure that then person is who they say they are. That is the trusted source and we are starting to crack down on violators as part of the anti-legal migrant frenzy here. 

    If you are the victim of identity theft or go into witness protection, you can get a new SSN, but this is very rare. 


    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL
    Monday, January 24, 2011 11:35 PM
  • The country codes are an ISO Standard which is embedded in many other ISO Standards. It is maintained by an external trusted source. It is universal. It is a logical construct.  It is easy to read since it is an abbreviation code.

    That can be disputed. "US" for United States is easy. Here are some arbitrary codes:

    "GB" --> United Kingdom (Actually it is derived from Great Britain. I am still confused about the difference between the two usages.)

    "CS" --> Serbia and Montenegro

    Some of the mismatches:

    SELECT TOP 1000 [CountryRegionCode]
       ,[Name]
    FROM [AdventureWorks2008].[Person].[CountryRegion]
    WHERE LEFT(CountryRegionCode,1) != LEFT(Name,1)
    ORDER BY Name
    

    CountryRegionCode Name
    DZ Algeria
    IO British Indian Ocean Territory
    KH Cambodia
    KY Cayman Islands
    TD Chad
    KM Comoros
    HR Croatia
    SV El Salvador
    GQ Equatorial Guinea
    GF French Guiana
    PF French Polynesia
    TF French Southern and Antarctic Lands
    DE Germany
    YT Mayotte
    FM Micronesia
    AN Netherlands Antilles
    KP North Korea
    MP Northern Mariana Islands
    KN Saint Kitts and Nevis
    LC Saint Lucia
    PM Saint Pierre and Miquelon
    VC Saint Vincent and the Grenadine
    WS Samoa
    CS Serbia and Montenegro
    ZA South Africa
    GS South Georgia and the South Sandwich Islands
    ES Spain
    LK Sri Lanka
    CH Switzerland
    AE United Arab Emirates
    GB United Kingdom

     


    Kalman Toth, Business Intelligence Developer; SQL 2008 GRAND SLAM
    Tuesday, January 25, 2011 1:01 AM
    Moderator
  • Although I get your point that these aren't quite as simple and logical as Joe indicated, there are good reasons for the differences, and I don't think you'll find a better source than this standard.  

    Countries change names - Samoa was called Western Samoa until 1997, hence the WS abbreviation.  GB and UK are equally applicable for the United Kingdom of Great Britain.  Others are in non-English languages, such as CH (Switzerland) being the Latin name of the Swiss Federation (Confoederatio Helvetica), and Spain (ES) being España.

    Still others are simply due to two-letter abbreviations not being verbose enough to convey meaning and avoid conflicts (SL is Sierra Leone, not Sri Lanka).

    AdventureWorks is out of date for CS - in 2006, Serbia and Montenegro were split up into RS and ME respectively.

    Tuesday, January 25, 2011 3:33 AM
    Answerer
  • Countries change names - Samoa was called Western Samoa until 1997, hence the WS abbreviation.  GB and UK are equally applicable for the United Kingdom of Great Britain.  Others are in non-English languages, such as CH (Switzerland) being the Latin name of the Swiss Federation (Confoederatio Helvetica), and Spain (ES) being España.

    Still others are simply due to two-letter abbreviations not being verbose enough to convey meaning and avoid conflicts (SL is Sierra Leone, not Sri Lanka).

    In addition there are also three-letter codes, which obviously are somewhat more mnemonic than the two-letter ones.

    AdventureWorks is out of date for CS - in 2006, Serbia and Montenegro were split up into RS and ME respectively.

    Yup. Although that also proves that counrty codes are not entirely static. Codes do change due to entity changes like the one above, but may also change because a country changes it names. For instance, the code for Congo-Kinshasa changed when they changed the name from Zaire.

    Another problem may be with territories you need to capture in your database but which for one reason or another does not have a country code, for instance Kosovo and Somaliland. On the other hand, ISO 3166 (I used the wrong ISO number in a previous post), reserves some codes for custom use, so this is not a major problem.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online: SQL 2008, SQL 2005 and  SQL 2000.
    (Just click the link you need.)
    Tuesday, January 25, 2011 8:37 AM
  • What is the downside if the newbie designs with IDENTITY surrogate? Thanks.

    Consider referencing tables.  With an IDENTITY surrogate, one must always access the CountryRegion table to determine the needed CountryID.  For example, the following lists US sales territories:

    SELECT st.Name AS TerritoryName
    FROM Sales.SalesTerritory st
    JOIN Person.CountryRegion cr ON
    	cr.CountryID = st.CountryID
    WHERE cr.CountryRegionCode = N'US';
    

    In contrast, one can forgo both the join and additional index on CountryID using the natural key as the primary one.  Keep in mind that the CountryRegionCode value will be consistent throughout the entire enterprise than specific to individual databases.  This simplifies ETL and replication.

    SELECT st.Name AS TerritoryName
    FROM Sales.SalesTerritory st
    WHERE st.CountryRegionCode = N'US';
    

    Newbies are especially especialy prone to forgetting the unique constraint on the natural key.  This is never a problem when the natural key is used.

     

     


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/
    Tuesday, January 25, 2011 1:45 PM
  •  We have a huge illegal worker population that make up their own SSNs or steal them. You can do some simple validation on them (they break into three parts, with some rules).

    I am with President Reagan on the illegal immigrants issue: we should give them amnesty and make them citizens with SSN so that our Social Security Trust Fund will not run out of money in the foreseeable future.

     

     Newbies are especially especialy prone to forgetting the unique constraint on the natural key.

    That is true. Because it "works" until the break (dups in Natural Key) is discovered.

     

     The power of IDETNITY is that it permits concurrent high inserts without lock on a serialising resource. For a table like this, I think "roll-your-own" is better.

     

    I find INT (SMALLINT, TINYINT) IDENTITY convenient even for small code/lookup tables. Although if code/lookup tables start mushrooming I prefer the master-detail table solution: LookupHeader, LookupDetail.

     


    Kalman Toth, Business Intelligence Developer; SQL 2008 GRAND SLAM

     

    Tuesday, January 25, 2011 7:04 PM
    Moderator
  • I wonder if anyone could give me and opinion on this one:

    I have this client table that would have no more than 100 records

    CREATE TABLE Client( 
    client_name VARCHAR(50) PRIMARY KEY, -- Natural Key
    email VARCHAR(100) UNIQUE,
    ...
    )
    
    
    

    The client_name will be the foreign key of other tables that will have a few thousands of records.

    Would this be a good use of the natural key as the primary key ?

     

    Thanks

     

    Wednesday, February 9, 2011 2:09 AM
  • From a space/performance point of view, creating a surrogate tinyint (255 values) would take up an additional byte in each row for your client table.  However, consider what your other tables would reference.  If you used the Natural key, then each row will have a varchar(50), which will be 3-53 bytes long, and the number of rows may be quite large.  Using the surrogate tinyint, you're only adding an overhead of an additional byte.

    This overhead works out to be an additional 50 MB (approximately, worst case, assuming maximum length client names) per million rows, so it's not a huge difference.  However, also consider any other indexes that might include this column.

    On a small database, it wouldn't make much difference either way in terms of space and performance.  It may simplify queries (you won't need to join onto the Client table when looking up a specific client name), but you will need to be sure that the client name is always going to be unique.

    If you have much larger tables in the database which do have surrogate keys, I would tend to go for a similar design to ensure consistency across the entire database.

    Wednesday, February 9, 2011 2:44 AM
    Answerer
  • SQLUSA,

    I'm not sure what you're infatuation with newbies is. This is the 2nd thread (that I've noticed) where you reference them.

    Newbies don't jump into a DBA role and in most cases, the developer writes the query and passes it on to the DBA to execute especially in larger companies.

    Even junior DBA's walk into an aged enviroment where the database is already built and applications point to the tables. Any changes in schema can be devastating even catastrophic.

    New database development is a 1 in a million shot and even in new development, the database isn't always designed first, although it should be, and the result is an ad hoc design all the way around.

    However, if you're fortunate enough to have the control to design from scratch, avoiding denormalization would be my first peice of advice to a newbie or overnormalization. Applying indexes always seems to be a later task because until reporting rolls around, you never really know which fields will be query'd the most.

    Ok I'm done babbling,

    Adam


    Ctrl+Z
    Wednesday, February 9, 2011 3:27 AM
  • On a small database, it wouldn't make much difference either way in terms of space and performance.  It may simplify queries (you won't need to join onto the Client table when looking up a specific client name), but you will need to be sure that the client name is always going to be unique.

    If you have much larger tables in the database which do have surrogate keys, I would tend to go for a similar design to ensure consistency across the entire database.

    So the overhead in size is insignificant for a small database. My tables do not have more than 10K records each. I wonder how it affects in performance when you use the foreign key for some queries.

    Consistency is something I had not considered, since I have other tables using surrogate keys.

    Thanks JimMccleod

    Wednesday, February 9, 2011 4:20 AM