none
Why do we need PRIMARY KEY in a database?

Answers

  • Kimball says that because it's true if you're following his DW modeling method. 

    A DW fact table with only an IDENTITY key is behaves much like a heap, and in DW you rarely need to identify a single fact row.  There's no real harm in slapping on a synthetic key to a fact table, but neither is it actually useful.  It doesn't enforce anything meaningful and it's not helpful for queries.

    Fact rows are, by definition, keyed by the combination of the dimension foreign key columns.  You actually should declare a unique constraint or PK for these columns, but that will make your loads more expensive, and your queries less expensive.  If you can make that work in your DW, then do it.  Leaving off PK's and FK's that belong there in Data Warehouse is a hack to improve performance.  IMO both have been made obsolete by improvements in server hardware.

    David


    David http://blogs.msdn.com/b/dbrowne/








    Thursday, February 28, 2013 2:11 AM
  • A table needs a key.

    You need a key to uniquely identify a row.

    The Primary Key constraint is a convention to indicate that the declared column (or list of columns) constitutes a key.

    There can be other reasons why a combination of column may be unique or is declared to be unique. For example because of one of the columns in the combination is (already) unique. By convention, you should only declare the column(s) as Primary Key when it constitutes a key.

    The convention about Unique constraints is not very strong. They are used in any situation where a column (or list of columns) needs to be unique from a modelling point of view, regardless of the reason. It could be a (candidate) key, but doesn't have to be.

    Similarly for unique indexes. For example, a compound index that includes the primary key can be declared as unique without any danger, because the primary key part will already guarantee uniqueness. But obviously a compound index that includes the primary key shouldn't be considered a key, since a key should not have any nonessential information.


    Gert-Jan

    Thursday, January 17, 2013 9:07 PM
  • AFAIK, the only implementation differences are:

    1)  Primary key columns must be not null.  If you define the primary key constraint as part of the CREATE TABLE statement, the columns default to not null.  If you define the primary key constraint sometime after the table is created and any of the columns are nullable, you will get an error.  Unique constraints can have nullable columns in the key.

    2) Primary key constraints default to clustered unless the table already has a clustered index.  Unique constraints default to nonclustered.

    3) Foreign key constraints can reference either a primary key or a unique constraint.  But in order for a foreign key constraints to reference a unique constraint, the columns in the parent table unique constraint must be used in the foreign key constraint definition.

    4) The type and type_desc columns in the sys.objects table (and other metadata views) have different values for primary key and unique constraints.

    5) If you post a table definition in this forum and the table definition has only unique constraints and no primary key constraint, Joe Celko will write a post pointing out the error of your ways <grin>.

    In actual practice, if I have multiple candidate keys (like, for example a surrogate key and a natural key), I choose the one that will be used for foreign key constraints (like the surrogate key) as the primary key and the other(s) get unique constraints.

    Of course, you can also declare unique indexes. They are implemented much like unique constraints except they do not get an entry in sys.objects.  Also, unique indexes can have included columns.  Unique constraints cannot have included columns.

    In actual practice, I make candidate keys that I didn't choose for the primary key into unique constraints.  A rare exception to that rule is if I want included columns with that candidate key.  If I want an index on a set of columns that I know must be unique, but are not a candidate key (because it includes all the columns in a candidate key plus some other columns), I make that a unique index, not a unique constraint.  Essentially, I think of primary key and unique constraints as part of the logical data model (they are there to define required relationships in the data) and would only be added or removed if the logical data model changes.  But unique indexes, like other indexes, are part of only the physical data model and they are there only for performance and they can be removed without changing the underlying logical data model.

    Tom

    Friday, January 18, 2013 12:33 AM

All replies

  • Hello Kalman,

    Good question: What is the qualitative difference between a primary key and a unique index? ... I am going to follow this thread ...


    Olaf Helper

    Blog Xing


    Thursday, January 17, 2013 7:57 PM
  • Hi

    Many moons ago, i was taught the primary key served as a unique identifier for the records, hence no nulls allowed.

    Ideally it should be a field that 'we' have control and that allows us to distinguish each record.

    Thursday, January 17, 2013 8:48 PM
  • A table needs a key.

    You need a key to uniquely identify a row.

    The Primary Key constraint is a convention to indicate that the declared column (or list of columns) constitutes a key.

    There can be other reasons why a combination of column may be unique or is declared to be unique. For example because of one of the columns in the combination is (already) unique. By convention, you should only declare the column(s) as Primary Key when it constitutes a key.

    The convention about Unique constraints is not very strong. They are used in any situation where a column (or list of columns) needs to be unique from a modelling point of view, regardless of the reason. It could be a (candidate) key, but doesn't have to be.

    Similarly for unique indexes. For example, a compound index that includes the primary key can be declared as unique without any danger, because the primary key part will already guarantee uniqueness. But obviously a compound index that includes the primary key shouldn't be considered a key, since a key should not have any nonessential information.


    Gert-Jan

    Thursday, January 17, 2013 9:07 PM
  • You don't "need" a Primary Key.  A "Primary Key" is a special ANSI UNIQUE KEY which does not allow NULLs.  If you don't have a use for one, then you don't "need" one.

    However, you should have a unique row identifier of some kind in every table. 

    Thursday, January 17, 2013 9:56 PM
  • The answer depends if the table(s) use natural key or surrogate keys...

    When using a natural key, a PK is important so that:

    1) duplicate records are not created

    2) read performance is increased - one can organize the physical storage of the data so that the most likely needed records are first

    When using a surrogate key, the PK is important so that:

    1) duplicate keys are not created (Duplicate records are eliminated using unique indexes)

    2) join performance increases when using small-bandwidth data types (INT vs VARCHAR) as keys

    Thursday, January 17, 2013 10:13 PM
  • AFAIK, the only implementation differences are:

    1)  Primary key columns must be not null.  If you define the primary key constraint as part of the CREATE TABLE statement, the columns default to not null.  If you define the primary key constraint sometime after the table is created and any of the columns are nullable, you will get an error.  Unique constraints can have nullable columns in the key.

    2) Primary key constraints default to clustered unless the table already has a clustered index.  Unique constraints default to nonclustered.

    3) Foreign key constraints can reference either a primary key or a unique constraint.  But in order for a foreign key constraints to reference a unique constraint, the columns in the parent table unique constraint must be used in the foreign key constraint definition.

    4) The type and type_desc columns in the sys.objects table (and other metadata views) have different values for primary key and unique constraints.

    5) If you post a table definition in this forum and the table definition has only unique constraints and no primary key constraint, Joe Celko will write a post pointing out the error of your ways <grin>.

    In actual practice, if I have multiple candidate keys (like, for example a surrogate key and a natural key), I choose the one that will be used for foreign key constraints (like the surrogate key) as the primary key and the other(s) get unique constraints.

    Of course, you can also declare unique indexes. They are implemented much like unique constraints except they do not get an entry in sys.objects.  Also, unique indexes can have included columns.  Unique constraints cannot have included columns.

    In actual practice, I make candidate keys that I didn't choose for the primary key into unique constraints.  A rare exception to that rule is if I want included columns with that candidate key.  If I want an index on a set of columns that I know must be unique, but are not a candidate key (because it includes all the columns in a candidate key plus some other columns), I make that a unique index, not a unique constraint.  Essentially, I think of primary key and unique constraints as part of the logical data model (they are there to define required relationships in the data) and would only be added or removed if the logical data model changes.  But unique indexes, like other indexes, are part of only the physical data model and they are there only for performance and they can be removed without changing the underlying logical data model.

    Tom

    Friday, January 18, 2013 12:33 AM
  • Here is my take.

    You need PRIMARY KEY in a database so that you can connect tables (with FOREIGN KEY constraints).

    CREATE TABLE Alpha 
      ( 
         id      INT PRIMARY KEY, 
         msg     VARCHAR(50)  UNIQUE, 
         moddate DATETIME DEFAULT Getdate() 
      ); 
    
    CREATE TABLE Beta 
      ( 
         id      INT PRIMARY KEY, 
         alphaid INT REFERENCES Alpha, 
         descr   VARCHAR(50) UNIQUE, 
         moddate DATETIME DEFAULT Getdate() 
      );  


    Kalman Toth SQL 2008 GRAND SLAM
    Paperback: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012


    Friday, January 18, 2013 9:07 PM
  • You only need PRIMARY KEY's to add constraint validation overhead to your Inserts/Updates/Deletes against the table(s) that reference the PK.

    You can manually control the business logic (non-nulls/default values/etc...) without Primary Keys.  Standard Unique indexes serve the other purpose of PKs.

    PK's are really not necessary for all Database Table needs.  Look at single row tables... No need for a PK there. 

    Look at Microsoft Dynamics SL.  None of their tables use "proper" FK relationships.  All the FK logic is handled on the application side.


    • Edited by carbonc Saturday, January 19, 2013 10:11 PM
    Saturday, January 19, 2013 10:08 PM
  • I, too, have recently came up with this question of why we need primary keys in a table that already has a clustered index better suited to query performance.  If I'm never joining on a static, sequential, small, unique primary key column because it has no foreign key relationship to another table, what's the point of including the field or the primary key itself?  Ralph Kimball says my data warehouse fact tables should not have a primary key of a sequential, small, static, unique column at all, but instead the rows should be inherently unique by virtue of having differing foreign key values back to DimCustomers, DimDate, DimSalesperson, etc.

    Has somebody provided the answer here while I'm just failing to understand it?

    Thanks,
    Eric

    Thursday, February 28, 2013 12:39 AM
  • Ralph Kimball says my data warehouse fact tables should not have a primary key of a sequential, small, static, unique column at all, but instead the rows should be inherently unique by virtue of having differing foreign key values back to DimCustomers, DimDate, DimSalesperson, etc.


    What Kimball says that instead of a PRIMARY KEY, UNIQUE KEY or unique index enforcing the uniqueness of the set of FK columns in a fact table, the software which populates the fact table should enforce it. The tacit assumption is "perfect" software with no bug.

    The point is that it is simple for the DBA to see if the table has a unique constraint. It is likely very difficult if not impossible for a DBA to see if the populating software works correctly or it has bugs as frequently the case with application software.

    So basically PRIMARY KEY, UNIQUE KEY / index constraint on a fact table is just peace of mind for the responsible DBA. And his boss.

    If for performance reasons unique index is not desirable on the fact table, a uniqueness test can be performed after the fact table is populated (workaround).

    >what's the point of including the field or the primary key itself? 

    Even if there is no need for uniqueness or connection point (FK referencing PK), some software requires PRIMARY KEY on a table.

    Windows Azure SQL Database requires clustered index on a table. The PRIMARY KEY constraint creates a unique clustered index by default.


    Kalman Toth Database & OLAP Architect
    Paperback / Kindle: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012







    Thursday, February 28, 2013 1:39 AM
  • Kimball says that because it's true if you're following his DW modeling method. 

    A DW fact table with only an IDENTITY key is behaves much like a heap, and in DW you rarely need to identify a single fact row.  There's no real harm in slapping on a synthetic key to a fact table, but neither is it actually useful.  It doesn't enforce anything meaningful and it's not helpful for queries.

    Fact rows are, by definition, keyed by the combination of the dimension foreign key columns.  You actually should declare a unique constraint or PK for these columns, but that will make your loads more expensive, and your queries less expensive.  If you can make that work in your DW, then do it.  Leaving off PK's and FK's that belong there in Data Warehouse is a hack to improve performance.  IMO both have been made obsolete by improvements in server hardware.

    David


    David http://blogs.msdn.com/b/dbrowne/








    Thursday, February 28, 2013 2:11 AM
  • I like what Rui and Tom wrote.  I've been using Access since 1999, and SQL Server since 2009, and I don't think I have ever used a PK.  I learned what a PK is right around the time I first started using databases, but I never 'needed' a PK.  I can certainly understand the critical need for a PK, but almost all of my work over time has been importing/exporting financial data to/from databases and spreadsheets.  Sometimes I'd happen to have a unique identifier (CUSIP) in some of the Tables, but the CUSIP wasn't used much, and even if it was, I was never doing cascade updates and cascade deletes on the data.  Nevertheless, I can see the obvious need for such a thing if you are dealing with customer data, in which case a unique identifier is essential!!!
    Tuesday, March 19, 2013 12:20 PM