locked
Opinion to create a efficient primary key. RRS feed

  • Question

  • Hello,

    Primary key in a table SQL is more efficient in integer or in varchar ?

    Thanks for you opinion.

    Monday, October 20, 2014 11:50 AM

Answers

  • I vote for Integer.

    http://sqlmag.com/database-administration/sql-design-how-choose-primary-key

    vt


    Please mark answered if I've answered your question and vote for it as helpful to help other user's find a solution quicker

    Monday, October 20, 2014 11:56 AM
  • Integer comparison is always faster than character comparisons.  Integer is a better choice.

    Monday, October 20, 2014 12:05 PM
    Answerer
  • SURROGATE PRIMARY KEY INT (4-bytes) is the best choice.



    Kalman Toth Database & OLAP Architect SQL Server 2014 Database Design
    New Book / Kindle: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2014




    Monday, October 20, 2014 12:06 PM
  • A good article on Primary Key design, here.

    I would always use Integer.

    Regards,


    Kind regards, Kev

    Monday, October 20, 2014 12:13 PM
  • Hi,

    Normally when you select your primary key (happens to be clustered index most of the times), the key should fulfill below properties.

    1. Unique

    2. Narrow

    3. Static 

    4. Ever Increasing

    If you consider all these properties, I recommend Integer data type. Preferably Integer with identity property set on it. 

    Refer Paul Randal's below article on same subject.

    Ever-increasing clustering key – the Clustered Index Debate……….again!


    Please mark solved if I've answered your question, vote for it as helpful to help other users find a solution quicker
    Praveen Dsa | MCITP - Database Administrator 2008 | My Blog | My Page


    Monday, October 20, 2014 12:22 PM
  • Please see the show an eg. why Integer are the best choice for PK

    CREATE TABLE dbo.IntegrPK  
    (Id INT PRIMARY KEY CLUSTERED IDENTITY NOT NULL,
    Val VARCHAR(100) NOT NULL
    );
    
    CREATE TABLE dbo.varcharPK (
    	Id VARCHAR(8) PRIMARY KEY CLUSTERED,
    	Val VARCHAR(100) NOT NULL
    );
    
    INSERT INTO IntegrPK(Val)
    SELECT replicate('0',80) 
    GO 100000
    
    
    
    
    
    DECLARE @id INT=0
    WHILE @id<100000
    
    BEGIN
    INSERT INTO varcharPK (ID,Val)
    SELECT 	@id,REPLICATE('0', 80)
    set @id=@id+1
    END
    	
    	
    
    
    ALTER table varcharPK ADD countrycode VARCHAR(5)
    ALTER TABLE IntegrPK ADD countrycode VARCHAR(5)
    
    UPDATE varcharPK set countrycode='USA' WHERE Id BETWEEN 1 and 5000
    UPDATE IntegrPK SET countrycode = 'USA' WHERE Id BETWEEN 1 AND 5000
    
    
    UPDATE varcharPK SET countrycode = 'UK' WHERE Id BETWEEN 5000 AND 15000
    UPDATE IntegrPK SET countrycode = 'UK' WHERE Id BETWEEN 5000 AND 15000
    
    CREATE INDEX ix_nl_ccode ON varcharPK(countrycode)
    CREATE INDEX ix_nl_ccode ON IntegrPK (countrycode)

    Let see how much space space both tables used

    SELECT
    	page_count,
    	avg_record_size_in_bytes,
    	avg_page_space_used_in_percent,
    	index_id
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('IntegrPK'), DEFAULT, NULL, 'DETAILED');
    
    
    SELECT
    	page_count,
    	avg_record_size_in_bytes,
    	avg_page_space_used_in_percent,
    	index_id
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('varcharPK'), DEFAULT, NULL, 'DETAILED');

    Index id=1 clustered index and 2 = non clustered. As you can see, if PK is on varchar column SQL consume more data pages to store data and it also impact the size of the NL index.

    vt


    Please mark answered if I've answered your question and vote for it as helpful to help other user's find a solution quicker

    Monday, October 20, 2014 12:28 PM
  • Primary keys are sets of columns that uniquely defines each row in your table.

    Surrogate keys should be avoided in relational data models. Your primary concern should be defining a key that uniquely defines each entry in your table, regardless of data type.

    Choosing the right data type for your columns is a whole other story. It should be accounted for while defining the primary key, but you should never choose a surrogate key over a business key just because the column(s) for the business key would not be of the integer data type.

    Things you should bear in mind:

    - The primary key column(s) should be chosen for columns which have meaning for the business model, and that you know are unique. Social security numbers, for example, are unique. Leave surrogate keys for the dimensional model, and use them only when you cannot safely assign a column that is unique.

    - A clustered index will be created by default over the columns you define for the primary key. This will physically order the data pages based on these columns, speeding up queries that use them as predicates. If you know these columns won't be used as predicates very often, you are doing it wrong.

    - It is best to use integer, datetime, bit or uniqueidentifier data types for primary keys for reasons already explained above, but this doesn't mean you should add a new identity column if you already have a varchar column that uniquely identifies the entries. Chances are you wont even use this extra column as a predicate, and your clustered index won't be used, forcing you to add additional indexes that have maintenance costs.

    Monday, October 20, 2014 12:51 PM
  • VARCHAR or other string data types are the bad choice. 

    A primary key needs at least three conditions:

    1. It should be the entity identity 

    2. It's length must be as shorter as possible

    3. It's values do not change ( or rarely change )


    T-SQL Articles
    T-SQL e-book by TechNet Wiki Community
    T-SQL blog

    Monday, October 20, 2014 12:53 PM

All replies

  • I vote for Integer.

    http://sqlmag.com/database-administration/sql-design-how-choose-primary-key

    vt


    Please mark answered if I've answered your question and vote for it as helpful to help other user's find a solution quicker

    Monday, October 20, 2014 11:56 AM
  • Integer comparison is always faster than character comparisons.  Integer is a better choice.

    Monday, October 20, 2014 12:05 PM
    Answerer
  • SURROGATE PRIMARY KEY INT (4-bytes) is the best choice.



    Kalman Toth Database & OLAP Architect SQL Server 2014 Database Design
    New Book / Kindle: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2014




    Monday, October 20, 2014 12:06 PM
  • A good article on Primary Key design, here.

    I would always use Integer.

    Regards,


    Kind regards, Kev

    Monday, October 20, 2014 12:13 PM
  • Hi,

    Normally when you select your primary key (happens to be clustered index most of the times), the key should fulfill below properties.

    1. Unique

    2. Narrow

    3. Static 

    4. Ever Increasing

    If you consider all these properties, I recommend Integer data type. Preferably Integer with identity property set on it. 

    Refer Paul Randal's below article on same subject.

    Ever-increasing clustering key – the Clustered Index Debate……….again!


    Please mark solved if I've answered your question, vote for it as helpful to help other users find a solution quicker
    Praveen Dsa | MCITP - Database Administrator 2008 | My Blog | My Page


    Monday, October 20, 2014 12:22 PM
  • Please see the show an eg. why Integer are the best choice for PK

    CREATE TABLE dbo.IntegrPK  
    (Id INT PRIMARY KEY CLUSTERED IDENTITY NOT NULL,
    Val VARCHAR(100) NOT NULL
    );
    
    CREATE TABLE dbo.varcharPK (
    	Id VARCHAR(8) PRIMARY KEY CLUSTERED,
    	Val VARCHAR(100) NOT NULL
    );
    
    INSERT INTO IntegrPK(Val)
    SELECT replicate('0',80) 
    GO 100000
    
    
    
    
    
    DECLARE @id INT=0
    WHILE @id<100000
    
    BEGIN
    INSERT INTO varcharPK (ID,Val)
    SELECT 	@id,REPLICATE('0', 80)
    set @id=@id+1
    END
    	
    	
    
    
    ALTER table varcharPK ADD countrycode VARCHAR(5)
    ALTER TABLE IntegrPK ADD countrycode VARCHAR(5)
    
    UPDATE varcharPK set countrycode='USA' WHERE Id BETWEEN 1 and 5000
    UPDATE IntegrPK SET countrycode = 'USA' WHERE Id BETWEEN 1 AND 5000
    
    
    UPDATE varcharPK SET countrycode = 'UK' WHERE Id BETWEEN 5000 AND 15000
    UPDATE IntegrPK SET countrycode = 'UK' WHERE Id BETWEEN 5000 AND 15000
    
    CREATE INDEX ix_nl_ccode ON varcharPK(countrycode)
    CREATE INDEX ix_nl_ccode ON IntegrPK (countrycode)

    Let see how much space space both tables used

    SELECT
    	page_count,
    	avg_record_size_in_bytes,
    	avg_page_space_used_in_percent,
    	index_id
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('IntegrPK'), DEFAULT, NULL, 'DETAILED');
    
    
    SELECT
    	page_count,
    	avg_record_size_in_bytes,
    	avg_page_space_used_in_percent,
    	index_id
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('varcharPK'), DEFAULT, NULL, 'DETAILED');

    Index id=1 clustered index and 2 = non clustered. As you can see, if PK is on varchar column SQL consume more data pages to store data and it also impact the size of the NL index.

    vt


    Please mark answered if I've answered your question and vote for it as helpful to help other user's find a solution quicker

    Monday, October 20, 2014 12:28 PM
  • Primary keys are sets of columns that uniquely defines each row in your table.

    Surrogate keys should be avoided in relational data models. Your primary concern should be defining a key that uniquely defines each entry in your table, regardless of data type.

    Choosing the right data type for your columns is a whole other story. It should be accounted for while defining the primary key, but you should never choose a surrogate key over a business key just because the column(s) for the business key would not be of the integer data type.

    Things you should bear in mind:

    - The primary key column(s) should be chosen for columns which have meaning for the business model, and that you know are unique. Social security numbers, for example, are unique. Leave surrogate keys for the dimensional model, and use them only when you cannot safely assign a column that is unique.

    - A clustered index will be created by default over the columns you define for the primary key. This will physically order the data pages based on these columns, speeding up queries that use them as predicates. If you know these columns won't be used as predicates very often, you are doing it wrong.

    - It is best to use integer, datetime, bit or uniqueidentifier data types for primary keys for reasons already explained above, but this doesn't mean you should add a new identity column if you already have a varchar column that uniquely identifies the entries. Chances are you wont even use this extra column as a predicate, and your clustered index won't be used, forcing you to add additional indexes that have maintenance costs.

    Monday, October 20, 2014 12:51 PM
  • VARCHAR or other string data types are the bad choice. 

    A primary key needs at least three conditions:

    1. It should be the entity identity 

    2. It's length must be as shorter as possible

    3. It's values do not change ( or rarely change )


    T-SQL Articles
    T-SQL e-book by TechNet Wiki Community
    T-SQL blog

    Monday, October 20, 2014 12:53 PM