locked
Primary key RRS feed

  • Question

  • I'm creating table with "SELECT col1, col2, col3,... INTO dbo.NewTable FROM ..." statement to achieve minimal logging.
    The problem is that first 2 columns must be primary key (or I can create new identity column for that purpose).

    But I need PK. How can I do that inside select statement? If I do afterwards, it will take time for sorting.


    • Edited by simonxy Wednesday, February 24, 2016 5:33 PM correction
    Wednesday, February 24, 2016 5:33 PM

Answers

  • ROWNUMBER will allow you to define what the numbering will be base on. But, that column will never be an identity column in the result table. IDENTITY() function doesn't allow you do control what decide the ordering, but it will be cheaper (compared to order over something "real") and the resulting column will (obviously) be an identity column.

    As for you using a UNION, you can have the UNION inside a CTE or derived table and use IDENTITY() when you select form the CTE or derived table.


    Tibor Karaszi, SQL Server MVP (Web Blog)


    Wednesday, February 24, 2016 6:36 PM
  • Hi simonxy,

    UNION ALL cannot guarantee uniqueness(first 2 columns must be primary key) for col1 and col2. If you have fully considered in advance, then you may use script below as mentioned by TiborK.

    ;with cte as
    (
    	select col1, col2, col3 from table1
    	UNION ALL
    	select col1, col2, col3 from table2
    	--...
    )
    SELECT IDENTITY(int, 1, 1) AS col0, col1, col2, col3
    INTO dbo.newTable
    FROM cte

    Sam Zha
    TechNet Community Support

    Thursday, February 25, 2016 3:48 AM

All replies

  • There is no way with INTO command to specify a PK. You can create incrementing ID in the statement, but you will have to alter the structure afterwords to make a PK.

    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    Wednesday, February 24, 2016 5:43 PM
  • In fact I have like this:

    select col1, col2, col3 INTO dbo.newTable FROM ...
    UNION ALL
    select col1, col2, col3 FROM ...

    I can add col0 as identity(1,1) into select statement, but it probably wont work because of UNION ALL statement?

    If i add col0 after select into it takes a lot of time:

    ALTER TABLE dbo.newTable ADD col0 INT IDENTITY (1, 1);
    ALTER TABLE dbo.newTable ADD PRIMARY KEY (col0);

    What would be the fastest way to add some column as primary key in my case?

    Wednesday, February 24, 2016 5:47 PM
  • Just use

    select *, ROW_NUMBER() OVER (order by someColumn) as ID

    FROM (select ...

    UNION ALL

    ...) T

    And then just make that column to be a PK.


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    Wednesday, February 24, 2016 5:51 PM
  • To add a bit to Naomi's comments: SELECT INTO does not carry over any constraint (key, default, check etc), indexes etc.

    It only carries over the columns named and data types (obviously) and also if you have an identity column. If you do not have an ideneity column in your source query, you can generate one using the IDENTITY() function. bot the identity, if you have such) will not be made primary key. the primary key definition (be it over identity or not) will have to be made using ALTER TABLE ... ADD CONSTRAINT.


    Tibor Karaszi, SQL Server MVP (Web Blog)

    Wednesday, February 24, 2016 6:02 PM
  • " ROW_NUMBER() OVER (order by someColumn) "

    This will produce SORT of inner table, which is very costly operation.

    Isn't better to use :

    "rowId=IDENTITY (int, 1, 1)" which doesn't sort base table?

    Wednesday, February 24, 2016 6:25 PM
  • Sounds like it may be a good idea. I wanted to suggest it as well, but I always forget the syntax.

    You also don't have to use any column in the order by, it can be order by (select 1) for example, so just a random order.


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    Wednesday, February 24, 2016 6:36 PM
  • ROWNUMBER will allow you to define what the numbering will be base on. But, that column will never be an identity column in the result table. IDENTITY() function doesn't allow you do control what decide the ordering, but it will be cheaper (compared to order over something "real") and the resulting column will (obviously) be an identity column.

    As for you using a UNION, you can have the UNION inside a CTE or derived table and use IDENTITY() when you select form the CTE or derived table.


    Tibor Karaszi, SQL Server MVP (Web Blog)


    Wednesday, February 24, 2016 6:36 PM
  • Can you create the table first and use Insert into like this sample:

     Create table dbo.newTable(col0 INT IDENTITY (1, 1) primary key, col1 int,col2 int,col3 int) 
    
    
     insert into dbo.newTable (col1,col2,col3) 
     Select 1 as col1,2 as col2,3 as col3
     union all
     Select 1 as col1,2 as col2,3 as col3
    
    
    
     Select * from dbo.newTable
    
    
     drop table dbo.newTable

    Wednesday, February 24, 2016 6:46 PM
  • You cannot use INTO table for what you are trying to do.  You need to create the table first, add the PK and IDENITY, then insert the rows.

    You can try something like this:

    select col1, col2, col3 INTO dbo.newTable FROM ... WHERE 0=1
    
    ALTER TABLE dbo.newTable ADD col0 INT IDENTITY (1, 1);
    ALTER TABLE dbo.newTable ADD PRIMARY KEY (col0);
    
    INSERT INTO dbo.newTable 
    select col1, col2, col3 FROM ...
    UNION ALL
    select col1, col2, col3 FROM ...
    

    Wednesday, February 24, 2016 7:10 PM
  • >> I'm creating table with "SELECT col1, col2, col3,... INTO dbo.NewTable FROM ..." statement to achieve minimal logging. <<

    Why do not you put this into a view? The view will always be correct and never need to be updated. Your mindset seems to be locked back in the days of punch cards and magnetic tapes. There is no need to materialize (write out to physical media) your data. If your design is correct, the view will automatically have a valid key. If your design is wrong, then you are going to fail matter what you do.

    >> The problem is that first 2 columns must be primary key (or I can create new IDENTITY column [sic] for that purpose).<<

    This makes no sense. A key is a subset of the attributes of the entity model in the table which is unique. The primary key is one of the candidate keys that has been given special status in SQL. Dr. Codd later apologized for creating "the false concept of "primary key" when he realized that all keys are equally "key–like" and none are special.

    The reason he made this error is that even he was a victim of the file system mindset. Magnetic tapes and punch cards depend on being sorted on one or more fields in a record; think about trying to do random access on a magnetic tape with 1,000,000 records. AARRGH!!

    The old Sybase code Museum in SQL server was based on the primitive file system on UNIX machines. Back in those days, the system created a record number two locate the data. This was kept at the operating system level, but was exposed to the programmer. This is the identity of T-SQL! The identity column is not a column; it is a table property. It is a count of physical insertion attempts (not even successes) to one file on one machine, which represents one table. It is totally non-relational and has nothing to do with either a key or a logical data model.

    You do not create a primary key; you discover it in the data itself, if you have done a correct data model. This vague narrative, without any DDL, is a classic newbie mistake. You are still using magnetic tapes in the year 2016, but you are using SQL to write it. 





    But I need PK. How can I do that inside select statement? If I do afterwards, it will take time for sorting.

    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Wednesday, February 24, 2016 8:39 PM
  • Create table table with CREATE TABLE, define the primary key. Either make it non-clustered and the insert will be minimally logged. Or make it clustered; in this case you may need to enable trace flag 610.

    Don't add an identity column, if there already is a key in the data itself. That is just pointless.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    • Proposed as answer by Naomi N Thursday, February 25, 2016 2:49 AM
    Wednesday, February 24, 2016 11:07 PM
  • Hi simonxy,

    UNION ALL cannot guarantee uniqueness(first 2 columns must be primary key) for col1 and col2. If you have fully considered in advance, then you may use script below as mentioned by TiborK.

    ;with cte as
    (
    	select col1, col2, col3 from table1
    	UNION ALL
    	select col1, col2, col3 from table2
    	--...
    )
    SELECT IDENTITY(int, 1, 1) AS col0, col1, col2, col3
    INTO dbo.newTable
    FROM cte

    Sam Zha
    TechNet Community Support

    Thursday, February 25, 2016 3:48 AM