locked
create new table with deduped data. Carry along a uniqueidentifer or create a new one in new deduped table? RRS feed

  • Question

    • I have a table with duplicates.
    • I want to get all unique combinations of two column, StandardNbr and Type and include the uniqueIdentifier column if possible and create a new table after the dedup process.
    • I can't include the uniqueidentifier obviously in the select distinct. Should I select into a new table and then recreate the UniqueIdentifier key ??  I need to eliminate nulls in either one or both of the columns. 
    • I tried to create a new table called Test2 using Test's create script but it said there's already an object named 'PK_Test' in the database. What's the best way to circumvent that so that I have an equivalent table schema to copy the deduped data into? 
    • Here's the query that successfully eliminates the duplicates and the table's schema that contains the duplicates. 

     Thank you!

    Select Distinct StandardNbr,[Type]  from [Test]

    CREATE TABLE [dbo].[Test](
    	[TestUid] [uniqueidentifier] NOT NULL,
    	[Type] [nvarchar](50) NULL,
    	[StandardNbr] [nvarchar](50) NULL,
    	[StandardName] [nvarchar](500) NULL,
     CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED 
    (
    	[TestUid] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    



     

     

     

     

    Sunday, October 16, 2011 10:36 PM

Answers

  • I can propose two option2:
    1. use group by and pick id from any of the rows (using min or max)
    Select
    MIN([TestUid]) AS OrgUid
    ,StandardNbr
    ,[Type]
    from [Test]
    group by
    StandardNbr
    ,[Type]
    
    2. Use select distinct and load data into new table with identity column
     

    • Edited by Piotr Palka Monday, October 17, 2011 12:44 AM
    • Marked as answer by hazz Monday, October 17, 2011 1:30 PM
    Monday, October 17, 2011 12:41 AM
  • As for error: "I tried to create a new table called Test2 using Test's create script but it said there's already an object named 'PK_Test' in the database. What's the best way to circumvent that so that I have an equivalent table schema to copy the deduped data into?"

    When you creating copy of the table under new name you have to rename primary key too: CONSTRAINT [PK_Test_copy] PRIMARY KEY CLUSTERED

    • Marked as answer by hazz Monday, October 17, 2011 12:57 AM
    Monday, October 17, 2011 12:43 AM

All replies

  • I can propose two option2:
    1. use group by and pick id from any of the rows (using min or max)
    Select
    MIN([TestUid]) AS OrgUid
    ,StandardNbr
    ,[Type]
    from [Test]
    group by
    StandardNbr
    ,[Type]
    
    2. Use select distinct and load data into new table with identity column
     

    • Edited by Piotr Palka Monday, October 17, 2011 12:44 AM
    • Marked as answer by hazz Monday, October 17, 2011 1:30 PM
    Monday, October 17, 2011 12:41 AM
  • As for error: "I tried to create a new table called Test2 using Test's create script but it said there's already an object named 'PK_Test' in the database. What's the best way to circumvent that so that I have an equivalent table schema to copy the deduped data into?"

    When you creating copy of the table under new name you have to rename primary key too: CONSTRAINT [PK_Test_copy] PRIMARY KEY CLUSTERED

    • Marked as answer by hazz Monday, October 17, 2011 12:57 AM
    Monday, October 17, 2011 12:43 AM
  • Thank you Piotr. I have table Test2 now after renaming Primary key which I can rename back to original after copying data without duplicates.  So now how do I create the query to use the correct select distinct with just the two columns, but to also include all other columns in the select statement? Thank you.

     Select Distinct StandardNbr,[Type] and all other columns in table  from [SemsTestEMC]
    

     

     



    • Edited by hazz Monday, October 17, 2011 1:09 AM
    Monday, October 17, 2011 1:03 AM
  • EVERYTHING you posted is FUNDAMENTALLY WRONG! Not a little wrong, but totally, horrible wrong. When you are fired, good programmers will be cleaving the mess for a month or two. I read it slowly from top to bottomand here is my response :

     

    >> I have a table with duplicates.<<

     

    Table cannot have duplicate rows by definition. This is not a table; at best, it is deck of punch cars written in SQL. You are so ignorant that you will use IDENTITY, GUID and/orUNIQUEIDENTIFIER in your code.

     

    >> I want to get all unique combinations of two columns, StandardNbr and Type and include the UNIQUEIDENTIFIER column if possible and create a new table after the dedup process.<<

     

    WOW! I was right! A magical generic “type” in total violation of ISO-11179 rules and common sense. Hey, let's assume it is blood_type! ISOI uses underscores and not the old CamelCase stuff

     

    The use of the UNIQUEIDENTIFIER to destroy any hope of a valid Relational Model.

     

    >> I can't include the UNIQUEIDENTIFIER obviously in the SELECT DISTINCT. Should I select into a new table and then recreate the UNIQUEIDENTIFIER key? I need to eliminate NULLs in either one or both of the columns. <<

     

    A new table? You mean a another deck of fake punch cards in SQL! Again, everything you are doing is wrong; not a little wrong, but FUNDAMENTALLY wrong.

     

    Without scroll down any further, i expect to see the usual crap of NVARCHAR(50) and more null-able columns in one table than in any schema I weer saw at a Fortune 500 company?

    >> I tried to create a new table called Test2 using Test's create script but it said there's already an object named 'PK_Test' in the database. <<

    Without any help form you, i will guess that the constraint is in use, since all constraint names are global (do you know why?)_

    >> What's the best way to circumvent that so that I have an equivalent table schema to copy the de-duped data into? <<

    The right answer is not to ever write such a &*&@ mess in the first place. If this is medical information ave the personal and professional integrity to tell your boss that you are over your head and that your work needs to be re-done NOW. 


    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL
    Monday, October 17, 2011 1:26 AM
  • the data came from an Excel spreadsheet. It does have duplicate data. The select distinct will take care of it.
    Monday, October 17, 2011 1:36 AM
  • For columns outside the key it is tricky, because orginal rows may have not consistient values.

    You need to examine data which are beeing droped.

    I think that simples is to join new table with orginal table using apply and use SELECT TOP 1 after cross apply

    SELECT * from new table n

    outer apply (SELECT TOP 1 * from dirty_table d wher n.key=d.key)



    • Proposed as answer by Naomi N Monday, October 17, 2011 1:59 AM
    • Edited by Piotr Palka Monday, October 17, 2011 1:59 AM
    Monday, October 17, 2011 1:58 AM
  • You gave me the correct answer Piotr, I just didn't put it all together last night.  I reposted the question differently after more work and got the following working correctly.

    SELECT
         NEWID()as newUid,
         StandardNbr
        ,[Type]
        , Min([StandardName]) AS StandardName
        , Min([Scope]) AS Scope
        , Min([Class]) AS Class
        , Min([Conditions]) AS Conditions
        , Min(Requirements) AS Requirements
        , Min(PerformanceCriteria) AS PerformanceCriteria
        , Min(Cost) AS Cost
        , Min([Status]) AS Status
        , Min(Created) AS Created
        , Min(Modified) AS Modified
        , Min(ModifiedBy) AS ModifiedBy
    INTO EMC3    
    FROM EMC
    GROUP BY
    StandardNbr
    ,[Type]
    
    

     

    Monday, October 17, 2011 1:34 PM