none
Removing duplicates in ms sql 2005

    Question

  • Hi,

    I am trying to remove duplicates from my fact table but unable to remove. I have an identity column. please advice how to do it.

    Thanks in advance.

    • Moved by Tom PhillipsModerator Wednesday, August 10, 2011 8:33 PM TSQL question (From:SQL Server Database Engine)
    Wednesday, August 10, 2011 8:24 PM

Answers

  • I think Deepesh is on the right track, but slightly off.  You want to Partition not on the ID, but on the columns where you want to detect duplicates.

    Sample data:

    ID, F1, F2, F3, F4

    1, a, b, c, d

    2, d, e, g, h

    3, x, y, u, z

    4, a, b, c, d    << duplicate of ID=1 by combination of F1,F2,F3,F4

     

    The query to delete the duplicates would be

    ;with TMP as (
    	select *, RN=Row_Number() over (
    		partition by F1, F2, F3, F4
    		order by ID)
    	from myTable)
    DELETE TMP
    WHERE RN > 1
    
    Note the PARTITION BY clause should contain the combination of columns where you want to detect the duplicate

    • Marked as answer by SamSSIS Thursday, August 18, 2011 3:31 PM
    Wednesday, August 10, 2011 9:42 PM

All replies

  • You can use a CTE (Common Table Expression) in SQL Server 2005 and up to achieve this - you basically "partition" your data by the ID, so each group of identical values is a partition, and then you number those sequentially using ROW_NUMBER(). Any values of ROW_NUMBER() larger than 1 indicate a duplicate.
     

    ;WITH Duplicates AS (   SELECT    DetailID, FKey, Col1, Col2, Col3, Col4,    ROW_NUMBER() OVER (PARTITION BY DetailID ORDER BY FKey) AS 'RowNum'   FROM dbo.YourTable ) SELECT  DetailID, FKey, Col1, Col2, Col3, Col4, RowNum FROM Duplicates WHERE RowNum > 1 
    

    This will produce a list of duplicates for you - you can now either update those, or delete them, or whatever you want to do with them.

     

    PLease mark it as answer if it resolves the issue.

    Wednesday, August 10, 2011 8:30 PM
  • Hi deepesh, thank you very much for your response,

    sorry to ask you,

    please let me know what is there before with duplicates i your query as i am very new to this.

    Thanks in advance.

    Wednesday, August 10, 2011 8:55 PM
  • Hi SamSSIS

     

    I have written a blog post and also posted the script in the galleries for this purposes. 

    Check the scripts 

    Blog --- http://everysolution.wordpress.com/2011/07/24/tsql-delete-duplicates-in-your-table/

    Microsoft gallery -- http://gallery.technet.microsoft.com/scriptcenter/2e147568-956d-43c4-8c2e-637f1b5ab6f4

     

    Surendra


    Nothing is Permanent... even Knowledge....
    My Blog
    Wednesday, August 10, 2011 8:57 PM
  • Hi deepesh, i need Small information.

    what would be this detail id in this case you explained above.

    are u referring to any column in my table or which should be generated from your query ?

    Please advice

    Wednesday, August 10, 2011 8:58 PM
  • In the partition by you need to list the columns you want to check for duplicates. Say, you want to de-dup based on LastName, FirstName, then you list these two columns in the partition by clause.
    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog
    • Proposed as answer by KEROBIN Thursday, August 11, 2011 5:45 AM
    Wednesday, August 10, 2011 9:09 PM
    Moderator
  • I think Deepesh is on the right track, but slightly off.  You want to Partition not on the ID, but on the columns where you want to detect duplicates.

    Sample data:

    ID, F1, F2, F3, F4

    1, a, b, c, d

    2, d, e, g, h

    3, x, y, u, z

    4, a, b, c, d    << duplicate of ID=1 by combination of F1,F2,F3,F4

     

    The query to delete the duplicates would be

    ;with TMP as (
    	select *, RN=Row_Number() over (
    		partition by F1, F2, F3, F4
    		order by ID)
    	from myTable)
    DELETE TMP
    WHERE RN > 1
    
    Note the PARTITION BY clause should contain the combination of columns where you want to detect the duplicate

    • Marked as answer by SamSSIS Thursday, August 18, 2011 3:31 PM
    Wednesday, August 10, 2011 9:42 PM