none
How to delete specific rows after finding duplicates by name RRS feed

  • Question

  • Hello,

    I use this query to find duplicates in a product table by name.

    select 
    Id, Sku, ManufacturerPartNumber, CreatedOnUtc, Deleted,
    pp.CNT,
    pp.[Name]
    from product p
    join ( SELECT [Name], COUNT([Name]) as CNT FROM Product GROUP BY [Name] HAVING COUNT([Name]) > 1 ) pp on pp.[Name]=p.[Name]
    

    I obtained that query by asking for help here: Duplicates

    It yields rows similar to this:
    As you can see, there are Deleted = 0 and Deleted = 1 rows.

    Can this query be modified to delete the rows for the duplicates that have a "1" in the Deleted column?

    Thanks,
    Tony


    Stop The World, I want To Get Off! ........... Life Isn't About Waiting For The Storm To Pass ... It's About Learning To Dance In The Rain.

    Tuesday, December 12, 2017 8:10 PM

All replies

  • Question is do you want to apply deleted=1 filter while finding duplicates Name column values or on final output?

    select 
    Id, Sku, ManufacturerPartNumber, CreatedOnUtc, Deleted,
    pp.CNT,
    pp.[Name]
    from product p
    join ( SELECT [Name], COUNT([Name]) as CNT FROM Product
    where Deleted = 1
    GROUP BY [Name] HAVING COUNT([Name]) > 1 ) pp on pp.[Name]=p.[Name]
    
    OR 
    
    select 
    Id, Sku, ManufacturerPartNumber, CreatedOnUtc, Deleted,
    pp.CNT,
    pp.[Name]
    from product p
    join ( SELECT [Name], COUNT([Name]) as CNT FROM Product
    GROUP BY [Name] HAVING COUNT([Name]) > 1 ) pp on pp.[Name]=p.[Name]
    where Deleted = 1


    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    • Proposed as answer by Xi Jin Wednesday, December 13, 2017 2:37 AM
    Tuesday, December 12, 2017 8:16 PM
  • Sarat,

    Thanks for your help.

    The result that I want is for the rows found that have a Deleted = "1", should be deleted from the table.  So, if I run this query again, it should find zero rows because there will be no more duplicates.

    I hope that explains it.

    Thanks,
    Tony


    Stop The World, I want To Get Off! ........... Life Isn't About Waiting For The Storm To Pass ... It's About Learning To Dance In The Rain.

    Tuesday, December 12, 2017 8:27 PM
  • If all the duplicates have a "1", can't you just delete where deleted = 1?

    Otherwise something along the lines of 

    with cte as 
    (
    select name, deleted, row_number() over (partition by name order by deleted) as rn
    from product
    )
    delete from cte where rn > 1 and deleted = 1
    will delete only those where it is actually a duplicated AND deleted = 1


    Tuesday, December 12, 2017 9:01 PM
  • >> I use this query to find duplicates in a product table by name. <<

    Where is the DDL for this table? Don't you know that we don't write with bit flags in SQL? That was the old days of assembly language and punch card programming. We don't put metadata, such as an audit trail shown by a creation date, so that when it is deleted, this information is destroyed. 

    In a proper data model. There is no such absurdities as a magical, universal, generic "id" or likewise a magical, generic, universal "name" in RDBMS. These are what ISO and metadata standards committee call attribute properties. They have to be the identifier of something in particular in the name of something in particular.

    Your singular table name tells us you have only one product; is that correct?

    We now have to wonder exactly what the key of your products table is? Is it the vague generic something "id", the manufacturers part number, or the SKU? Can different products have the same vague generic name? 

    Want to try again and follow forum Netiquette? 

    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Wednesday, December 13, 2017 12:27 AM
  • Hi Tony,

    Let's see your sample data. For example, when name = '(20-Pack)...' there are 4 records and all of them have a Deleted = '1'. So based on your logic, you need to delete all of them. Right?

    And when name = '25-Pack...', there are two rows. And one have Deleted = '1' and one is 0. So you want to delete the single row which Deleted = '1'. Right?

    If so, I think Sarat's solution should work for you. Have you tried it?

    If not, could you please clarify your requirement based on your sample data? And share us your expected result.

    Thanks,
    Xi Jin.


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Wednesday, December 13, 2017 2:37 AM
  • what according to you represents a "duplicate"?

    Your posted data are all unique based on the fact that they represent different parts within same product


    Please Mark This As Answer if it solved your issue
    Please Vote This As Helpful if it helps to solve your issue
    Visakh
    ----------------------------
    My Wiki User Page
    My MSDN Page
    My Personal Blog
    My Facebook Page

    Wednesday, December 13, 2017 8:36 AM
  • Hi Tony,

    Let's see your sample data. For example, when name = '(20-Pack)...' there are 4 records and all of them have a Deleted = '1'. So based on your logic, you need to delete all of them. Right?

    And when name = '25-Pack...', there are two rows. And one have Deleted = '1' and one is 0. So you want to delete the single row which Deleted = '1'. Right?

    If so, I think Sarat's solution should work for you. Have you tried it?

    If not, could you please clarify your requirement based on your sample data? And share us your expected result.

    Thanks,
    Xi Jin.


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Xi,

    Thanks for your help.

    The answer to your first question is Yes.  Delete the "1"s even if there is no matching "0".

    The answer to your second question is Yes.  Delete the row that has Deleted = "1".

    No, I did not try Sarat's solution.  I could be wrong, but I do not see anything there that deletes rows.  Will it actually delete the unwanted rows?

    Thanks,
    Tony


    Stop The World, I want To Get Off! ........... Life Isn't About Waiting For The Storm To Pass ... It's About Learning To Dance In The Rain.

    Wednesday, December 13, 2017 1:05 PM
  • Visakh16,

    Thanks for your help.

    what according to you represents a "duplicate"?

    A duplicate is when two or more rows have the same Name.

    Your posted data are all unique based on the fact that they represent different parts within same product

    I'm unsure of what you are saying here.  Besides duplicates on Name, those are all different products in the Product table.

    Thanks,
    Tony


    Stop The World, I want To Get Off! ........... Life Isn't About Waiting For The Storm To Pass ... It's About Learning To Dance In The Rain.

    Wednesday, December 13, 2017 1:15 PM
  • Celko,

    Thanks for your help.

    Where is the DDL for this table?

    What is a DDL?

    Don't you know that we don't write with bit flags in SQL?

    Please explain this.  What do you use in SQL?

    Your singular table name tells us you have only one product; is that correct?


    Not correct.  There are over 7,000 products in the Product table.

    We now have to wonder exactly what the key of your products table is?

    It is the vague generic something "id".

    Can different products have the same vague generic name? 

    Yes.


    Want to try again and follow forum Netiquette? 

    Where can I get the information for the proper forum Netiquette?

    Thanks,
    Tony


    Stop The World, I want To Get Off! ........... Life Isn't About Waiting For The Storm To Pass ... It's About Learning To Dance In The Rain.


    • Edited by Carneno Wednesday, December 13, 2017 1:32 PM
    Wednesday, December 13, 2017 1:32 PM
  • Hi Tony,

    =>Will it actually delete the unwanted rows?

    No.

    Sarat's solution is used to return the records which satisfy your deleted logic. It is just a select statement. And will not delete any records.

    So you can try the query. Check if the query returns all the wanted result. If the result is right. Then the Delete or not is determined on you.

    Delete query is like this:

    delete p
    from product p
    join ( SELECT [Name], COUNT([Name]) as CNT FROM Product
    GROUP BY [Name] HAVING COUNT([Name]) > 1 ) pp on pp.[Name]=p.[Name]
    where Deleted = 1

    You should always verify the data first before delete.

    Thanks,
    Xi Jin.


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Thursday, December 14, 2017 1:59 AM
  • Xi,

    Would I add your delete query to the end of what I already have or replace what I have?

    Thanks for your help.

    Tony


    Stop The World, I want To Get Off! ........... Life Isn't About Waiting For The Storm To Pass ... It's About Learning To Dance In The Rain.

    Thursday, December 14, 2017 1:05 PM
  • You mean this I guess

    --DELETE t
    SELECT * FROM ( select Id, Sku, ManufacturerPartNumber, CreatedOnUtc, Deleted, [Name], ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY CreatedOnUtc DESC) AS Seq from product WHERE Deleted = 1 )t WHERE Seq > 1

    First run SELECT, make sure it returns you duplicate records to be deleted. Then remove SELECT and uncomment the DELETE and execute it


    Please Mark This As Answer if it solved your issue
    Please Vote This As Helpful if it helps to solve your issue
    Visakh
    ----------------------------
    My Wiki User Page
    My MSDN Page
    My Personal Blog
    My Facebook Page

    Thursday, December 14, 2017 1:25 PM
  • Visakh16,

    Thanks for your help.

    I'm sorry, when I run my original query, I get 5,030 rows, but when I run your Select query, I only get 533 rows.

    I'm not sure your query is working for me.

    Thanks,
    Tony


    Stop The World, I want To Get Off! ........... Life Isn't About Waiting For The Storm To Pass ... It's About Learning To Dance In The Rain.

    Thursday, December 14, 2017 1:57 PM
  • Visakh16,

    Thanks for your help.

    I'm sorry, when I run my original query, I get 5,030 rows, but when I run your Select query, I only get 533 rows.

    I'm not sure your query is working for me.

    Thanks,
    Tony


    Stop The World, I want To Get Off! ........... Life Isn't About Waiting For The Storm To Pass ... It's About Learning To Dance In The Rain.

    My query will list only those rows which have to be deleted

    i.e duplicate instances alone

    Your original query lists all rows including duplicate entries


    Please Mark This As Answer if it solved your issue
    Please Vote This As Helpful if it helps to solve your issue
    Visakh
    ----------------------------
    My Wiki User Page
    My MSDN Page
    My Personal Blog
    My Facebook Page

    Thursday, December 14, 2017 2:28 PM
  • Visakh16,

    I'm sorry, but there are rows in my query that I can see should be deleted that are not in the rows from your query.

    Thanks,
    Tony


    Stop The World, I want To Get Off! ........... Life Isn't About Waiting For The Storm To Pass ... It's About Learning To Dance In The Rain.

    Thursday, December 14, 2017 6:14 PM
  • If all the duplicates have a "1", can't you just delete where deleted = 1?

    Otherwise something along the lines of 

    with cte as 
    (
    select name, deleted, row_number() over (partition by name order by deleted) as rn
    from product
    )
    delete from cte where rn > 1 and deleted = 1
    will delete only those where it is actually a duplicated AND deleted = 1


    So none of this worked?

    If you want to delete rows where deleted = 1 whether there is a deleted = 0 or not, just delete from <table> where deleted = 1

    If only where it is genuinely a duplicate and "marked", what's wrong with above?

    Thursday, December 14, 2017 7:50 PM
  • Ryan,

    I'm sorry.  I did not mean to ignore your suggestion.

    Would I put your query at the end of my query or do I run it by itself?

    Thanks,
    Tony


    Stop The World, I want To Get Off! ........... Life Isn't About Waiting For The Storm To Pass ... It's About Learning To Dance In The Rain.

    Thursday, December 14, 2017 8:29 PM
  • By your query, you mean the query to set delete = 1?

    It can only be run after the "deleted = 1" as been set but if you wanted to bypass that and just delete the duplicates

    with cte as 
    (
    select name, deleted, row_number() over (partition by name order by (select 1)) as rn
    from product
    )
    delete from cte where rn > 1 
    Will delete all but one for each name - if you want to be more specific as to which is deleted you can change the "(select 1)" to columns that determine which is to be saved (the row ordered first for each name will be kept)

    Thursday, December 14, 2017 8:47 PM
  • Hi Tony,

    If you have ran SS's select query and verify that the result is what you want. Then you can run my delete query and delete these duplicated rows.

    Thanks,
    Xi Jin.


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Friday, December 15, 2017 1:57 AM
  • Ryan,

    By your query, you mean the query to set delete = 1? 

    No, I mean the query in my opening post here.

    Thanks,
    Tony


    Stop The World, I want To Get Off! ........... Life Isn't About Waiting For The Storm To Pass ... It's About Learning To Dance In The Rain.

    Friday, December 15, 2017 1:23 PM
  • Hi Tony,

    After all, you did not try any of the solutions. Right?

    If you are afraid that our query will delete the data which you don't want to delete. You can select part of your data into a temp table first. Then try those solutions on this temp table.

    We don't know your actual situation,  you must try these solutions yourself. Then you'll know the effect of them and know when and where to run them.

    Thanks,
    Xi Jin.


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Monday, December 18, 2017 8:01 AM
  • Hi Tony,

    After all, you did not try any of the solutions. Right?

    If you are afraid that our query will delete the data which you don't want to delete. You can select part of your data into a temp table first. Then try those solutions on this temp table.

    We don't know your actual situation,  you must try these solutions yourself. Then you'll know the effect of them and know when and where to run them.

    Thanks,
    Xi Jin.


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Xi,

    I'm sorry I did not get back to you sooner.

    I need more time to figure out what I want to do.  I'm still unsure about how I want the query to work and what it is I need to delete.

    I appreciate all of your time and everybody elses time and effort to help me resolve this issue.

    I will be back.

    Thanks,
    Tony


    Stop The World, I want To Get Off! ........... Life Isn't About Waiting For The Storm To Pass ... It's About Learning To Dance In The Rain.

    Tuesday, December 19, 2017 9:01 PM