none
find the duplicate records RRS feed

  • Question

  •  I bcp a flat file to a staging table where I do the Transformation of the data. The problem I'm having is that it
    is loading the approved record along with a duplicated(based on default load values) to the Prod table. If any records
    match the Key, but are not in an approved status I want to display(verify) then delete the bad data. In my example the 2nd
    record is what should load to Prod and the 1st should be displayed then deleted. Is there a display query to see how many 
    records meet this criteria(bad data), before the delete phase.

    Thanks

    CREATE TABLE [dbo].[#tmporders](
    	[vehid] [int] NOT NULL,
    	[ref] [nvarchar](25) NOT NULL,
    	[masterbol] [nvarchar](25) NOT NULL,
    	[bol] [nvarchar](25) NOT NULL,
    	[vendr] [nvarchar](25) NOT NULL,
    	[ordernbr] [nvarchar](25) NOT NULL,
    	[custline] [int] NOT NULL,
    	[pronbr] [nvarchar](25) NOT NULL,
    	[invoicenbr] [nvarchar](25)NOT NULL,
    	[invoicestat] [nvarchar](8)
    	)
    	
    	Insert Into #tmporders
    	Values('791404444','999','brn99','bol99','jlws','899a','1','p765','n/a','n/a')
    	go
    	Insert Into #tmporders
    	Values('791404444','999','brn99','bol99','jlws','899a','1','p765','ord744','approved')
    	go
    	
    	Select * from
    	#tmporders

    Wednesday, September 18, 2019 12:29 AM

Answers

  • Try this:

     
    
    CREATE TABLE [dbo].[#tmporders](
    	[vehid] [int] NOT NULL,
    	[ref] [nvarchar](25) NOT NULL,
    	[masterbol] [nvarchar](25) NOT NULL,
    	[bol] [nvarchar](25) NOT NULL,
    	[vendr] [nvarchar](25) NOT NULL,
    	[ordernbr] [nvarchar](25) NOT NULL,
    	[custline] [int] NOT NULL,
    	[pronbr] [nvarchar](25) NOT NULL,
    	[invoicenbr] [nvarchar](25)NOT NULL,
    	[invoicestat] [nvarchar](8)
    	)
    	
    	Insert Into #tmporders
    	Values('791404444','999','brn99','bol99','jlws','899a','1','p765','n/a','n/a')
    	go
    	Insert Into #tmporders
    	Values('791404444','999','brn99','bol99','jlws','899a','1','p765','ord744','approved')
    	go
    	;with mycte  as (
    	Select * from #tmporders t
    	Where exists (select 1   from #tmporders t2 
    	where invoicestat='approved'
    	and t.[vehid] =t2.[vehid] and t.[ref] =t2.[ref]) 
    	and (invoicestat='n/a' or invoicestat is null  )
    	)
    --Select * from mycte   --check
    Delete from mycte  --delete
    
     select * from #tmporders  
    
     drop TABLE [dbo].[#tmporders]

    • Marked as answer by hart60 Friday, September 20, 2019 11:43 PM
    Thursday, September 19, 2019 1:31 AM
    Moderator

All replies

  • Hi hart60,

    Check it out.

    First approved row per vehid is legit. Everything else per vehid is candidate for deletion. 

    Also, the rn column plays a role of a counter.

    -- DDL and sample data population, start
    DECLARE @tmporders TABLE (
    	[vehid] [int] NOT NULL,
    	[ref] [nvarchar](25) NOT NULL,
    	[masterbol] [nvarchar](25) NOT NULL,
    	[bol] [nvarchar](25) NOT NULL,
    	[vendr] [nvarchar](25) NOT NULL,
    	[ordernbr] [nvarchar](25) NOT NULL,
    	[custline] [int] NOT NULL,
    	[pronbr] [nvarchar](25) NOT NULL,
    	[invoicenbr] [nvarchar](25)NOT NULL,
    	[invoicestat] [nvarchar](8)
    	);
    	
    INSERT INTO @tmporders
    VALUES ('791404444','999','brn99','bol99','jlws','899a','1','p765','n/a','n/a')
    	, ('791404444','999','brn99','bol99','jlws','899a','1','p765','ord744','approved')
    	, ('791777777','999','brn99','bol99','jlws','899a','1','p765','ord744','approved')
    	, ('791777777','999','brn99','bol99','jlws','899a','1','p765','ord744','n/a')
    	, ('791777777','999','brn99','bol99','jlws','899a','1','p765','ord744','n/a');
    -- DDL and sample data population, end
    
    WITH rs AS 
    (
       SELECT *, 
          ROW_NUMBER() OVER(PARTITION BY [vehid] ORDER BY [invoicestat] ASC) AS rn
       FROM @tmporders
    )
    SELECT * FROM rs -- all rows
    --WHERE rn = 1; -- approved rows
    --WHERE rn > 1; -- TO DELETE WHEN ready

    Wednesday, September 18, 2019 12:44 AM
  •  I don't have to partition by entire key?

    Wednesday, September 18, 2019 1:00 AM
  • Hi hart60,

    Yes, you would need to PARTITION BY by the entire key, i.e. all columns that signify the unique identifier.


    Wednesday, September 18, 2019 2:21 AM
  • Hi hart60,

     

    Per your description, you would like to determine if there is a row which have the same verify . If it has , you would like to delete bad data (invoicenbr<>'approved' and exists)and then inserted data . How would you like to deal with the data (invoicenbr='approved' and exists)? Keep the metadata and not insert ?

     

    I think Trigger might help you.

     invoicenbr<>'approved' and exists--->delete and insert

     invoicenbr='approved' and exists  --->not delete and not insert

     not exists    --->insert

     

    Please try following script.

     
    -- DDL and sample data population, start
    create table  tmporders  (
    	[vehid] [int] NOT NULL,
    	[ref] [nvarchar](25) NOT NULL,
    	[masterbol] [nvarchar](25) NOT NULL,
    	[bol] [nvarchar](25) NOT NULL,
    	[vendr] [nvarchar](25) NOT NULL,
    	[ordernbr] [nvarchar](25) NOT NULL,
    	[custline] [int] NOT NULL,
    	[pronbr] [nvarchar](25) NOT NULL,
    	[invoicenbr] [nvarchar](25)NOT NULL,
    	[invoicestat] [nvarchar](8)
    	);
    
    go
    create TRIGGER delect_insert 
    ON tmporders
    INSTEAD OF insert 
    AS
    BEGIN
    IF  exists (SELECT 1 FROM tmporders c join inserted i on c.vehid=i.vehid where c.invoicestat<>'approved') 
    begin 
    DELETE s FROM tmporders s JOIN inserted  i ON i.vehid = s.vehid 
    insert into tmporders select * from inserted 
    end 
    if not exists (SELECT 1 FROM tmporders c join inserted d on c.vehid=d.vehid ) 
    insert into tmporders select * from inserted 
    
    END 
    GO
    Insert Into tmporders
    Values('791404444','999','brn99','bol99','jlws','899a','1','p765','n/a','n/a')
    go
    Insert Into tmporders
    Values('791404444','999','brn99','bol99','jlws','899a','1','p765','ord744','approved')
    go
    Insert Into tmporders
    Values('79140111','999','brn99','bol99','jlws','899a','1','p765','ord755','approved')
    go
    Insert Into tmporders
    Values('79140111','999','brn99','bol99','jlws','899a','1','p765','ord755','n/a')
    select * from tmporders
    /*
    vehid       ref                       masterbol                 bol                       vendr                     ordernbr                  custline    pronbr                    invoicenbr                invoicestat
    ----------- ------------------------- ------------------------- ------------------------- ------------------------- ------------------------- ----------- ------------------------- ------------------------- -----------
    79140111    999                       brn99                     bol99                     jlws                      899a                      1           p765                      ord755                    approved
    791404444   999                       brn99                     bol99                     jlws                      899a                      1           p765                      ord744                    approved
    */

    Hope it will help you.

     

    Best Regards,

    Rachel


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Wednesday, September 18, 2019 2:56 AM
  • Please read this blog

    http://dimantdatabasesolutions.blogspot.com/2007/02/dealing-with-duplicates.html


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Wednesday, September 18, 2019 4:29 AM
    Answerer
  • I think what you should do is this

    First do select of all rows like this based on your matching conditions

    SELECT columns,
    CASE WHEN Seq > 1 THEN 'To be deleted' ELSE 'retained' END AS Status FROM ( SELECT columns, ROW_NUMBER() OVER (PARTITION BY vehid ORDER BY CASE WHEN invoicestat = 'approved' THEN 1 ELSE 2 END) AS Seq FROM YourTable WHERE conditions.... )t

    then check for the status column

    All rows with value as to be deleted are the ones which will get deleted and others will be retained

    once happy with the result, then do the population/actual delete

    INSERT prod
    SELECT columns
    FROM 
    (
    SELECT columns,
    ROW_NUMBER() OVER (PARTITION BY vehid ORDER BY CASE WHEN invoicestat = 'approved' THEN 1 ELSE 2 END) AS Seq
    FROM YourTable
    WHERE conditions....
    )t
    WHERE Seq = 1
    
    DELETE t
    FROM
    (
    SELECT columns,
    ROW_NUMBER() OVER (PARTITION BY vehid ORDER BY CASE WHEN invoicestat = 'approved' THEN 1 ELSE 2 END) AS Seq
    FROM YourTable
    WHERE conditions....
    )t
    WHERE Seq > 1


    Please Mark This As Answer if it solved your issue
    Please Vote This As Helpful if it helps to solve your issue
    Visakh
    ----------------------------
    My Wiki User Page
    My MSDN Page
    My Personal Blog
    My Facebook Page

    Wednesday, September 18, 2019 5:22 AM
  •  I'll try your solution and report back thanks for the Updates.

    Wednesday, September 18, 2019 11:57 PM
  • I'm only looking for records that got duplicated with an Approved record. If I get an approved record
    for  the KEY, and also receive a duplicated record but the status is Null and has the default
    value of 'N/A' in the Invoicenbr field, then this is a record I need to discover and the delete. If not
    it will add to the Prod table, and causes issue in processing down stream, as now I get 2 records during
    processing.

     Thanks .. hope I explained better.
     

    Thursday, September 19, 2019 12:04 AM
  • Try this:

     
    
    CREATE TABLE [dbo].[#tmporders](
    	[vehid] [int] NOT NULL,
    	[ref] [nvarchar](25) NOT NULL,
    	[masterbol] [nvarchar](25) NOT NULL,
    	[bol] [nvarchar](25) NOT NULL,
    	[vendr] [nvarchar](25) NOT NULL,
    	[ordernbr] [nvarchar](25) NOT NULL,
    	[custline] [int] NOT NULL,
    	[pronbr] [nvarchar](25) NOT NULL,
    	[invoicenbr] [nvarchar](25)NOT NULL,
    	[invoicestat] [nvarchar](8)
    	)
    	
    	Insert Into #tmporders
    	Values('791404444','999','brn99','bol99','jlws','899a','1','p765','n/a','n/a')
    	go
    	Insert Into #tmporders
    	Values('791404444','999','brn99','bol99','jlws','899a','1','p765','ord744','approved')
    	go
    	;with mycte  as (
    	Select * from #tmporders t
    	Where exists (select 1   from #tmporders t2 
    	where invoicestat='approved'
    	and t.[vehid] =t2.[vehid] and t.[ref] =t2.[ref]) 
    	and (invoicestat='n/a' or invoicestat is null  )
    	)
    --Select * from mycte   --check
    Delete from mycte  --delete
    
     select * from #tmporders  
    
     drop TABLE [dbo].[#tmporders]

    • Marked as answer by hart60 Friday, September 20, 2019 11:43 PM
    Thursday, September 19, 2019 1:31 AM
    Moderator
  •  I think this is close, but what if you have line 2 on that same order?

    Insert Into #tmporders
    	Values('791404444','999','brn99','bol99','jlws','899a','2','p765','n/a','n/a')
    	go


     It's not associated with an approved record(so not duplicated). This record needs carried forward in the process
    and not deleted. I think I just need to add all the key in the where statement.

     Thanks.

    Thursday, September 19, 2019 9:53 AM
  • Insert Into #tmporders
    	Values('791404444','999','brn99','bol99','jlws','899a','2','p765','n/a','n/a')
    	go

    Did you mean that you would like to insert this row in final result?

    CREATE TABLE [dbo].[#tmporders](
    	[vehid] [int] NOT NULL,
    	[ref] [nvarchar](25) NOT NULL,
    	[masterbol] [nvarchar](25) NOT NULL,
    	[bol] [nvarchar](25) NOT NULL,
    	[vendr] [nvarchar](25) NOT NULL,
    	[ordernbr] [nvarchar](25) NOT NULL,
    	[custline] [int] NOT NULL,
    	[pronbr] [nvarchar](25) NOT NULL,
    	[invoicenbr] [nvarchar](25)NOT NULL,
    	[invoicestat] [nvarchar](8)
    	)
    	
    	Insert Into #tmporders
    	Values('791404444','999','brn99','bol99','jlws','899a','1','p765','n/a','n/a')
    	go
    	Insert Into #tmporders
    	Values('791404444','999','brn99','bol99','jlws','899a','1','p765','ord744','approved')
    	go
    	Insert Into #tmporders
    	Values('791404444','999','brn99','bol99','jlws','899a','2','p765','n/a','n/a')
    	go
    	;with mycte  as (
    	Select * from #tmporders t
    	Where exists (select 1   from #tmporders t2 
    	where invoicestat='approved'
    	and t.[vehid] =t2.[vehid] and t.[ref] =t2.[ref] and t.custline=t2.custline) 
    	and (invoicestat='n/a' or invoicestat is null  )
    	)
    --Select * from mycte   --check
    Delete from mycte  --delete
    
     select * from #tmporders  
    
     drop TABLE [dbo].[#tmporders]

    Best Regards,

    Rachel 


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Friday, September 20, 2019 7:56 AM
  •  Many Thanks for ALL replies

    Friday, September 20, 2019 11:44 PM