locked
sql query for duplicated records RRS feed

  • Question

  • Hi,

    I have an Orders entity and have below columns

    orderguid (32 digit guid)
    orderid   (varchar)
    tempApprid(varchar)
    permaAppid(varchar)
    ordername (varchar)
    orderdcity(varchar)
    orderdcode(varchar)
    orderdmailid(varchar)
    ------- many more


    now the problem is i have duplicated order entries.
    The duplicates are based on one column that is the orderguid.
     
    for all the duplicates the orderid,tempApprid and the permapprid are same but have different other
    values like few orders have different city and few have differeint names and
    so on for the same duplicated order.


    i could write the query to get duplicate rows using EXCEPT but how do i get which column value is different.

     

    SELECT orderid,tempApprid,permaAppid,ordername,orderdcity,orderdcode,orderdmailid
    FROM Orders WHERE orderid = @varorderid
    EXCEPT
    SELECT orderid,tempApprid,permaAppid,ordername,orderdcity,orderdcode,orderdmailid
    FROM Orders WHERE orderid = @orderguid

    the first SELECT gives the duplicated orderid for some 3 orders and the final result
    would be the orders that have same orderid but have different any of the values in
    SELECT attribute list.

    Please help me in finding which columns have different value or atleast those orders that have
    all same values except the orderguid.

    TIA

    Ramya

     

     

     

    Saturday, September 24, 2011 1:19 PM

Answers

  • Based on the dataset that you have listed you have sets sets of exact duplicates.

    Create Table #Test(
    	orderguid uniqueidentifier,
    	orderid varchar(30),
    	tempApprid varchar(30),
    	permaAppid varchar(30),
    	ordername varchar(30), 
    	orderdcity varchar(30),
    	orderdcode int
    )
    
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1233-3AEA-1069-A2DD-08002B30309D','O123456','T123456','G123456','MAC cons','surat',123456)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1234-3AEA-1070-A2DD-08002B30309D','O123456','T123456','G123456','MAC cons','hyderbad',897908)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1235-3AEA-1071-A2DD-08002B30309D','O123456','T123456','G123456','MAC cons','surat',123456)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1236-3AEA-1072-A2DD-08002B30309D','O412345','T412345','G412345','Tata CS','mumbai',678900)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1237-3AEA-1073-A2DD-08002B30309D','O412345','T412345','G412345','Tata CS','mumbai',678900)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1238-3AEA-1074-A2DD-08002B30309D','O987689','T987689','G987689','icontrol','panama',543210)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1239-3AEA-1075-A2DD-08002B30309D','O987689','T987689','G987689','icontrol','liverpool',543210)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1240-3AEA-1076-A2DD-08002B30309D','O987689','T987689','G987689','icontrol','chicago',678543)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1241-3AEA-1077-A2DD-08002B30309D','O126798','T126798','G126798','panasonic','goa',9812000)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1242-3AEA-1078-A2DD-08002B30309D','O126798','T126798','G126798','panasonic','arsenal',6785430)
    
    
    -- Second Query response
    Select *
    From
    	(Select ROW_NUMBER() Over (Partition By orderid ,tempApprid ,permaAppid ,ordername ,orderdcity ,orderdcode
    								Order By orderid ,tempApprid ,permaAppid ,ordername ,orderdcity ,orderdcode) RowNumber,
    			orderguid ,orderid ,tempApprid ,permaAppid ,ordername ,orderdcity ,orderdcode
    		From #Test
    	) R1
    Where RowNumber > 1
    /*
    RowNumber	orderguid								orderid	tempApprid	permaAppid	ordername	orderdcity	orderdcode
    2			21EC1233-3AEA-1069-A2DD-08002B30309D	O123456	T123456		G123456		MAC cons	surat		123456
    2			21EC1237-3AEA-1073-A2DD-08002B30309D	O412345	T412345		G412345		Tata CS		mumbai		678900
    */
    
    Drop Table #Test
    

    The results from the query give you the rows of data which are exact duplicates.


    Warwick Rudd
    MCT MCITP SQL Server 2008 Admin
    My Blog
    • Proposed as answer by _AKS Saturday, September 24, 2011 11:53 PM
    • Marked as answer by Kalman Toth Thursday, September 29, 2011 5:41 AM
    Saturday, September 24, 2011 11:09 PM

All replies

  • Your requirements are somwhat unclear to me.  Are you saying that you have multiple rows with the same orderguid or is it that orderguid is unique but the orderid the same?  You also mention that orderid,tempApprid and the permapprid are same but I'm not sure if that's relevant to the problem at hand.

    Can you post some sample data and expected results?  I think that will help clarify.

     


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/
    Saturday, September 24, 2011 2:12 PM
  • Hi !

    Can you please post DDL and sample data and desired output.

    Thanks, Hasham

     

     

    Saturday, September 24, 2011 4:02 PM
    Answerer
  • hey dan,

    PFB the sample data

     

    orderguid    orderid   tempApprid    permaAppid   ordername   orderdcity  orderdcode
    21EC1233-3AEA-1069-A2DD-08002B30309D O123456 T123456 G123456 MAC cons surat 123456
    21EC1234-3AEA-1070-A2DD-08002B30309D O123456 T123456 G123456 MAC cons hyderbad 897908
    21EC1235-3AEA-1071-A2DD-08002B30309D O123456 T123456 G123456 MAC cons surat 123456
    21EC1236-3AEA-1072-A2DD-08002B30309D O412345 T412345 G412345 Tata CS mumbai 678900
    21EC1237-3AEA-1073-A2DD-08002B30309D O412345 T412345 G412345 Tata CS mumbai 678900
    21EC1238-3AEA-1074-A2DD-08002B30309D O987689 T987689 G987689 icontrol panama 543210
    21EC1239-3AEA-1075-A2DD-08002B30309D O987689 T987689 G987689 icontrol liverpool 543210
    21EC1240-3AEA-1076-A2DD-08002B30309D O987689 T987689 G987689 icontrol chicago 678543
    21EC1241-3AEA-1077-A2DD-08002B30309D O126798 T126798 G126798 panasonic goa 9812000
    21EC1242-3AEA-1078-A2DD-08002B30309D O126798 T126798 G126798 panasonic arsenal 6785430

     

    the orderguid is definetly different for all the records....but the combination of orderid,tempapprid,permapprid are the same ..  the except query that is written gives out the different rows but what i need is what colums are differed..

     The order name "MAC cons" has 3 duplicates but two rows have diff city and one row has different ordercode..i want both of the colums to be retrived.

    the second query needed is to get the distinct rows where all the colums except the orderguid are same for the duplicates  

    that is this query shd retrive the order with "TATA CS".

     

    Saturday, September 24, 2011 5:12 PM
  • Thanks for the data.  I don't see orderdmailid. Do you need that column checked for differences too?  Can you please show the tabular results you need based on this sample data?

     


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/
    Saturday, September 24, 2011 6:48 PM
  • Based on the dataset that you have listed you have sets sets of exact duplicates.

    Create Table #Test(
    	orderguid uniqueidentifier,
    	orderid varchar(30),
    	tempApprid varchar(30),
    	permaAppid varchar(30),
    	ordername varchar(30), 
    	orderdcity varchar(30),
    	orderdcode int
    )
    
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1233-3AEA-1069-A2DD-08002B30309D','O123456','T123456','G123456','MAC cons','surat',123456)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1234-3AEA-1070-A2DD-08002B30309D','O123456','T123456','G123456','MAC cons','hyderbad',897908)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1235-3AEA-1071-A2DD-08002B30309D','O123456','T123456','G123456','MAC cons','surat',123456)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1236-3AEA-1072-A2DD-08002B30309D','O412345','T412345','G412345','Tata CS','mumbai',678900)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1237-3AEA-1073-A2DD-08002B30309D','O412345','T412345','G412345','Tata CS','mumbai',678900)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1238-3AEA-1074-A2DD-08002B30309D','O987689','T987689','G987689','icontrol','panama',543210)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1239-3AEA-1075-A2DD-08002B30309D','O987689','T987689','G987689','icontrol','liverpool',543210)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1240-3AEA-1076-A2DD-08002B30309D','O987689','T987689','G987689','icontrol','chicago',678543)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1241-3AEA-1077-A2DD-08002B30309D','O126798','T126798','G126798','panasonic','goa',9812000)
    Insert into #Test(orderguid,orderid ,tempApprid , permaAppid ,ordername , orderdcity ,orderdcode)
    Values ('21EC1242-3AEA-1078-A2DD-08002B30309D','O126798','T126798','G126798','panasonic','arsenal',6785430)
    
    
    -- Second Query response
    Select *
    From
    	(Select ROW_NUMBER() Over (Partition By orderid ,tempApprid ,permaAppid ,ordername ,orderdcity ,orderdcode
    								Order By orderid ,tempApprid ,permaAppid ,ordername ,orderdcity ,orderdcode) RowNumber,
    			orderguid ,orderid ,tempApprid ,permaAppid ,ordername ,orderdcity ,orderdcode
    		From #Test
    	) R1
    Where RowNumber > 1
    /*
    RowNumber	orderguid								orderid	tempApprid	permaAppid	ordername	orderdcity	orderdcode
    2			21EC1233-3AEA-1069-A2DD-08002B30309D	O123456	T123456		G123456		MAC cons	surat		123456
    2			21EC1237-3AEA-1073-A2DD-08002B30309D	O412345	T412345		G412345		Tata CS		mumbai		678900
    */
    
    Drop Table #Test
    

    The results from the query give you the rows of data which are exact duplicates.


    Warwick Rudd
    MCT MCITP SQL Server 2008 Admin
    My Blog
    • Proposed as answer by _AKS Saturday, September 24, 2011 11:53 PM
    • Marked as answer by Kalman Toth Thursday, September 29, 2011 5:41 AM
    Saturday, September 24, 2011 11:09 PM
  • The following article deals with deduping a table with duplicates:

    http://www.sqlusa.com/bestpractices/eliminateduplicates/

    In the future you can use unique index / constraint on the columns (composite) to keep them unique.


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Thursday, September 29, 2011 5:45 AM