Answered Puzzle #1

  • Thursday, August 09, 2012 7:14 PM
     
      Has Code

    Tools:SQL Management Studio 2008

    Environment: SQL Server 2008 and Windows 7

    Problem: I am trying to list duplicate records in table called temp, for example,will smith has two set of records: 012345655, 274679300, 'Will Smith', 'A'  and 012345655, 274679300, 'Will Smith', 'B'. I want to remove one set of records from the dummy table (either one). However, for a reason, my code lists 012345655, 274679300, 'Will Smith', 'C' and 012345655, 958407896, 'Will Smith', 'B'. Both of these records have different pin numbers and has not been treated as set of records. The result is wrong. I wonder how I would list the desired results. I am curious how you may resolve this issue.

    Declare @Dummy Table ( ID  int, pn int, name NVARCHAR(30), apt char(1) )
    insert into @Dummy values (012345655, 274679300, 'Will Smith', 'A')
    insert into @Dummy values (012345655, 274679300, 'Will Smith', 'B')
    insert into @Dummy values (012345655, 958407896, 'Will Smith', 'C')
    insert into @Dummy values (012345655, 958407896, 'Will Smith', 'D')
    insert into @Dummy values (012345655, 862493644, 'michael douglas', 'E')
    insert into @Dummy values (012345655, 862493644, 'michael douglas ', 'F')
    insert into @Dummy values (012345655, 864051416, 'michael douglas', 'H')
    insert into @Dummy values (012345655, 864051416, 'michael douglas', 'L')
    
    select * from @Dummy -- 8 rows
    
    ;with cteDups as
    (
    	Select	ID
    			,pn
    			,name
    			,apt
    			,ROW_NUMBER() OVER(Partition By ID, pn Order By ID) rn
    	From	@Dummy
    )
    Select	*
    From	cteDups
    Where	rn = 1 order by name 

    The following are wrong results

    ID                       pn               name            apt    rn
    12345655    862493644    michael douglas     E    1
    12345655    864051416    michael douglas     H    1
    12345655    958407896    Will Smith              C    1
    12345655    274679300    Will Smith               A    1

    And the those are wrong too

    ID	pn	name	apt	cnt
    12345655	274679300	Will Smith	A	2
    12345655	274679300	Will Smith	B	2
    12345655	862493644	michael douglas	E	2
    12345655	862493644	michael douglas 	F	2
    12345655	864051416	michael douglas	H	2
    12345655	864051416	michael douglas	L	2
    12345655	958407896	Will Smith	C	2
    12345655	958407896	Will Smith	D	2

    Desired results Must be

    12345655	862493644	michael douglas	E	
    12345655	862493644	michael douglas F
    12345655	958407896	Will Smith	C	
    12345655	958407896	Will Smith	D









    • Edited by sandra V O Thursday, August 09, 2012 8:21 PM
    • Edited by sandra V O Thursday, August 09, 2012 8:30 PM
    • Edited by sandra V O Thursday, August 09, 2012 8:31 PM
    • Edited by sandra V O Tuesday, August 14, 2012 8:20 PM
    •  

All Replies

  • Thursday, August 09, 2012 7:32 PM
     
      Has Code

    E.g.

    declare @Dummy table ( ID  int, pn int, name NVARCHAR(255), apt CHAR(1) );
    
    insert into @dummy values (012345655, 274679300, 'Will Smith', 'A');
    insert into @dummy values (012345655, 274679300, 'Will Smith', 'B');
    insert into @dummy values (012345655, 958407896, 'Will Smith', 'C');
    insert into @dummy values (012345655, 958407896, 'Will Smith', 'D');
    insert into @dummy values (012345655, 862493644, 'michael douglas', 'E');
    insert into @dummy values (012345655, 862493644, 'michael douglas ', 'F');
    insert into @dummy values (012345655, 864051416, 'michael douglas', 'H');
    insert into @dummy values (012345655, 864051416, 'michael douglas', 'L');
    insert into @dummy values (012345655, 958407812, 'Will Smith', 'X');
    insert into @dummy values (012345655, 862493612, 'michael douglas', 'Y');
    
    select * from @dummy;
    
    with dubs as
    	(	select	*, 
    				count(*) over (partition by id, pn, name) as cnt
    		from	@Dummy
    	 )
    	select * 
    	from dubs 
    	where cnt > 1;
    Please post correct scripts :)

  • Thursday, August 09, 2012 7:38 PM
     
     Answered Has Code

    I think your problem is your partition by clause.  Because you are only partitioning by the ID, that is all that gets checked.  Also, based on the data/structures given, I got a bunch of errors.  Here is something a bit more generic, but it gets the duplicates:

    --create table @Dummy ( ID  int, pn int, name NVARCHAR, apt int )
    Declare @Dummy Table ( ID  int, pn int, name NVARCHAR(30), apt char(1) )
    insert into @Dummy values (012345655, 274679300, 'Will Smith', 'A')
    insert into @Dummy values (012345655, 274679300, 'Will Smith', 'B')
    insert into @Dummy values (012345655, 958407896, 'Will Smith', 'C')
    insert into @Dummy values (012345655, 958407896, 'Will Smith', 'D')
    insert into @Dummy values (012345655, 862493644, 'michael douglas', 'E')
    insert into @Dummy values (012345655, 862493644, 'michael douglas ', 'F')
    insert into @Dummy values (012345655, 864051416, 'michael douglas', 'H')
    insert into @Dummy values (012345655, 864051416, 'michael douglas', 'L')
    
    select * from @Dummy -- 4 rows
    
    ;with cteDups as
    (
    	Select	ID
    			,pn
    			,name
    			,apt
    			,ROW_NUMBER() OVER(Partition By ID, pn Order By ID) rn
    	From	@Dummy
    )
    Select	*
    From	cteDups
    Where	rn = 1

    • Marked As Answer by sandra V O Monday, August 13, 2012 8:41 AM
    •  
  • Thursday, August 09, 2012 8:23 PM
     
      Has Code

    Thanks for the prompt response.I did revise my question in the top

    But the results are not what am looking for

    I would like to see the results:  

    12345655	274679300	Will Smith	A	
    12345655	274679300	Will Smith	B	
    12345655	864051416	michael douglas	H	
    12345655	864051416	michael douglas	L	
    	
    
    OR 
    
    12345655	862493644	michael douglas	E	
    12345655	862493644	michael douglas F
    12345655	958407896	Will Smith	C	
    12345655	958407896	Will Smith	D	





    • Edited by sandra V O Thursday, August 09, 2012 8:24 PM
    • Edited by sandra V O Thursday, August 09, 2012 8:29 PM
    • Edited by sandra V O Thursday, August 09, 2012 8:29 PM
    • Edited by sandra V O Thursday, August 09, 2012 8:32 PM
    •  
  • Thursday, August 09, 2012 8:37 PM
     
     Answered Has Code

    Not sure why you want it that way?  But here you go:

    --create table @Dummy ( ID  int, pn int, name NVARCHAR, apt int )
    Declare @Dummy Table ( ID  int, pn int, name NVARCHAR(30), apt char(1) )
    insert into @Dummy values (012345655, 274679300, 'Will Smith', 'A')
    insert into @Dummy values (012345655, 274679300, 'Will Smith', 'B')
    insert into @Dummy values (012345655, 958407896, 'Will Smith', 'C')
    insert into @Dummy values (012345655, 958407896, 'Will Smith', 'D')
    insert into @Dummy values (012345655, 862493644, 'michael douglas', 'E')
    insert into @Dummy values (012345655, 862493644, 'michael douglas ', 'F')
    insert into @Dummy values (012345655, 864051416, 'michael douglas', 'H')
    insert into @Dummy values (012345655, 864051416, 'michael douglas', 'L')
    
    select * from @Dummy -- 4 rows
    
    ;with cteDups as
    (
    	Select	ID
    			,pn
    			,name
    			,apt
    			,ROW_NUMBER() OVER(Partition By ID, pn Order By ID) rn
    	From	@Dummy
    )
    ,cteGrp as
    (
    	Select	ID
    			,pn
    			,name
    			,apt
    			,rn
    			,ROW_NUMBER() OVER(Partition By Name Order By name) - rn grp
    	From	cteDups
    )
    Select	ID
    		,pn
    		,name
    		,apt
    From	cteGrp
    Where	grp = (Select Min(grp) From cteGrp)

    • Marked As Answer by sandra V O Thursday, August 09, 2012 9:10 PM
    • Unmarked As Answer by sandra V O Thursday, August 09, 2012 10:44 PM
    • Marked As Answer by sandra V O Monday, August 13, 2012 8:40 AM
    •  
  • Thursday, August 09, 2012 8:48 PM
     
      Has Code

    Excellent. Thanks for the prompt response. It works as I wished. Would you please explain what you have done to obtain these results. I dont understand the following code line

    grp = (Select Min(grp) From cteGrp)
    ROW_NUMBER() OVER(Partition By Name Order By name) - rn grp


    • Edited by sandra V O Thursday, August 09, 2012 8:59 PM
    •  
  • Thursday, August 09, 2012 9:25 PM
     
      Has Code

    The second ROW_NUMBER() clause gets a distinct group value, based on Name.  If you add the RN and GRP columns and remove the where clause from the final select you will see:

    ID          pn          name                           apt  rn                   grp
    ----------- ----------- ------------------------------ ---- -------------------- --------------------
    12345655    862493644   michael douglas                E    1                    0
    12345655    862493644   michael douglas                F    2                    0
    12345655    864051416   michael douglas                H    1                    2
    12345655    864051416   michael douglas                L    2                    2
    12345655    958407896   Will Smith                     C    1                    0
    12345655    958407896   Will Smith                     D    2                    0
    12345655    274679300   Will Smith                     A    1                    2
    12345655    274679300   Will Smith                     B    2                    2

    The actual value of that grp column doesn't matter much, just the fact that it pulls the different sets together in the manner you want.  To get that second ROW_NUMBER() clause I actually stole (and slightly modified) it from a common gaps and islands solution.  And then you only want one of the sets (in your case, it didn't matter) so the where clause will pull the 0 group.  You could just as easily switch it to max to get the 2 group.

  • Thursday, August 09, 2012 10:09 PM
     
      Has Code

    Very clever. Thanks. 

    However, when I listed the following SQL statement

    insert into @Dummy values (012345655, 864051416, 'michael douglas', 'x')
    insert into @Dummy values (123456789, 351871971, 'Jacjsons5',    'Q')
    insert into @Dummy values (123456789, 113653162, 'Jaksons5',    'w')

    Your code does list Jacksons5 as duplicate record

    The following were the results

    ID	             pn	             name	apt
    12345655	274679300	Will Smith	A
    12345655	274679300	Will Smith	B
    12345655	958407896	Will Smith	C
    12345655	958407896	Will Smith	D
    12345655	862493644	michael douglas	E
    12345655	862493644	michael douglas F
    12345655	864051416	michael douglas	H
    12345655	864051416	michael douglas	L
    12345655	864051416	michael douglas	x
    123456789	351871971	Jacjsons5	Q
    123456789	113653162	Jaksons5	w
    
    
    
    ID	            pn	            name	apt
    123456789	351871971	Jacjsons5	Q
    123456789	113653162	Jaksons5	w
    12345655	862493644	michael douglas	E
    12345655	862493644	michael douglas F
    12345655	958407896	Will Smith	C
    12345655	958407896	Will Smith	D

    The results should be:


    12345655    864051416    michael douglas    x
    12345655    862493644    michael douglas    E
    12345655    862493644    michael douglas F
    12345655    958407896    Will Smith    C
    12345655    958407896    Will Smith    D

    because those are duplicate

    Please advise



    • Edited by sandra V O Thursday, August 09, 2012 10:50 PM
    • Edited by sandra V O Thursday, August 09, 2012 10:58 PM
    • Edited by sandra V O Thursday, August 09, 2012 10:58 PM
    •  
  • Friday, August 10, 2012 1:21 PM
     
      Has Code

    If you take a little bit from Steffan's solution and add a COUT(*) OVER clause, you can make sure you are getting only those records that have more than one.  However, for your specific resultset you have to switch to using a MAX on the grp column.  You always want to show the rows that have the most duplicates?

    ;with cteDups as
    (
    	Select	ID
    			,pn
    			,name
    			,apt
    			,ROW_NUMBER() OVER(Partition By ID, pn Order By ID) rn
    	From	@Dummy
    )
    ,cteGrp as
    (
    	Select	ID
    			,pn
    			,name
    			,apt
    			,rn
    			,ROW_NUMBER() OVER(Partition By Name Order By name) - rn grp
    			,COUNT(*) OVER(Partition By ID, pn) cnt
    	From	cteDups
    )
    Select	ID
    		,pn
    		,name
    		,apt
    		,rn
    		,grp
    		,cnt
    From	cteGrp
    Where	grp = (Select Max(grp) From cteGrp)
    	And	cnt > 1


    Edit:

    Also, while I was doing this I thought about what happens if you don't have the same set of duplicates?  So if you only have 1 group of Will Smith duplicates but two Michael Douglas and 5 Jeremy Renner?  In that case, you would only see the Jeremy Renner group, since the max grp will be 5.  If you use this select statement, however, it fixes that, and you use the max grp based on the person:

    Select	ID
    		,pn
    		,name
    		,apt
    		,rn
    		,grp
    		,cnt
    From	cteGrp c1
    Where	grp = (Select Max(c2.grp) From cteGrp c2 Where c1.name = c2.name)
    	And	cnt > 1

    • Edited by dgjohnson Friday, August 10, 2012 1:23 PM
    •  
  • Friday, August 10, 2012 9:44 PM
     
      Has Code

    I would like to express my sincere gratitude for being patient and taking the time to help me out. It mean a lot. I really like the thought process behind the code.I spent many trying to understand it because I can modify it and adjust it if more case scenarios appeared.

    I implemented this code

    Select	ID
    		,pn
    		,name
    		,apt
    		,rn
    		,grp
    		,cnt
    From	cteGrp c1
    Where	grp = (Select Max(c2.grp) From cteGrp c2 Where c1.name = c2.name)
    	And	cnt > 1

    Then, I added a few records to test out the code

    insert into @Dummy values (012345655, 274679300, 'Will Smith', 'A')
    insert into @Dummy values (012345655, 274679300, 'Will Smith', 'B')
    insert into @Dummy values (012345655, 958407896, 'Will Smith', 'C')
    insert into @Dummy values (012345655, 958407896, 'Will Smith', 'D')
    insert into @Dummy values (012345655, 862493644, 'michael douglas', 'E')
    insert into @Dummy values (012345655, 862493644, 'michael douglas ', 'F')
    insert into @Dummy values (012345655, 864051416, 'michael douglas', 'H')
    insert into @Dummy values (012345655, 864051416, 'michael douglas', 'L')
    insert into @Dummy values (012345655, 864051416, 'michael douglas', 'x')
    insert into @Dummy values (123456789, 351871971, 'Jacjsons5',	'Q')
    insert into @Dummy values (123456789, 113653162, 'Jaksons5',	'w')
    insert into @Dummy values (123456789, 113653162, 'Jaksons5',	'M')



    Results of the code is

    123456789	113653162	Jaksons5	w	1	0	2
    123456789	113653162	Jaksons5	M	2	0	2
    12345655	864051416	michael douglas	H	1	2	3
    12345655	864051416	michael douglas	L	2	2	3
    12345655	864051416	michael douglas	x	3	2	3
    12345655	274679300	Will Smith	A	1	2	2
    12345655	274679300	Will Smith	B	2	2	2



    But the desired one should be

    123456789	351871971	Jacjsons5	      Q
    12345655	864051416	michael douglas	H	1	2	3
    12345655	864051416	michael douglas	L	2	2	3
    12345655	864051416	michael douglas	x	3	2	3
    12345655	274679300	Will Smith	A	1	2	2
    12345655	274679300	Will Smith	B	2	2	2


    Why? we have case scenarios

    Case 1 is resolved by T-SQL

    12345655    274679300    Will Smith    A
    12345655    274679300    Will Smith    B
    12345655    958407896    Will Smith    C
    12345655    958407896    Will Smith    D

    We should Delete

    12345655    958407896    Will Smith    C
    12345655    958407896    Will Smith    D

    or

    12345655    274679300    Will Smith    A
    12345655    274679300    Will Smith    B

    Case 2 is not resolved yet

    12345655    862493644    michael douglas    E
    12345655    862493644    michael douglas     F
    12345655    864051416    michael douglas    H
    12345655    864051416    michael douglas    L
    12345655    864051416    michael douglas    X

    We should Delete

    12345655    862493644    michael douglas    E
    12345655    862493644    michael douglas     F


    Case 3 is not resolved yet

    123456789    351871971    Jacjsons5    Q
    123456789    113653162    Jaksons5    w
    123456789    113653162    Jaksons5    M


    We should Delete

    123456789    351871971    Jacjsons5    Q

    Case 4

    700662481     113653162    Bruce Wells    y
    700662481     351871971    Bruce wells    z

    We should Delete

    Either
    700662481     351871971    Bruce wells    y

    OR

    700662481     113653162    Bruce Wells    z

    Case 5 is resolved

    94403738    997550985  Jackie Shan    v

    NO Action to Delete


    Case 6 is resolved by T-SQL

    55654321     173670000    Tommy    A
    55654321    173670000    Tommy    B

    NO Action to Delete

    I look forward to hearing from you.







    • Edited by sandra V O Friday, August 10, 2012 9:45 PM
    • Edited by sandra V O Friday, August 10, 2012 10:17 PM
    • Edited by sandra V O Friday, August 10, 2012 10:25 PM
    • Edited by sandra V O Friday, August 10, 2012 10:47 PM
    •  
  • Friday, August 10, 2012 9:50 PM
     
     

    I'm on my way out, but will revisit later.  In the meantime, what is the business rule here?  You want to keep the duplicates?  It seems to me that you would WANT to delete

    123456789    113653162    Jaksons5    w
    123456789    113653162    Jaksons5    M

    instead of

    123456789    351871971    Jacjsons5    Q

    And I am not sure why you want to remove the Bruce Wells at all, since the ID and pn are the same.  Is it based more on name, than ID and pn combination?  Such that it's okay if you have multiples with the same name, so long as the ID and pn also match?

    Edit: I see now that Bruce Wells entries share the distinct ID/pn combinations from the Jaksons5, which makes sense why you want to delete them, but I still do not understand why you want to delete the set of two Jaksons5 instead of just the single.
    • Edited by dgjohnson Friday, August 10, 2012 9:58 PM
    •  
  • Friday, August 10, 2012 10:15 PM
     
     

    Thanks for the prompt response

    The business rule by looking at the five case scenarios is 

    1. if the pin number is repeated more than one  and has the max number of repetition, keep these records and delete other. These cases appear in Case 2 and case 3

    a .Case two is not resolved yet

    12345655    862493644    michael douglas    E
    12345655    862493644    michael douglas     F
    12345655    864051416    michael douglas    H
    12345655    864051416    michael douglas    L
    12345655    864051416    michael douglas    X

    We should Delete

    12345655    862493644    michael douglas    E
    12345655    862493644    michael douglas     F

    b. Case three is not resolved yet

    123456789    351871971    Jacjsons5    Q
    123456789    113653162    Jaksons5    w
    123456789    113653162    Jaksons5    M

    We should Delete

    123456789    351871971    Jacjsons5    Q

    2. if pin number is repeated twice keep one set and delete the other set.

    Case one is resolved by T-SQL

    12345655    274679300    Will Smith    A
    12345655    274679300    Will Smith    B
    12345655    958407896    Will Smith    C
    12345655    958407896    Will Smith    D

    We should Delete

    12345655    958407896    Will Smith    C
    12345655    958407896    Will Smith    D

    or

    12345655    274679300    Will Smith    A
    12345655    274679300    Will Smith    B

    3. If the pin is repeated twice , keep it - no action

    Case 6 is resolved by T-SQL

    55654321     173670000    Tommy    A
    55654321    173670000    Tommy    B

    NO ACTION to Delete

    4.if the person has same ID but different pin , delete one of the pin. Other wise, if person has unique ID and unique pin, no action

    Case 4

    700662481    113653162    Bruce Wells    y
    700662481     351871971    Bruce wells    z

    We should Delete

    Either
    700662481     351871971    Bruce wells    y

    OR

    700662481  113653162    Bruce Wells    z

    Case 5 is resolved

    94403738    997550985  Jackie Shan    v

    NO Action to Delete 

    * Kindly, see bold font in the pin number to convey my message

    I hope this is clear


    • Edited by sandra V O Friday, August 10, 2012 10:20 PM
    •  
  • Saturday, August 11, 2012 4:04 AM
     
      Has Code

    Okay, this should take care of everything but Case 4.  I don't have time to look at that right now, but will try to pick it up again tomorrow.

    ;with cteDups as
    (
    	Select	ID
    			,pn
    			,name
    			,apt
    			,ROW_NUMBER() OVER(Partition By ID, pn Order By ID) rn
    	From	@Dummy
    )
    ,cteGrp as
    (
    	Select	ID
    			,pn
    			,name
    			,apt
    			,rn
    			,ROW_NUMBER() OVER(Partition By Name, ID Order By name) - rn grp
    			,COUNT(*) OVER(Partition By ID, pn) cnt1
    			,COUNT(*) OVER(Partition By ID) cnt2
    	From	cteDups
    )
    ,cteCase1 as
    	(
    	Select	ID
    			,pn
    			,name
    			,apt
    			,rn
    			,grp
    			,cnt1
    			,cnt2
    	From	cteGrp c1
    	Where	grp = (Select Min(c2.grp) From cteGrp c2 Where c1.name = c2.name)
    		--And	cnt > 1
    )
    ,cteCase2 as
    (
    	Select	ID
    			,pn
    			,name
    			,apt
    			,rn
    			,grp
    			,cnt1
    			,cnt2
    	From	cteGrp c1
    	Where	cnt1 = 1 And cnt2 > 1
    )
    Select	ID
    		,pn
    		,name
    		,apt
    From	cteCase1 c1
    Where Not Exists (Select Null From cteCase2 c2 Where c1.Name = c2.Name)
    UNION ALL
    Select	ID
    		,pn
    		,name
    		,apt
    From	cteCase2

  • Saturday, August 11, 2012 5:40 AM
     
      Has Code

    Please try the below tsql and see if this helps - 

    ;WITH CTE AS (
    SELECT
      D.ID,
      D.name,
      MIN(D.pn) pn --You can change it to MAX() if required otherwise
    FROM
      @Dummy D
    GROUP BY
      D.ID,D.name  
    )
    
    SELECT
      D.*
    FROM
      @Dummy D
    INNER JOIN CTE
      ON CTE.ID = D.ID
        AND CTE.name = D.name
        AND CTE.pn = D.pn
    ORDER BY
      apt     


    -Vinay Pugalia
    If a post answers your question, please click "Mark As Answer" on that post or "Vote as Helpful".
    Web : Inkey Solutions
    Blog : My Blog
    Email : Vinay Pugalia

  • Saturday, August 11, 2012 6:12 AM
     
     Answered Has Code
    Declare @Dummy Table ( ID  int, pn int, name NVARCHAR(30), apt char(1) )
    insert into @Dummy values (12345655, 274679300, 'Will Smith', 'A')
    insert into @Dummy values (12345655, 274679300, 'Will Smith', 'B')
    insert into @Dummy values (12345655, 958407896, 'Will Smith', 'C')
    insert into @Dummy values (12345655, 958407896, 'Will Smith', 'D')
    insert into @Dummy values (12345655, 358407896, 'Will Smith', 'E')
    insert into @Dummy values (12345655, 358407896, 'Will Smith', 'F')
    insert into @Dummy values (12345655, 358407896, 'Will Smith', 'G')
    insert into @Dummy values (12345655, 658407896, 'Will Smith', 'H')
    insert into @Dummy values (12345655, 862493644, 'michael douglas', 'E')
    insert into @Dummy values (12345655, 862493644, 'michael douglas ', 'F')
    insert into @Dummy values (12345655, 864051416, 'michael douglas', 'H')
    insert into @Dummy values (12345655, 864051416, 'michael douglas', 'L')
    insert into @Dummy values (12345655, 864051416, 'michael douglas', 'G')
    insert into @Dummy values (42345655, 764051400, 'Steven wang', 'H')
    insert into @Dummy values (42345655, 764051400, 'Steven wang', 'L')
    insert into @Dummy values (42345655, 764051409, 'Steven wang', 'G')
    insert into @Dummy values (42345655, 764051409, 'Steven wang', 'K')
    insert into @Dummy values (42345655, 764050000, 'Steven wang', 'M')
    insert into @Dummy values (72345655, 564050000, 'Sandra V O', 'X')
    insert into @Dummy values (72345655, 564050999, 'Sandra V O', 'Y')
    
    ;With CTE
    As
    (
    Select *, Count(pn) Over(Partition BY ID, pn) As MyCount
    From  @Dummy
    )
    ,CTE2
    As
    (
    Select	*
    , Dense_Rank() Over(Partition By ID, Name Order By MyCount Desc) As MyRank
    , Dense_Rank() Over(Partition By ID, Name Order By MyCount Desc, PN) As MyRank2
    From	CTE
    --Order by ID, MyCount Desc,  MyRank2
    )
    
    Delete From CTE2 Where MyRank + MyRAnk2 > 2;
    
    Select	*
    From	@Dummy;


    View Steven Wang's profile on LinkedIn | Blog: MSBICOE.com | MCITP - BI, SQL Developer & DBA

    Hate to mislead others, if I'm wrong slap me. Thanks!

    • Marked As Answer by sandra V O Monday, August 13, 2012 8:39 AM
    •  
  • Monday, August 13, 2012 8:21 AM
     
      Has Code

    @ dgjohnson, unable to thank you for being supportive in this thread and providing consistent solutions. 

    @  vinaypugalia: thanks but it did not give the desired results

    @ Steven,

    This is an outside the box solution, which demonstrate mastery in T-SQL.  However, I would like to store the unwanted/dubs in a temp table. Then I would like to pass the unwanted values to many tables in the DB.

    Furthermore, I would like to learn from you the thought process behind your coding. Please, explain to us the code for learning purposes.


    Declare @Dummy Table ( ID  int, pn int, name NVARCHAR(30), apt char(1) )
    insert into @Dummy values (12345655, 274679300, 'Will Smith', 'A')
    insert into @Dummy values (12345655, 274679300, 'Will Smith', 'B')
    insert into @Dummy values (12345655, 958407896, 'Will Smith', 'C')
    insert into @Dummy values (12345655, 958407896, 'Will Smith', 'D')
    insert into @Dummy values (12345655, 358407896, 'Will Smith', 'E')
    insert into @Dummy values (12345655, 358407896, 'Will Smith', 'F')
    insert into @Dummy values (12345655, 358407896, 'Will Smith', 'G')
    insert into @Dummy values (12345655, 658407896, 'Will Smith', 'H')
    insert into @Dummy values (12345655, 862493644, 'michael douglas', 'E')
    insert into @Dummy values (12345655, 862493644, 'michael douglas ', 'F')
    insert into @Dummy values (12345655, 864051416, 'michael douglas', 'H')
    insert into @Dummy values (12345655, 864051416, 'michael douglas', 'L')
    insert into @Dummy values (12345655, 864051416, 'michael douglas', 'G')
    insert into @Dummy values (42345655, 764051400, 'Steven wang', 'H')
    insert into @Dummy values (42345655, 764051400, 'Steven wang', 'L')
    insert into @Dummy values (42345655, 764051409, 'Steven wang', 'G')
    insert into @Dummy values (42345655, 764051409, 'Steven wang', 'K')
    insert into @Dummy values (42345655, 764050000, 'Steven wang', 'M')
    insert into @Dummy values (72345655, 564050000, 'Sandra V O', 'X')
    insert into @Dummy values (72345655, 564050999, 'Sandra V O', 'Y')
    
    Select * from @Dummy order by name 
    
    ;With CTE
    As
    (
    Select *, Count(pn) Over(Partition BY ID, pn) As MyCount
    From  @Dummy
    )
    ,CTE2
    As
    (
    Select	*
    , Dense_Rank() Over(Partition By ID, Name Order By MyCount Desc) As MyRank
    , Dense_Rank() Over(Partition By ID, Name Order By MyCount Desc, PN) As MyRank2
    From	CTE
    --Order by ID, MyCount Desc,  MyRank2
    )
    
    --Delete From CTE2 Where MyRank + MyRAnk2 > 2;
    
    --Select	*
    --From	@Dummy;
    
    
    Select  ID, pn, name, apt  from CTE2 Where MyRank + MyRAnk2 > 2 order by name

    I'll be testing the code against DBand I'll let you know about the results

    Thanks.


    • Edited by sandra V O Monday, August 13, 2012 8:28 AM
    •  
  • Monday, August 13, 2012 9:41 AM
     
     Answered Has Code
    /*
    Hi Sandra,
    
    First, I would like to say that you are indeed a very good questioner who can explain
    a problem clearly. This is really good, particularly you have provided some example
    for us to work with.
    
    You have stated 5 different cases in your further reply and when I transform those to
    the T-sql solution, it becomes much simpler as when you mentioned if the count of records
    are same then you don't care which group of records to delete.
    
    You said that I would like to store the unwanted/dubs in a temp table. Then I would like to 
    pass the unwanted values to many tables in the DB. This can be done with the delete statement
    with an OUTPUT clause, I will explain more in the code below.
    
    Indeed, I think that my solution can further simplied to only use the second rank function.
    as explained.
    
    */
    
    Declare @Dummy Table ( ID  int, pn int, name NVARCHAR(30), apt char(1) );
    insert into @Dummy values (12345655, 274679300, 'Will Smith', 'A')
    , (12345655, 274679300, 'Will Smith', 'B')
    , (12345655, 958407896, 'Will Smith', 'C')
    , (12345655, 958407896, 'Will Smith', 'D')
    , (12345655, 358407896, 'Will Smith', 'E')	--keep bacause of high occurance
    , (12345655, 358407896, 'Will Smith', 'F')	--keep bacause of high occurance
    , (12345655, 358407896, 'Will Smith', 'G')	--keep bacause of high occurance
    , (12345655, 658407896, 'Will Smith', 'H')
    , (12345655, 862493644, 'michael douglas', 'E')
    , (12345655, 862493644, 'michael douglas ', 'F')
    , (12345655, 864051416, 'michael douglas', 'H')	--keep bacause of high occurance
    , (12345655, 864051416, 'michael douglas', 'L')	--keep bacause of high occurance
    , (12345655, 864051416, 'michael douglas', 'G')	--keep bacause of high occurance
    , (42345655, 764051400, 'Steven wang', 'H')	--keep bacause of high occurance but as it as same count with the 'G' and 'K', it might keep other 2 rows instead
    , (42345655, 764051400, 'Steven wang', 'L')	--keep bacause of high occurance but as it as same count with the 'G' and 'K', it might keep other 2 rows instead
    , (42345655, 764051409, 'Steven wang', 'G')
    , (42345655, 764051409, 'Steven wang', 'K')
    , (42345655, 764050000, 'Steven wang', 'M')
    , (72345655, 564050000, 'Sandra V O', 'X')	--keep as 'X' and 'Y' have same count (1 rows). Depending on the query it might keep 'Y'
    , (72345655, 564050999, 'Sandra V O', 'Y');
    
    --Since you want to keep the deleted rows into a temp table for further use
    --You first create a blank Temp table with same table structure as your Dummy
    --table by using Top 0 clause and into. this will create a blank temp table
    
    Select Top 0 * 
    Into	#MyTemp
    From	@Dummy; 
    
    
    ;With CTE
    As
    (
    Select	*
    		, Count(pn) Over(Partition BY ID, pn) As MyCount
    		--You first count the smallest granular you concern which is Partioned by ID then PN
    From  @Dummy
    )
    ,CTE2
    As
    (
    Select	*
    , Dense_Rank() Over(Partition By ID, Name Order By MyCount Desc, PN) As MyRank
    --As from your explanation that 1 ID might have mutiple names, one name might have multiple PN
    --the Hierarchy sets as: ID go down to Name go down to PN
    --you need to use the dense rank function to find out which PN has the highest count number.
    --since you used the rank function, all PNs under a Name if they have the same highest MyCount, 
    --it will ranked same as 1.
    --since there are more than 2 PN groups could have same highest count of rows, you use a second
    --order column PN to break the rank with same count based on the PN
    From	CTE
    
    )
    --based on the rank function, you only want to keep the Myrank value = 1 rows
    --therfore you delete all rows with rank > 2.
    --Since you want to keep the deleted rows for further analysis, you ourput those
    --records with the OUTPUT clause to a temp table you created before CTE
    
    Delete From CTE2 
    Output Deleted.Id, Deleted.Pn, Deleted.Name, Deleted.apt Into #MyTemp
    Where MyRank > 1;
    
    Select	*
    From	@Dummy;
    
    
    Select *
    From	#MyTemp;
    
    --perform you code here to use the #MyTemp Table
    
    Drop Table #MyTemp;
    

    I hope my reply answers your questions and made you more clear on these codes. I'm not sure which part of the world you are in, if by chance you are in New Zealand, I will have a full session at TechEd in Auckland in September. Thanks


    View Steven Wang's profile on LinkedIn | Blog: MSBICOE.com | MCITP - BI, SQL Developer & DBA

    Hate to mislead others, if I'm wrong slap me. Thanks!


  • Monday, August 13, 2012 7:18 PM
     
     

    @ Steven,

    Excellent. I am very pleased with the code and explanation. Well done. I am located in in USA :). Thanks



    • Edited by sandra V O Monday, August 13, 2012 7:19 PM
    • Edited by sandra V O Tuesday, August 14, 2012 7:41 PM
    •