Puzzle #1
-
Thursday, August 09, 2012 7:14 PM
Tools:SQL Management Studio 2008
Environment: SQL Server 2008 and Windows 7
Problem: I am trying to list duplicate records in table called temp, for example,will smith has two set of records: 012345655, 274679300, 'Will Smith', 'A' and 012345655, 274679300, 'Will Smith', 'B'. I want to remove one set of records from the dummy table (either one). However, for a reason, my code lists 012345655, 274679300, 'Will Smith', 'C' and 012345655, 958407896, 'Will Smith', 'B'. Both of these records have different pin numbers and has not been treated as set of records. The result is wrong. I wonder how I would list the desired results. I am curious how you may resolve this issue.
Declare @Dummy Table ( ID int, pn int, name NVARCHAR(30), apt char(1) ) insert into @Dummy values (012345655, 274679300, 'Will Smith', 'A') insert into @Dummy values (012345655, 274679300, 'Will Smith', 'B') insert into @Dummy values (012345655, 958407896, 'Will Smith', 'C') insert into @Dummy values (012345655, 958407896, 'Will Smith', 'D') insert into @Dummy values (012345655, 862493644, 'michael douglas', 'E') insert into @Dummy values (012345655, 862493644, 'michael douglas ', 'F') insert into @Dummy values (012345655, 864051416, 'michael douglas', 'H') insert into @Dummy values (012345655, 864051416, 'michael douglas', 'L') select * from @Dummy -- 8 rows ;with cteDups as ( Select ID ,pn ,name ,apt ,ROW_NUMBER() OVER(Partition By ID, pn Order By ID) rn From @Dummy ) Select * From cteDups Where rn = 1 order by name
The following are wrong results
ID pn name apt rn
12345655 862493644 michael douglas E 1
12345655 864051416 michael douglas H 1
12345655 958407896 Will Smith C 1
12345655 274679300 Will Smith A 1And the those are wrong too
ID pn name apt cnt 12345655 274679300 Will Smith A 2 12345655 274679300 Will Smith B 2 12345655 862493644 michael douglas E 2 12345655 862493644 michael douglas F 2 12345655 864051416 michael douglas H 2 12345655 864051416 michael douglas L 2 12345655 958407896 Will Smith C 2 12345655 958407896 Will Smith D 2
Desired results Must be
12345655 862493644 michael douglas E 12345655 862493644 michael douglas F 12345655 958407896 Will Smith C 12345655 958407896 Will Smith D
- Edited by sandra V O Thursday, August 09, 2012 8:21 PM
- Edited by sandra V O Thursday, August 09, 2012 8:30 PM
- Edited by sandra V O Thursday, August 09, 2012 8:31 PM
- Edited by sandra V O Tuesday, August 14, 2012 8:20 PM
All Replies
-
Thursday, August 09, 2012 7:32 PM
E.g.
declare @Dummy table ( ID int, pn int, name NVARCHAR(255), apt CHAR(1) ); insert into @dummy values (012345655, 274679300, 'Will Smith', 'A'); insert into @dummy values (012345655, 274679300, 'Will Smith', 'B'); insert into @dummy values (012345655, 958407896, 'Will Smith', 'C'); insert into @dummy values (012345655, 958407896, 'Will Smith', 'D'); insert into @dummy values (012345655, 862493644, 'michael douglas', 'E'); insert into @dummy values (012345655, 862493644, 'michael douglas ', 'F'); insert into @dummy values (012345655, 864051416, 'michael douglas', 'H'); insert into @dummy values (012345655, 864051416, 'michael douglas', 'L'); insert into @dummy values (012345655, 958407812, 'Will Smith', 'X'); insert into @dummy values (012345655, 862493612, 'michael douglas', 'Y'); select * from @dummy; with dubs as ( select *, count(*) over (partition by id, pn, name) as cnt from @Dummy ) select * from dubs where cnt > 1;
Please post correct scripts :)
- Edited by Stefan HoffmannMVP Thursday, August 09, 2012 7:35 PM
- Edited by Stefan HoffmannMVP Thursday, August 09, 2012 7:36 PM
-
Thursday, August 09, 2012 7:38 PM
I think your problem is your partition by clause. Because you are only partitioning by the ID, that is all that gets checked. Also, based on the data/structures given, I got a bunch of errors. Here is something a bit more generic, but it gets the duplicates:
--create table @Dummy ( ID int, pn int, name NVARCHAR, apt int ) Declare @Dummy Table ( ID int, pn int, name NVARCHAR(30), apt char(1) ) insert into @Dummy values (012345655, 274679300, 'Will Smith', 'A') insert into @Dummy values (012345655, 274679300, 'Will Smith', 'B') insert into @Dummy values (012345655, 958407896, 'Will Smith', 'C') insert into @Dummy values (012345655, 958407896, 'Will Smith', 'D') insert into @Dummy values (012345655, 862493644, 'michael douglas', 'E') insert into @Dummy values (012345655, 862493644, 'michael douglas ', 'F') insert into @Dummy values (012345655, 864051416, 'michael douglas', 'H') insert into @Dummy values (012345655, 864051416, 'michael douglas', 'L') select * from @Dummy -- 4 rows ;with cteDups as ( Select ID ,pn ,name ,apt ,ROW_NUMBER() OVER(Partition By ID, pn Order By ID) rn From @Dummy ) Select * From cteDups Where rn = 1
- Marked As Answer by sandra V O Monday, August 13, 2012 8:41 AM
-
Thursday, August 09, 2012 8:23 PM
Thanks for the prompt response.I did revise my question in the top
But the results are not what am looking for
I would like to see the results:
12345655 274679300 Will Smith A 12345655 274679300 Will Smith B 12345655 864051416 michael douglas H 12345655 864051416 michael douglas L OR 12345655 862493644 michael douglas E 12345655 862493644 michael douglas F 12345655 958407896 Will Smith C 12345655 958407896 Will Smith D
- Edited by sandra V O Thursday, August 09, 2012 8:24 PM
- Edited by sandra V O Thursday, August 09, 2012 8:29 PM
- Edited by sandra V O Thursday, August 09, 2012 8:29 PM
- Edited by sandra V O Thursday, August 09, 2012 8:32 PM
-
Thursday, August 09, 2012 8:37 PM
Not sure why you want it that way? But here you go:
--create table @Dummy ( ID int, pn int, name NVARCHAR, apt int ) Declare @Dummy Table ( ID int, pn int, name NVARCHAR(30), apt char(1) ) insert into @Dummy values (012345655, 274679300, 'Will Smith', 'A') insert into @Dummy values (012345655, 274679300, 'Will Smith', 'B') insert into @Dummy values (012345655, 958407896, 'Will Smith', 'C') insert into @Dummy values (012345655, 958407896, 'Will Smith', 'D') insert into @Dummy values (012345655, 862493644, 'michael douglas', 'E') insert into @Dummy values (012345655, 862493644, 'michael douglas ', 'F') insert into @Dummy values (012345655, 864051416, 'michael douglas', 'H') insert into @Dummy values (012345655, 864051416, 'michael douglas', 'L') select * from @Dummy -- 4 rows ;with cteDups as ( Select ID ,pn ,name ,apt ,ROW_NUMBER() OVER(Partition By ID, pn Order By ID) rn From @Dummy ) ,cteGrp as ( Select ID ,pn ,name ,apt ,rn ,ROW_NUMBER() OVER(Partition By Name Order By name) - rn grp From cteDups ) Select ID ,pn ,name ,apt From cteGrp Where grp = (Select Min(grp) From cteGrp)
- Marked As Answer by sandra V O Thursday, August 09, 2012 9:10 PM
- Unmarked As Answer by sandra V O Thursday, August 09, 2012 10:44 PM
- Marked As Answer by sandra V O Monday, August 13, 2012 8:40 AM
-
Thursday, August 09, 2012 8:48 PM
Excellent. Thanks for the prompt response. It works as I wished. Would you please explain what you have done to obtain these results. I dont understand the following code line
grp = (Select Min(grp) From cteGrp)
ROW_NUMBER() OVER(Partition By Name Order By name) - rn grp
- Edited by sandra V O Thursday, August 09, 2012 8:59 PM
-
Thursday, August 09, 2012 9:25 PM
The second ROW_NUMBER() clause gets a distinct group value, based on Name. If you add the RN and GRP columns and remove the where clause from the final select you will see:
ID pn name apt rn grp ----------- ----------- ------------------------------ ---- -------------------- -------------------- 12345655 862493644 michael douglas E 1 0 12345655 862493644 michael douglas F 2 0 12345655 864051416 michael douglas H 1 2 12345655 864051416 michael douglas L 2 2 12345655 958407896 Will Smith C 1 0 12345655 958407896 Will Smith D 2 0 12345655 274679300 Will Smith A 1 2 12345655 274679300 Will Smith B 2 2
The actual value of that grp column doesn't matter much, just the fact that it pulls the different sets together in the manner you want. To get that second ROW_NUMBER() clause I actually stole (and slightly modified) it from a common gaps and islands solution. And then you only want one of the sets (in your case, it didn't matter) so the where clause will pull the 0 group. You could just as easily switch it to max to get the 2 group.
-
Thursday, August 09, 2012 10:09 PM
Very clever. Thanks.
However, when I listed the following SQL statement
insert into @Dummy values (012345655, 864051416, 'michael douglas', 'x') insert into @Dummy values (123456789, 351871971, 'Jacjsons5', 'Q') insert into @Dummy values (123456789, 113653162, 'Jaksons5', 'w')
Your code does list Jacksons5 as duplicate record
The following were the results
ID pn name apt 12345655 274679300 Will Smith A 12345655 274679300 Will Smith B 12345655 958407896 Will Smith C 12345655 958407896 Will Smith D 12345655 862493644 michael douglas E 12345655 862493644 michael douglas F 12345655 864051416 michael douglas H 12345655 864051416 michael douglas L 12345655 864051416 michael douglas x 123456789 351871971 Jacjsons5 Q 123456789 113653162 Jaksons5 w ID pn name apt 123456789 351871971 Jacjsons5 Q 123456789 113653162 Jaksons5 w 12345655 862493644 michael douglas E 12345655 862493644 michael douglas F 12345655 958407896 Will Smith C 12345655 958407896 Will Smith D
The results should be:
12345655 864051416 michael douglas x 12345655 862493644 michael douglas E 12345655 862493644 michael douglas F 12345655 958407896 Will Smith C 12345655 958407896 Will Smith D
because those are duplicate
Please advise
- Edited by sandra V O Thursday, August 09, 2012 10:50 PM
- Edited by sandra V O Thursday, August 09, 2012 10:58 PM
- Edited by sandra V O Thursday, August 09, 2012 10:58 PM
-
Friday, August 10, 2012 1:21 PM
If you take a little bit from Steffan's solution and add a COUT(*) OVER clause, you can make sure you are getting only those records that have more than one. However, for your specific resultset you have to switch to using a MAX on the grp column. You always want to show the rows that have the most duplicates?
;with cteDups as ( Select ID ,pn ,name ,apt ,ROW_NUMBER() OVER(Partition By ID, pn Order By ID) rn From @Dummy ) ,cteGrp as ( Select ID ,pn ,name ,apt ,rn ,ROW_NUMBER() OVER(Partition By Name Order By name) - rn grp ,COUNT(*) OVER(Partition By ID, pn) cnt From cteDups ) Select ID ,pn ,name ,apt ,rn ,grp ,cnt From cteGrp Where grp = (Select Max(grp) From cteGrp) And cnt > 1
Edit:
Also, while I was doing this I thought about what happens if you don't have the same set of duplicates? So if you only have 1 group of Will Smith duplicates but two Michael Douglas and 5 Jeremy Renner? In that case, you would only see the Jeremy Renner group, since the max grp will be 5. If you use this select statement, however, it fixes that, and you use the max grp based on the person:
Select ID ,pn ,name ,apt ,rn ,grp ,cnt From cteGrp c1 Where grp = (Select Max(c2.grp) From cteGrp c2 Where c1.name = c2.name) And cnt > 1
- Edited by dgjohnson Friday, August 10, 2012 1:23 PM
-
Friday, August 10, 2012 9:44 PM
I would like to express my sincere gratitude for being patient and taking the time to help me out. It mean a lot. I really like the thought process behind the code.I spent many trying to understand it because I can modify it and adjust it if more case scenarios appeared.
I implemented this code
Select ID ,pn ,name ,apt ,rn ,grp ,cnt From cteGrp c1 Where grp = (Select Max(c2.grp) From cteGrp c2 Where c1.name = c2.name) And cnt > 1
Then, I added a few records to test out the code
insert into @Dummy values (012345655, 274679300, 'Will Smith', 'A') insert into @Dummy values (012345655, 274679300, 'Will Smith', 'B') insert into @Dummy values (012345655, 958407896, 'Will Smith', 'C') insert into @Dummy values (012345655, 958407896, 'Will Smith', 'D') insert into @Dummy values (012345655, 862493644, 'michael douglas', 'E') insert into @Dummy values (012345655, 862493644, 'michael douglas ', 'F') insert into @Dummy values (012345655, 864051416, 'michael douglas', 'H') insert into @Dummy values (012345655, 864051416, 'michael douglas', 'L') insert into @Dummy values (012345655, 864051416, 'michael douglas', 'x') insert into @Dummy values (123456789, 351871971, 'Jacjsons5', 'Q') insert into @Dummy values (123456789, 113653162, 'Jaksons5', 'w') insert into @Dummy values (123456789, 113653162, 'Jaksons5', 'M')
Results of the code is
123456789 113653162 Jaksons5 w 1 0 2 123456789 113653162 Jaksons5 M 2 0 2 12345655 864051416 michael douglas H 1 2 3 12345655 864051416 michael douglas L 2 2 3 12345655 864051416 michael douglas x 3 2 3 12345655 274679300 Will Smith A 1 2 2 12345655 274679300 Will Smith B 2 2 2
But the desired one should be
123456789 351871971 Jacjsons5 Q 12345655 864051416 michael douglas H 1 2 3 12345655 864051416 michael douglas L 2 2 3 12345655 864051416 michael douglas x 3 2 3 12345655 274679300 Will Smith A 1 2 2 12345655 274679300 Will Smith B 2 2 2
Why? we have case scenarios
Case 1 is resolved by T-SQL
12345655 274679300 Will Smith A
12345655 274679300 Will Smith B
12345655 958407896 Will Smith C
12345655 958407896 Will Smith DWe should Delete
12345655 958407896 Will Smith C
12345655 958407896 Will Smith Dor
12345655 274679300 Will Smith A
12345655 274679300 Will Smith BCase 2 is not resolved yet
12345655 862493644 michael douglas E
12345655 862493644 michael douglas F
12345655 864051416 michael douglas H
12345655 864051416 michael douglas L
12345655 864051416 michael douglas XWe should Delete
12345655 862493644 michael douglas E
12345655 862493644 michael douglas F
Case 3 is not resolved yet
123456789 351871971 Jacjsons5 Q
123456789 113653162 Jaksons5 w
123456789 113653162 Jaksons5 M
We should Delete
123456789 351871971 Jacjsons5 Q
Case 4
700662481 113653162 Bruce Wells y
700662481 351871971 Bruce wells zWe should Delete
Either
700662481 351871971 Bruce wells yOR
700662481 113653162 Bruce Wells z
Case 5 is resolved
94403738 997550985 Jackie Shan v
NO Action to Delete
Case 6 is resolved by T-SQL
55654321 173670000 Tommy A
55654321 173670000 Tommy BNO Action to Delete
I look forward to hearing from you.
- Edited by sandra V O Friday, August 10, 2012 9:45 PM
- Edited by sandra V O Friday, August 10, 2012 10:17 PM
- Edited by sandra V O Friday, August 10, 2012 10:25 PM
- Edited by sandra V O Friday, August 10, 2012 10:47 PM
-
Friday, August 10, 2012 9:50 PM
I'm on my way out, but will revisit later. In the meantime, what is the business rule here? You want to keep the duplicates? It seems to me that you would WANT to delete
123456789 113653162 Jaksons5 w
123456789 113653162 Jaksons5 Minstead of
123456789 351871971 Jacjsons5 Q
And I am not sure why you want to remove the Bruce Wells at all, since the ID and pn are the same. Is it based more on name, than ID and pn combination? Such that it's okay if you have multiples with the same name, so long as the ID and pn also match?
Edit: I see now that Bruce Wells entries share the distinct ID/pn combinations from the Jaksons5, which makes sense why you want to delete them, but I still do not understand why you want to delete the set of two Jaksons5 instead of just the single.- Edited by dgjohnson Friday, August 10, 2012 9:58 PM
-
Friday, August 10, 2012 10:15 PM
Thanks for the prompt response
The business rule by looking at the five case scenarios is
1. if the pin number is repeated more than one and has the max number of repetition, keep these records and delete other. These cases appear in Case 2 and case 3
a .Case two is not resolved yet
12345655 862493644 michael douglas E
12345655 862493644 michael douglas F
12345655 864051416 michael douglas H
12345655 864051416 michael douglas L
12345655 864051416 michael douglas XWe should Delete
12345655 862493644 michael douglas E
12345655 862493644 michael douglas Fb. Case three is not resolved yet
123456789 351871971 Jacjsons5 Q
123456789 113653162 Jaksons5 w
123456789 113653162 Jaksons5 MWe should Delete
123456789 351871971 Jacjsons5 Q
2. if pin number is repeated twice keep one set and delete the other set.
Case one is resolved by T-SQL
12345655 274679300 Will Smith A
12345655 274679300 Will Smith B
12345655 958407896 Will Smith C
12345655 958407896 Will Smith DWe should Delete
12345655 958407896 Will Smith C
12345655 958407896 Will Smith Dor
12345655 274679300 Will Smith A
12345655 274679300 Will Smith B3. If the pin is repeated twice , keep it - no action
Case 6 is resolved by T-SQL
55654321 173670000 Tommy A
55654321 173670000 Tommy BNO ACTION to Delete
4.if the person has same ID but different pin , delete one of the pin. Other wise, if person has unique ID and unique pin, no action
Case 4
700662481 113653162 Bruce Wells y
700662481 351871971 Bruce wells zWe should Delete
Either
700662481 351871971 Bruce wells yOR
700662481 113653162 Bruce Wells z
Case 5 is resolved
94403738 997550985 Jackie Shan v
NO Action to Delete* Kindly, see bold font in the pin number to convey my message
I hope this is clear
- Edited by sandra V O Friday, August 10, 2012 10:20 PM
-
Saturday, August 11, 2012 4:04 AM
Okay, this should take care of everything but Case 4. I don't have time to look at that right now, but will try to pick it up again tomorrow.
;with cteDups as ( Select ID ,pn ,name ,apt ,ROW_NUMBER() OVER(Partition By ID, pn Order By ID) rn From @Dummy ) ,cteGrp as ( Select ID ,pn ,name ,apt ,rn ,ROW_NUMBER() OVER(Partition By Name, ID Order By name) - rn grp ,COUNT(*) OVER(Partition By ID, pn) cnt1 ,COUNT(*) OVER(Partition By ID) cnt2 From cteDups ) ,cteCase1 as ( Select ID ,pn ,name ,apt ,rn ,grp ,cnt1 ,cnt2 From cteGrp c1 Where grp = (Select Min(c2.grp) From cteGrp c2 Where c1.name = c2.name) --And cnt > 1 ) ,cteCase2 as ( Select ID ,pn ,name ,apt ,rn ,grp ,cnt1 ,cnt2 From cteGrp c1 Where cnt1 = 1 And cnt2 > 1 ) Select ID ,pn ,name ,apt From cteCase1 c1 Where Not Exists (Select Null From cteCase2 c2 Where c1.Name = c2.Name) UNION ALL Select ID ,pn ,name ,apt From cteCase2
-
Saturday, August 11, 2012 5:40 AM
Please try the below tsql and see if this helps -
;WITH CTE AS ( SELECT D.ID, D.name, MIN(D.pn) pn --You can change it to MAX() if required otherwise FROM @Dummy D GROUP BY D.ID,D.name ) SELECT D.* FROM @Dummy D INNER JOIN CTE ON CTE.ID = D.ID AND CTE.name = D.name AND CTE.pn = D.pn ORDER BY apt
-Vinay Pugalia
If a post answers your question, please click "Mark As Answer" on that post or "Vote as Helpful".
Web : Inkey Solutions
Blog : My Blog
Email : Vinay Pugalia -
Saturday, August 11, 2012 6:12 AM
Declare @Dummy Table ( ID int, pn int, name NVARCHAR(30), apt char(1) ) insert into @Dummy values (12345655, 274679300, 'Will Smith', 'A') insert into @Dummy values (12345655, 274679300, 'Will Smith', 'B') insert into @Dummy values (12345655, 958407896, 'Will Smith', 'C') insert into @Dummy values (12345655, 958407896, 'Will Smith', 'D') insert into @Dummy values (12345655, 358407896, 'Will Smith', 'E') insert into @Dummy values (12345655, 358407896, 'Will Smith', 'F') insert into @Dummy values (12345655, 358407896, 'Will Smith', 'G') insert into @Dummy values (12345655, 658407896, 'Will Smith', 'H') insert into @Dummy values (12345655, 862493644, 'michael douglas', 'E') insert into @Dummy values (12345655, 862493644, 'michael douglas ', 'F') insert into @Dummy values (12345655, 864051416, 'michael douglas', 'H') insert into @Dummy values (12345655, 864051416, 'michael douglas', 'L') insert into @Dummy values (12345655, 864051416, 'michael douglas', 'G') insert into @Dummy values (42345655, 764051400, 'Steven wang', 'H') insert into @Dummy values (42345655, 764051400, 'Steven wang', 'L') insert into @Dummy values (42345655, 764051409, 'Steven wang', 'G') insert into @Dummy values (42345655, 764051409, 'Steven wang', 'K') insert into @Dummy values (42345655, 764050000, 'Steven wang', 'M') insert into @Dummy values (72345655, 564050000, 'Sandra V O', 'X') insert into @Dummy values (72345655, 564050999, 'Sandra V O', 'Y') ;With CTE As ( Select *, Count(pn) Over(Partition BY ID, pn) As MyCount From @Dummy ) ,CTE2 As ( Select * , Dense_Rank() Over(Partition By ID, Name Order By MyCount Desc) As MyRank , Dense_Rank() Over(Partition By ID, Name Order By MyCount Desc, PN) As MyRank2 From CTE --Order by ID, MyCount Desc, MyRank2 ) Delete From CTE2 Where MyRank + MyRAnk2 > 2; Select * From @Dummy;
|
Blog: MSBICOE.com |
MCITP - BI, SQL Developer & DBA
Hate to mislead others, if I'm wrong slap me. Thanks!
- Marked As Answer by sandra V O Monday, August 13, 2012 8:39 AM
-
Monday, August 13, 2012 8:21 AM
@ dgjohnson, unable to thank you for being supportive in this thread and providing consistent solutions.
@ vinaypugalia: thanks but it did not give the desired results
@ Steven,
This is an outside the box solution, which demonstrate mastery in T-SQL. However, I would like to store the unwanted/dubs in a temp table. Then I would like to pass the unwanted values to many tables in the DB.
Furthermore, I would like to learn from you the thought process behind your coding. Please, explain to us the code for learning purposes.
Declare @Dummy Table ( ID int, pn int, name NVARCHAR(30), apt char(1) ) insert into @Dummy values (12345655, 274679300, 'Will Smith', 'A') insert into @Dummy values (12345655, 274679300, 'Will Smith', 'B') insert into @Dummy values (12345655, 958407896, 'Will Smith', 'C') insert into @Dummy values (12345655, 958407896, 'Will Smith', 'D') insert into @Dummy values (12345655, 358407896, 'Will Smith', 'E') insert into @Dummy values (12345655, 358407896, 'Will Smith', 'F') insert into @Dummy values (12345655, 358407896, 'Will Smith', 'G') insert into @Dummy values (12345655, 658407896, 'Will Smith', 'H') insert into @Dummy values (12345655, 862493644, 'michael douglas', 'E') insert into @Dummy values (12345655, 862493644, 'michael douglas ', 'F') insert into @Dummy values (12345655, 864051416, 'michael douglas', 'H') insert into @Dummy values (12345655, 864051416, 'michael douglas', 'L') insert into @Dummy values (12345655, 864051416, 'michael douglas', 'G') insert into @Dummy values (42345655, 764051400, 'Steven wang', 'H') insert into @Dummy values (42345655, 764051400, 'Steven wang', 'L') insert into @Dummy values (42345655, 764051409, 'Steven wang', 'G') insert into @Dummy values (42345655, 764051409, 'Steven wang', 'K') insert into @Dummy values (42345655, 764050000, 'Steven wang', 'M') insert into @Dummy values (72345655, 564050000, 'Sandra V O', 'X') insert into @Dummy values (72345655, 564050999, 'Sandra V O', 'Y') Select * from @Dummy order by name ;With CTE As ( Select *, Count(pn) Over(Partition BY ID, pn) As MyCount From @Dummy ) ,CTE2 As ( Select * , Dense_Rank() Over(Partition By ID, Name Order By MyCount Desc) As MyRank , Dense_Rank() Over(Partition By ID, Name Order By MyCount Desc, PN) As MyRank2 From CTE --Order by ID, MyCount Desc, MyRank2 ) --Delete From CTE2 Where MyRank + MyRAnk2 > 2; --Select * --From @Dummy; Select ID, pn, name, apt from CTE2 Where MyRank + MyRAnk2 > 2 order by name
I'll be testing the code against DBand I'll let you know about the results
Thanks.
- Edited by sandra V O Monday, August 13, 2012 8:28 AM
-
Monday, August 13, 2012 9:41 AM
/* Hi Sandra, First, I would like to say that you are indeed a very good questioner who can explain a problem clearly. This is really good, particularly you have provided some example for us to work with. You have stated 5 different cases in your further reply and when I transform those to the T-sql solution, it becomes much simpler as when you mentioned if the count of records are same then you don't care which group of records to delete. You said that I would like to store the unwanted/dubs in a temp table. Then I would like to pass the unwanted values to many tables in the DB. This can be done with the delete statement with an OUTPUT clause, I will explain more in the code below. Indeed, I think that my solution can further simplied to only use the second rank function. as explained. */ Declare @Dummy Table ( ID int, pn int, name NVARCHAR(30), apt char(1) ); insert into @Dummy values (12345655, 274679300, 'Will Smith', 'A') , (12345655, 274679300, 'Will Smith', 'B') , (12345655, 958407896, 'Will Smith', 'C') , (12345655, 958407896, 'Will Smith', 'D') , (12345655, 358407896, 'Will Smith', 'E') --keep bacause of high occurance , (12345655, 358407896, 'Will Smith', 'F') --keep bacause of high occurance , (12345655, 358407896, 'Will Smith', 'G') --keep bacause of high occurance , (12345655, 658407896, 'Will Smith', 'H') , (12345655, 862493644, 'michael douglas', 'E') , (12345655, 862493644, 'michael douglas ', 'F') , (12345655, 864051416, 'michael douglas', 'H') --keep bacause of high occurance , (12345655, 864051416, 'michael douglas', 'L') --keep bacause of high occurance , (12345655, 864051416, 'michael douglas', 'G') --keep bacause of high occurance , (42345655, 764051400, 'Steven wang', 'H') --keep bacause of high occurance but as it as same count with the 'G' and 'K', it might keep other 2 rows instead , (42345655, 764051400, 'Steven wang', 'L') --keep bacause of high occurance but as it as same count with the 'G' and 'K', it might keep other 2 rows instead , (42345655, 764051409, 'Steven wang', 'G') , (42345655, 764051409, 'Steven wang', 'K') , (42345655, 764050000, 'Steven wang', 'M') , (72345655, 564050000, 'Sandra V O', 'X') --keep as 'X' and 'Y' have same count (1 rows). Depending on the query it might keep 'Y' , (72345655, 564050999, 'Sandra V O', 'Y'); --Since you want to keep the deleted rows into a temp table for further use --You first create a blank Temp table with same table structure as your Dummy --table by using Top 0 clause and into. this will create a blank temp table Select Top 0 * Into #MyTemp From @Dummy; ;With CTE As ( Select * , Count(pn) Over(Partition BY ID, pn) As MyCount --You first count the smallest granular you concern which is Partioned by ID then PN From @Dummy ) ,CTE2 As ( Select * , Dense_Rank() Over(Partition By ID, Name Order By MyCount Desc, PN) As MyRank --As from your explanation that 1 ID might have mutiple names, one name might have multiple PN --the Hierarchy sets as: ID go down to Name go down to PN --you need to use the dense rank function to find out which PN has the highest count number. --since you used the rank function, all PNs under a Name if they have the same highest MyCount, --it will ranked same as 1. --since there are more than 2 PN groups could have same highest count of rows, you use a second --order column PN to break the rank with same count based on the PN From CTE ) --based on the rank function, you only want to keep the Myrank value = 1 rows --therfore you delete all rows with rank > 2. --Since you want to keep the deleted rows for further analysis, you ourput those --records with the OUTPUT clause to a temp table you created before CTE Delete From CTE2 Output Deleted.Id, Deleted.Pn, Deleted.Name, Deleted.apt Into #MyTemp Where MyRank > 1; Select * From @Dummy; Select * From #MyTemp; --perform you code here to use the #MyTemp Table Drop Table #MyTemp;
I hope my reply answers your questions and made you more clear on these codes. I'm not sure which part of the world you are in, if by chance you are in New Zealand, I will have a full session at TechEd in Auckland in September. Thanks
|
Blog: MSBICOE.com |
MCITP - BI, SQL Developer & DBA Hate to mislead others, if I'm wrong slap me. Thanks!
- Edited by Steven Wang - Shangzhou Monday, August 13, 2012 9:44 AM
- Marked As Answer by sandra V O Monday, August 13, 2012 7:13 PM
-
Monday, August 13, 2012 7:18 PM
@ Steven,
Excellent. I am very pleased with the code and explanation. Well done. I am located in in USA :). Thanks
- Edited by sandra V O Monday, August 13, 2012 7:19 PM
- Edited by sandra V O Tuesday, August 14, 2012 7:41 PM

