SQL Help?
-
03 Mei 2012 21:22
I have one table with inconsistency data.parituclarly one Contact Name column with multiple entries for one user.
Id base_id contact name
1 222 AAA bbb CCC
2 222 AAA bbb CCC
3 222 AAA bbb CCC
4 222 AAA bbb CCC
5 222 AAA bbb CCC
6 222 zzz YYY XXX
7 222 AAA bbb CCC
8 222 AAA bbb CCC
For above sample data I need to find out duplicate guy zzz YYY XXX. because 222 base_id is only for this person AAA bbb CCC.in this table I have 300 thousand records.
How to write select query to find duplicate records
Thanks
Semua Balasan
-
03 Mei 2012 21:52
pls try
select base_id,contact,count(1)ff from ( select 1 id,222 base_id , 'AAA bbb CCC' contact union select 2,222, 'AAA bbb CCC' union select 3,222, 'AAA bbb CCC' union select 4,222, 'AAA bbb CCC' union select 5,222, 'AAA bbb CCC' union select 6,222, 'zzz YYY XXX' union select 7,222, 'AAA bbb CCC' union select 8,222, 'AAA bbb CCC' ) rs group by base_id,contact having COUNT(1)=1
vt
Please mark answered if I've answered your question and vote for it as helpful to help other user's find a solution quicker
-
03 Mei 2012 21:52
-
03 Mei 2012 22:01
CREATE TABLE dup
(id int,base_id int,contact_name varchar(25))
insert into dupselect 1 222 'AAA bbb CCC'
select 2 222 'AAA bbb CCC'
select 3 222 'AAA bbb CCC'
select 4 222 'AAA bbb CCC'
select 5 222 'AAA bbb CCC'
select 6 222 'zzz YYY XXX'
select 7 222 'AAA bbb CCC'
select 8 222 'AAA bbb CCC'
SELECT * FROM dup
SELECT COUNT(*) as counts,contact_name FROM dup
group by contact_name
HAVING count(*)=1- Diedit oleh Naarasimha 03 Mei 2012 22:02
-
03 Mei 2012 22:14
You can also write it in a way that doesn't expect there to be only one dupe in which case you would want to get the one which a lower frequency of occurrence. Keep in mind the drawback of this approach is if there is a tie in occurrence, it arbitrarily picks one.
;with freq as( select base_id , contactname , row_number() over(partition by base_id order by count(*) desc) as ranking from tbl group by base_id, contactname ) select t.* from tbl t where exists( select 1 from freq where freq.base_id = t.base_id and freq.contactname = t.contactname and freq.ranking > 1 )
- Diedit oleh Zhenny 03 Mei 2012 22:14 clarifying
-
04 Mei 2012 2:30
Thanks narasimha thats working fine for me.but some time in name column if we have any typo miskaes like AAA bb CCC form above query it will give this record also duplicates.I want only one duplicate for 222 base_id.How can i modify the same query.
Thanks
-
04 Mei 2012 2:39
Apply upper for the column like
SELECT COUNT(*) as counts,upper(contact_name) FROM dup
group by upper(contact_name)
HAVING count(*)=1;go
Many Thanks & Best Regards, Hua Min
- Diedit oleh HuaMin ChenMicrosoft Community Contributor 04 Mei 2012 2:42
-
04 Mei 2012 3:12
Still I am getting two duplicates but i have only one...
Thanks
-
04 Mei 2012 3:57Moderator3 bbb is not the same as 2 bb. I don't think a simple method exists to not consider this row unique and not the same as aaa bbb ccc.
For every expert, there is an equal and opposite expert. - Becker's Law
My blog -
04 Mei 2012 4:02
3 bbb is not the same as 2 bb. I don't think a simple method exists to not consider this row unique and not the same as aaa bbb ccc.
For every expert, there is an equal and opposite expert. - Becker's Law
My blog
Absolutely correct Naomi...The requirement seems to be confusing here....Murali Krishnan
-
04 Mei 2012 5:01
Please try the below tsql -
DECLARE @T TABLE (id INT,base_id INT,contact VARCHAR(100)) INSERT INTO @T select 1 id,222 base_id , 'AAA bbb CCC' contact union select 2,222, 'AAA bbb CCC' union select 3,222, 'AAA bbb CCC' union select 4,222, 'AAA bbb CCC' union select 5,222, 'AAA bbb CCC' union select 6,222, 'zzz YYY XXX' union select 7,222, 'AAA bbb CCC' union select 8,222, 'AAA bbb CCC' --From the below query, first check the all possible TYPO SELECT DISTINCT Contact FROM @T WHERE base_id = 222 --The below is the query you need DELETE FROM @T WHERE base_id = 222 AND contact NOT IN ('AAA bbb CCC','All values you want to keep') --You data after deletion SELECT * FROM @T
Hope, this helps!
-Vinay Pugalia
If a post answers your question, please click "Mark As Answer" on that post or "Vote as Helpful".
Web : Inkey Solutions
Blog : My Blog
Email : Vinay Pugalia -
04 Mei 2012 13:48
1 222 AAA bbb CCC
2 222 AAA bbb CCC
3 222 AAA bbb
4 222 AAA bbb CCC
5 222 AAA CCC
6 222 zzz YYY XXX
7 222 AAA bbb CCC
8 222 AAA bbb CCC
for above case also i want only one name as duplicate. any name doesn't match first 5 lettres i want get only those duplicate names...how to write select statement for this?
Thanks
-
04 Mei 2012 13:50Moderator
In this case:
;with cte as (select *, count(*) over (partition by left(name,5)) as cntDups from myTable)
select * from cte where cntDups = 1 -- first 5 letters are unique
For every expert, there is an equal and opposite expert. - Becker's Law
My blog -
04 Mei 2012 15:23
There must be some information you're not sharing with us? As in, how do you know it's the entry for 'zzz yyy xxx' that is in error, and not 'aaa bbb ccc'?
If you have some other table with BASE-ID and NAME data that tells you 'aaa bbb ccc' is the correct name, then your problem is something else entirely ("Select distinct BASE_ID, NAME from <Inconsistency_table I> INNER JOIN <LOOKUP_TABLE L> on L.Base_ID = I.Base_ID WHERE L.Name <> I.Name").
Otherwise, you have to take a step back, and say "If a BASE_ID references more than one name, there's a problem with Base_ID 222", and you should list both.