Pertanyaan SQL Help?

  • 03 Mei 2012 21:22
     
     

    I have one table with inconsistency data.parituclarly one Contact Name column with multiple entries for one user.

    Id  base_id  contact name

    1    222       AAA  bbb CCC

    2    222       AAA  bbb CCC

    3    222       AAA  bbb CCC

    4    222       AAA  bbb CCC

    5   222        AAA  bbb CCC

    6    222       zzz   YYY  XXX

    7    222       AAA  bbb CCC

    8    222       AAA  bbb CCC

    For  above sample data I need to find out duplicate guy zzz  YYY  XXX. because 222 base_id is only for this person AAA  bbb CCC.in this table I have 300 thousand records.

    How to write select query to find  duplicate records

    Thanks

Semua Balasan

  • 03 Mei 2012 21:52
     
      Memiliki Kode

    pls try

    select base_id,contact,count(1)ff from
    (
    select 1 id,222 base_id , 'AAA bbb CCC' contact union
    select 2,222, 'AAA bbb CCC' union
    select 3,222, 'AAA bbb CCC' union
    select 4,222, 'AAA bbb CCC' union 
    select 5,222, 'AAA bbb CCC' union
    select 6,222, 'zzz YYY XXX' union
    select 7,222, 'AAA bbb CCC' union
    select 8,222, 'AAA bbb CCC'
    ) rs
    group by base_id,contact having COUNT(1)=1
    vt


    Please mark answered if I've answered your question and vote for it as helpful to help other user's find a solution quicker

  • 03 Mei 2012 21:52
     
     

    I have far from sure what you are asking for. But here is one way to find the duplicates:

    SELECT base_id, contact_name, COUNT(*)
    FROM   tbl
    GROUP  BY base_id, contact_name
    HAVING COUNT(*) > 1


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
  • 03 Mei 2012 22:01
     
     

    CREATE TABLE dup
    (id int,base_id int,contact_name varchar(25))

    insert into dup

    select 1    222      'AAA  bbb CCC'

    select 2    222       'AAA  bbb CCC'

    select 3    222       'AAA  bbb CCC'

    select 4    222       'AAA  bbb CCC'

    select 5   222        'AAA  bbb CCC'

    select 6    222       'zzz   YYY  XXX'

    select 7    222      'AAA  bbb CCC'

    select 8    222       'AAA  bbb CCC'



    SELECT * FROM dup

    SELECT COUNT(*) as counts,contact_name FROM dup
    group by contact_name
    HAVING count(*)=1


  • 03 Mei 2012 22:14
     
      Memiliki Kode

    You can also write it in a way that doesn't expect there to be only one dupe in which case you would want to get the one which a lower frequency of occurrence.  Keep in mind the drawback of this approach is if there is a tie in occurrence, it arbitrarily picks one.

    ;with freq as(
    	select
    		base_id
    	,	contactname
    	,	row_number() over(partition by base_id order by count(*) desc) as ranking
    	from tbl
    	group by base_id, contactname
    )
    select
    	t.*
    from tbl t
    where exists(
    	select 1
    	from freq
    	where freq.base_id = t.base_id
    	and freq.contactname = t.contactname
    	and freq.ranking > 1
    )


    • Diedit oleh Zhenny 03 Mei 2012 22:14 clarifying
    •  
  • 04 Mei 2012 2:30
     
     

    Thanks narasimha thats working fine for me.but some time in name column if we have any typo miskaes like AAA bb CCC form above query it will give this record also duplicates.I want only one duplicate for 222 base_id.How can i modify the same query.

    Thanks

  • 04 Mei 2012 2:39
     
     

    Apply upper for the column like

    SELECT COUNT(*) as counts,upper(contact_name) FROM dup
    group by upper(contact_name)
    HAVING count(*)=1;

    go


    Many Thanks & Best Regards, Hua Min


  • 04 Mei 2012 3:12
     
     

    Still I am getting two  duplicates but i have only one...

    Thanks

  • 04 Mei 2012 3:57
    Moderator
     
     
    3 bbb is not the same as 2 bb. I don't think a simple method exists to not consider this row unique and not the same as aaa bbb ccc.

    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

  • 04 Mei 2012 4:02
     
     
    3 bbb is not the same as 2 bb. I don't think a simple method exists to not consider this row unique and not the same as aaa bbb ccc.

    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    Absolutely correct Naomi...The requirement seems to be confusing here....

    Murali Krishnan

  • 04 Mei 2012 5:01
     
      Memiliki Kode

    Please try the below tsql - 

    DECLARE @T TABLE (id INT,base_id INT,contact VARCHAR(100))
    INSERT INTO @T
    select 1 id,222 base_id , 'AAA bbb CCC' contact union
    select 2,222, 'AAA bbb CCC' union
    select 3,222, 'AAA bbb CCC' union
    select 4,222, 'AAA bbb CCC' union 
    select 5,222, 'AAA bbb CCC' union
    select 6,222, 'zzz YYY XXX' union
    select 7,222, 'AAA bbb CCC' union
    select 8,222, 'AAA bbb CCC'
    
    
    --From the below query, first check the all possible TYPO
    SELECT 
      DISTINCT Contact 
    FROM 
      @T
    WHERE
      base_id = 222
    
    --The below is the query you need
    DELETE FROM @T
    WHERE
      base_id = 222
      AND contact NOT IN ('AAA bbb CCC','All values you want to keep')
      
    --You data after deletion
    SELECT * FROM @T  
        
      
    

    Hope, this helps!


    -Vinay Pugalia
    If a post answers your question, please click "Mark As Answer" on that post or "Vote as Helpful".
    Web : Inkey Solutions
    Blog : My Blog
    Email : Vinay Pugalia

  • 04 Mei 2012 13:48
     
     

    1    222       AAA  bbb CCC

    2    222       AAA  bbb CCC

    3    222       AAA  bbb

    4    222       AAA  bbb CCC

    5   222        AAA  CCC

    6    222       zzz   YYY  XXX

    7    222       AAA  bbb CCC

    8    222       AAA  bbb CCC

    for above case also i want only one name as duplicate. any name  doesn't match first 5 lettres i want get only those duplicate names...how to write select statement for this?

    Thanks 

  • 04 Mei 2012 13:50
    Moderator
     
     

    In this case:

    ;with cte as (select *, count(*) over (partition by left(name,5)) as cntDups from myTable)

    select * from cte where cntDups = 1 -- first 5 letters are unique


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

  • 04 Mei 2012 15:23
     
     

    There must be some information you're not sharing with us?  As in, how do you know it's the entry for  'zzz yyy xxx' that is in error, and not 'aaa bbb ccc'? 

    If you have some other table with BASE-ID and NAME data that tells you 'aaa bbb ccc' is the correct name, then your problem is something else entirely ("Select distinct BASE_ID, NAME from <Inconsistency_table I> INNER JOIN <LOOKUP_TABLE L> on L.Base_ID = I.Base_ID WHERE L.Name <> I.Name").

    Otherwise, you have to take a step back, and say "If a BASE_ID references more than one name, there's a problem with Base_ID 222", and you should list both.