none
SQL Query to get Distinct ON Only One Column

    Question

  • I have a Insert query which select results from a SQL involve multiple joins.
    The select results multiple records, but I want to filter on a SINGLE Column - Distinct.

    How can I do this?

    Ex:

    Insert INTO Target(
     ID
     ,col2
     ,col3
     ,col4
     ,col5
     ,Col6
    )
    SELECT
      A.ID
     ,A.C1
     ,A.C2
     ,A.C3
     ,B.C1
     ,C.C5
    FROM Tab1 A
    INER JOIN Tab2 B ON A.C1 = B.C1
    INNER JOIN Tab3 C ON A.C2 = C.C2
    WHERE A.ID NOT IN (SELECT DISTINCT ID from Target)

    Here I want the Select should return DISTINCT ID ONLY. - Max of C3 Column record should be fine. (As C3 is a date column)
    If I am usging Distinct in select it apply on all columns, but I want only to apply on ID column.

    Please advise.

    Thanks,
    Prabhat

    Wednesday, August 25, 2010 2:37 PM

Answers

  • I think the join operations you have used causes the repeated rows ( rows that have the same value on the key columns ) get back to the result in spite of a DISTINCT operator.this happens because of other tables in the joins.

    My solution

    Imagine you have table or view with the following schema A(A1,A2,...Az) and you want to select rows with distinct value for columns A1,A2,A3,...Am. I think Window functions is the only way that you can achieve the desired result

    SELECT A1,A2,...Az

    FROM ( SELECT ROW_NUMBER() OVER( PARTITION BY A1,A2,...,Am [ ORDER BY

                           An,Ao,...Ar]) Row , A1,A2,...,Az

                  FROM A ) AS PartitionedResult

    WHERE Row = 1 -- Picking up the first row of each partition

    • Proposed as answer by Naomi NModerator Wednesday, August 25, 2010 3:52 PM
    • Marked as answer by Prabhat_IE Wednesday, August 25, 2010 3:57 PM
    Wednesday, August 25, 2010 3:02 PM
  • Try this Prabhat,

    WITH DS_SRC

    AS

    (

    SELECT

     ROW_NUMBER() OVER (PARTITION BY A.ID ORDER BY A.C3 DESC) AS R_NUM

     ,A.ID

     ,A.C1

     ,A.C2

     ,A.C3

     ,B.C1

     ,C.C5

    FROM Tab1 A

    INNER JOIN Tab2 B ON A.C1 = B.C1

    INNER JOIN Tab3 C ON A.C2 = C.C2

    WHERE A.ID NOT IN (SELECT DISTINCT ID from Target)

    )

    Insert INTO Target(

      ID

     ,col2

     ,col3

     ,col4

     ,col5

     ,Col6

    )

    SELECT

     A.ID

     ,A.C1

     ,A.C2

     ,A.C3

     ,B.C1

     ,C.C5

    FROM DS_SRC WHERE R_NUM=1

    Thanks,

    Piyurkumar !!

    • Marked as answer by Prabhat_IE Wednesday, August 25, 2010 3:56 PM
    Wednesday, August 25, 2010 3:07 PM
  • Try

    WITH cte AS 
    (SELECT 
     A.ID AS ID 
     ,A.C1 AS C1 
     ,A.C2 AS C2
     ,A.C3 AS C3 
     ,B.C1 AS C4 
     ,C.C5 AS C5 ,
     ROW_NUMBER() OVER (PARTITION BY A.ID ORDER BY C.C5 DESC)
    FROM Tab1 A
    INER JOIN Tab2 B ON A.C1 = B.C1
    INNER JOIN Tab3 C ON A.C2 = C.C2)
    
    Insert INTO Target(
     ID
     ,col2
     ,col3
     ,col4
     ,col5
     ,Col6
    )
    SELECT 
     c.ID
     ,c.C1
     ,c.C2
     ,c.C3
     ,c.C4
     ,c.C5
    FROM CTE c
    WHERE c.ID NOT IN (SELECT DISTINCT ID from Target)
    AND c.RN = 1 
    
    
    

    Regards, Deven ----------------------------------------- Please vote if you find any of my post helpful.
    • Proposed as answer by Naomi NModerator Wednesday, August 25, 2010 3:52 PM
    • Marked as answer by Prabhat_IE Wednesday, August 25, 2010 3:56 PM
    Wednesday, August 25, 2010 3:40 PM

All replies

  • Try this

    Insert INTO Target(
     ID
     ,col2
     ,col3
     ,col4
     ,col5
     ,Col6
    )
    SELECT 
     A.ID
     ,A.C1
     ,A.C2
     ,A.C3
     ,B.C1
     ,C.C5
    FROM Tab1 A
    JOIN (SELECT DISTINCT ID FROM Tab1 ) tbl ON A.ID = tbl.ID
    INER JOIN Tab2 B ON A.C1 = B.C1
    INNER JOIN Tab3 C ON A.C2 = C.C2
    WHERE A.ID NOT IN (SELECT DISTINCT ID from Target)
    

    Regards, Deven ----------------------------------------- Please vote if you find any of my post helpful.
    Wednesday, August 25, 2010 2:43 PM
  • If An record of Table A is mapped to more than 1 record of TableB, then which record u wants to display?

    In other words, , TableA has 1 Record with Id = 1 has its C1 Value "10"  now tableC has 2 records with C5 values as 100 and 200 and  those records "c1" value as "10" . in that situation you wants to show which C1 value??  100 or 200? what is the criteria?

    If u wants to show the first record only

    another thing is why u need to join with Tab2 when u are not using it any where? to just check those records needs to be there also? if nor remove that join, since u r fetching C1 only in Select list, which is same of C1 in Tab1 also...

    Its like When you are joining there will be 1 to Many Mapping will happens. when u need one-one, u need to specify the criteria for that other record.

    check the below links which will clearly tell that problem.

    http://blogs.lessthandot.com/index.php/DataMgmt/DBProgramming/MSSQLServer/including-an-aggregated-column-s-related

     http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/including-an-aggregated-column-s-related-2

     

     

    Wednesday, August 25, 2010 2:48 PM
  • Hi Deven, I tried similar query with RIGHT JOIN and also tried your query, but that did not work for me.

    - Prabhat
    Wednesday, August 25, 2010 2:57 PM
  • I think the join operations you have used causes the repeated rows ( rows that have the same value on the key columns ) get back to the result in spite of a DISTINCT operator.this happens because of other tables in the joins.

    My solution

    Imagine you have table or view with the following schema A(A1,A2,...Az) and you want to select rows with distinct value for columns A1,A2,A3,...Am. I think Window functions is the only way that you can achieve the desired result

    SELECT A1,A2,...Az

    FROM ( SELECT ROW_NUMBER() OVER( PARTITION BY A1,A2,...,Am [ ORDER BY

                           An,Ao,...Ar]) Row , A1,A2,...,Az

                  FROM A ) AS PartitionedResult

    WHERE Row = 1 -- Picking up the first row of each partition

    • Proposed as answer by Naomi NModerator Wednesday, August 25, 2010 3:52 PM
    • Marked as answer by Prabhat_IE Wednesday, August 25, 2010 3:57 PM
    Wednesday, August 25, 2010 3:02 PM
  • Try this Prabhat,

    WITH DS_SRC

    AS

    (

    SELECT

     ROW_NUMBER() OVER (PARTITION BY A.ID ORDER BY A.C3 DESC) AS R_NUM

     ,A.ID

     ,A.C1

     ,A.C2

     ,A.C3

     ,B.C1

     ,C.C5

    FROM Tab1 A

    INNER JOIN Tab2 B ON A.C1 = B.C1

    INNER JOIN Tab3 C ON A.C2 = C.C2

    WHERE A.ID NOT IN (SELECT DISTINCT ID from Target)

    )

    Insert INTO Target(

      ID

     ,col2

     ,col3

     ,col4

     ,col5

     ,Col6

    )

    SELECT

     A.ID

     ,A.C1

     ,A.C2

     ,A.C3

     ,B.C1

     ,C.C5

    FROM DS_SRC WHERE R_NUM=1

    Thanks,

    Piyurkumar !!

    • Marked as answer by Prabhat_IE Wednesday, August 25, 2010 3:56 PM
    Wednesday, August 25, 2010 3:07 PM
  • Can you please provide sample data to test. I am not sure if you want distinct on all the 5 columns or just distinct values of A.ID?

     


    Regards, Deven ----------------------------------------- Please vote if you find any of my post helpful.
    Wednesday, August 25, 2010 3:09 PM
  • Hi Deven,

    Sample data looks like this:

    C1  C2  C3  C4 C5 C6  C7
    1001 101  A  B C  30 2010-06-21 14:40:14.977
    1001 102  A  B C  30 2010-06-21 14:40:14.977
    1002 105  X  Y  Z  50 2010-07-21 14:40:14.977
    
    

    It should give one row from 1001 and 1002. The row selection can be max (C7), but there is a chance the C7 may be same for both row.
    So in that case any row from 1001 should be OK for me.

    I will try your example now again and see if I am missing anything. Also I will try example from PiyurKumar.

    Thanks.

    Wednesday, August 25, 2010 3:18 PM
  • Try

    WITH cte AS 
    (SELECT 
     A.ID AS ID 
     ,A.C1 AS C1 
     ,A.C2 AS C2
     ,A.C3 AS C3 
     ,B.C1 AS C4 
     ,C.C5 AS C5 ,
     ROW_NUMBER() OVER (PARTITION BY A.ID ORDER BY C.C5 DESC)
    FROM Tab1 A
    INER JOIN Tab2 B ON A.C1 = B.C1
    INNER JOIN Tab3 C ON A.C2 = C.C2)
    
    Insert INTO Target(
     ID
     ,col2
     ,col3
     ,col4
     ,col5
     ,Col6
    )
    SELECT 
     c.ID
     ,c.C1
     ,c.C2
     ,c.C3
     ,c.C4
     ,c.C5
    FROM CTE c
    WHERE c.ID NOT IN (SELECT DISTINCT ID from Target)
    AND c.RN = 1 
    
    
    

    Regards, Deven ----------------------------------------- Please vote if you find any of my post helpful.
    • Proposed as answer by Naomi NModerator Wednesday, August 25, 2010 3:52 PM
    • Marked as answer by Prabhat_IE Wednesday, August 25, 2010 3:56 PM
    Wednesday, August 25, 2010 3:40 PM
  • Thanks Deven,

    And also thanks to Amir_sarajan and Piyukumar too.
    As the CTE logic did the trick.

    Thanks all for the help.
    -- Prabhat

    Wednesday, August 25, 2010 3:59 PM