locked
Distinct count with multiple columns RRS feed

  • Question

  • Hello All,

    how to get the distinct count with multiple columns in select and group by statements. Without columns i am getting the different count, with multiple columns getting huge number count. whenever count the distinct member for data, if the same member is repeating in same month or future months i need to be considered as only one time. can you please help to how to avoid the duplicate counts with number of columns.

    year month id 

    2017  01    345

    2017 02     345

    2017  02    346

    2017 03     345

    2017 03     346

    2017 03     347

    ---final out put:

    year month id 

    2017  01    345


    2017  02    346



    2017 03     347

    Thanks,

    CMK

    Tuesday, June 4, 2019 2:52 PM

All replies

  • Hello,

    Study 'WITH X AS' and build what you need to get with 2-3 simple steps, each of which you can understand.

    And... I can't understand why on source data

    2017 03     345
    2017 03     346
    2017 03     347

    the result of groping suppose to be 

    2017 03     347

    Do you select MAX(ID) as result for third column?


    Sincerely, Highly skilled coding monkey.

    Tuesday, June 4, 2019 3:09 PM
  • It seems what you are looking for can easily be achieved with analytical functions. Looking at your data, it can be partitioned into sub groups based on Year/Month and then you can calculate number of rows and latest id's

    Here is the sample query, that shows how partition by clause can be leveraged.

    CREATE TABLE year_month
    ([year] int, [month] int, id int);
    
    INSERT INTO dbo.year_month ([year],[month],id)
    VALUES
    (2017,01,345),(2017,02,345),(2017,02,346),(2017,03,345),(2017,03,346),(2017,03,347);
    
    SELECT 
        *,
        COUNT(*) over(PARTITION BY [year],[month]) NumRecords,
        MAX(id) over(PARTITION BY [year],[month]) MaxId
    FROM year_month;

    Here is the output, highlighted in different colored groups. Look at the two columns NumRecords and MaxId

    Hope this helps!

    Thanks,

    Lokesh Vij

     
    Tuesday, June 4, 2019 4:48 PM
  • Thanks lokesh for your reply! I need small changes like  below output. 

    year month totalcount

    2017 1        1

    2017 2       1

    2017 3       1


    Tuesday, June 4, 2019 7:05 PM
  • Hi cmk1,

     

    Please try following script.

     

    By the way, if you have solve your issue ,in order to close this thread, please kindly mark helpful replies or your own reply as answers. By doing so, it will benefit all community members who are having this similar issue.  Your contribution is highly appreciated.

     
    CREATE TABLE year_month
    ([year] int, [month] int, id int);
    
    INSERT INTO dbo.year_month ([year],[month],id)
    VALUES
    (2017,01,345),(2017,02,345),(2017,02,346),(2017,03,345),(2017,03,346),(2017,03,347);
    
    ;with cte as (
    SELECT *,row_number()over(partition by [year],[month] order by id desc) as rn 
    FROM year_month)
    select [year],[month],id
    from cte where rn=1
    /*
    year        month       id
    ----------- ----------- -----------
    2017        1           345
    2017        2           346
    2017        3           347
    */
    
    ;with cte as (
    SELECT *,row_number()over(partition by [year],[month] order by id desc) as rn 
    FROM year_month)
    ,cte1 as (
    select [year],[month],id
    from cte where rn=1)
    select  [year],[month],count(id) as totalcount 
    from cte1 
    group by  [year],[month]
    /*
    year        month       totalcount
    ----------- ----------- -----------
    2017        1           1
    2017        2           1
    2017        3           1
    */

    Hope it will help you.

     

    Best Regards,

    Rachel

     


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Wednesday, June 5, 2019 6:06 AM
  • Thanks lokesh for your reply! I need small changes like  below output. 

    year month totalcount

    2017 1        1

    2017 2       1

    2017 3       1


    Here it is:

    WITH tCTE AS
    (
        SELECT 
            *,
            COUNT(*) over(PARTITION BY [year],[month],id) NumRecords,
            DENSE_RANK() over(PARTITION BY [year],[month] order by id desc) Rnk
        FROM year_month
    )
    SELECT [year],[month],NumRecords
    FROM tCTE 
    WHERE Rnk = 1;

    I used a dense rank, thinking that for same year/month combination IDs can repeat (duplicate). So you will get correct count of records.

    Thanks!

    Wednesday, June 5, 2019 11:19 AM
  • Hi cmk1,

    I am writing to follow up this thread with you. Have you solved your issue? If you have solved your issue ,in order to close this thread, please kindly mark helpful replies as answers. By doing so, it will benefit all community members who are having this similar issue.  Your contribution is highly appreciated.

     

    Best Regards,

    Rachel


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Friday, June 28, 2019 9:50 AM