none
Running total distinct question RRS feed

  • Question

  • Hi I have a table T with data like this

    Date CustId ProdName ProdVersion

    11/2  c1       m1            1
    11/2  c2       m1            1
    11/2  c3        m1           1
    11/3  c6       m1            1
    11/3  c4       m1            1 
    11/3  c5       m1            1 
    11/4  c6       m1            1 
    11/4  c4       m1            1 

    I want to know the running distinct Customers till each day. The output should be like this

    Date CustId ProdName ProdVersion   Total

    11/2  c1       m1            1                  1
    11/2  c2       m1            1                  2
    11/2  c3       m1            1                  3
    11/3  c6       m1            1                  4
    11/3  c4       m1            1                  5
    11/3  c5       m1            1                  6
    11/4  c6       m1            1                  6
    11/4  c4       m1            1                  6

    I tried dense_rank and then pick the max rankId it did not work.

    What did I miss?

    -Niel

    Wednesday, December 6, 2017 10:45 PM

Answers

  • This may do what you want

    Declare @Sample Table (Date date, CustId varchar(10), ProdName varchar(20), ProdVersion int)
    Insert @Sample(Date, CustId, ProdName, ProdVersion) Values
    ('11/2/2017',  'c1',       'm1',            1),
    ('11/2/2017',  'c2',       'm1',            1),
    ('11/2/2017',  'c3',       'm1',           1),
    ('11/3/2017',  'c6',       'm1',            1),
    ('11/3/2017',  'c4',       'm1',            1), 
    ('11/3/2017',  'c5',       'm1',            1), 
    ('11/4/2017',  'c6',       'm1',            1),
    ('11/4/2017',  'c4',       'm1',            1);
    ;With cte As
    (Select Date, CustId, ProdName, ProdVersion,
      Case When Row_Number() Over(Partition By CustID Order By Date) = 1 Then 1 Else 0 End As FirstOccurance
    From @Sample)
    Select Date, CustId, ProdName, ProdVersion, Sum(FirstOccurance) Over(Order By Date, CustID) As Total
    From cte
    Order By Date, CustId;
     
    Tom
    • Proposed as answer by Xi Jin Thursday, December 7, 2017 5:42 AM
    • Marked as answer by pituachMVP, Moderator Monday, February 5, 2018 11:10 PM
    Thursday, December 7, 2017 2:57 AM
  • Hi Niel,

    Sum() Over() is the best choice to achieve your requirement. You can also use DENSE_RANK() before it.

    create table #test 
    (
    [Date] date,
    CustId varchar(10),
    ProdName varchar(10),
    ProdVersion int
    )
    
    insert into #test values
    ('2017-11-02','c1','m1',1),
    ('2017-11-02','c2','m1',1),
    ('2017-11-02','c3','m1',1),
    ('2017-11-03','c6','m1',1),
    ('2017-11-03','c4','m1',1),
    ('2017-11-03','c5','m1',1),
    ('2017-11-04','c6','m1',1),
    ('2017-11-05','c4','m1',1),
    ('2017-11-06','c7','m1',1),
    ('2017-11-07','c7','m1',1)
    
    ;with cte as(
    select Date,CustId,ProdName,ProdVersion,
    case when DENSE_RANK()Over(Partition By CustId Order By [Date])=1 then 1 else 0 end AS rk
    from #test)
    
    select Date,CustId,ProdName,ProdVersion,Sum(rk)Over(Order By [Date],CustId) as Total
    from cte

    Reference: Window Function Examples for SQL Server 

    Thanks,
    Xi Jin.


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    • Marked as answer by Niel1 Thursday, December 7, 2017 7:15 AM
    Thursday, December 7, 2017 6:29 AM

All replies

  • How did it not work? You'll need to provide DDL, DML and what you've tried so far
    Thursday, December 7, 2017 12:36 AM
  • This may do what you want

    Declare @Sample Table (Date date, CustId varchar(10), ProdName varchar(20), ProdVersion int)
    Insert @Sample(Date, CustId, ProdName, ProdVersion) Values
    ('11/2/2017',  'c1',       'm1',            1),
    ('11/2/2017',  'c2',       'm1',            1),
    ('11/2/2017',  'c3',       'm1',           1),
    ('11/3/2017',  'c6',       'm1',            1),
    ('11/3/2017',  'c4',       'm1',            1), 
    ('11/3/2017',  'c5',       'm1',            1), 
    ('11/4/2017',  'c6',       'm1',            1),
    ('11/4/2017',  'c4',       'm1',            1);
    ;With cte As
    (Select Date, CustId, ProdName, ProdVersion,
      Case When Row_Number() Over(Partition By CustID Order By Date) = 1 Then 1 Else 0 End As FirstOccurance
    From @Sample)
    Select Date, CustId, ProdName, ProdVersion, Sum(FirstOccurance) Over(Order By Date, CustID) As Total
    From cte
    Order By Date, CustId;
     
    Tom
    • Proposed as answer by Xi Jin Thursday, December 7, 2017 5:42 AM
    • Marked as answer by pituachMVP, Moderator Monday, February 5, 2018 11:10 PM
    Thursday, December 7, 2017 2:57 AM
  • >> I have a table T with data like this <<

    You asked what you missed. The first thing was basic netiquette; did you read the part about posting DDL. It's at the front of every SQL forum I posted on the last 30 years? How did you miss it? Did you know that a table has to have a key? Did you know that the only format for a date allowed in ANSI ISO standard SQL is based on ISO 8601? We need a "<something in particular>_date", and not a vague generic date. Did you know that DATE is an actual data type in SQL? Did you know a table has to have a key? First, let's fix that (and do  not be this rude again).

    CREATE TABLE Foobar_Sales
    (sale_date DATE NOT NULL,
     cust_id CHAR(2)NOT NULL,
     PRIMARY KEY (sale_date, cust_id, prod_name, prod_version_nbr),
     prod_name CHAR(2)NOT NULL,
     prod_version_nbr INTEGER NOT NULL
      CHECK (prod_version_nbr > 0)
    ); 

    INSERT INTO Foobar_Sales
    ('2017-11-02', 'c1', 'm1', 1),
    ('2017-11-02', 'c2', 'm1', 1),
    ('2017-11-02', 'c3', 'm1', 1),
    ('2017-11-03', 'c6', 'm1', 1),
    ('2017-11-03', 'c4', 'm1', 1),
    ('2017-11-03', 'c5', 'm1', 1),
    ('2017-11-04', 'c6', 'm1', 1),
    ('2017-11-04', 'c4', 'm1', 1),

    I want to know the running distinct Customers till each day. The output should be like this

    Date cust_id prod_name ProdVersion Total
    2017-11-02 c1  'm1' 1 1
    2017-11-02 c2  'm1' 1 2
    2017-11-02 c3  'm1' 1 3
    2017-11-03 c6  'm1' 1 4
    2017-11-03 c4  'm1' 1 5
    2017-11-03 c5  'm1' 1 6
    2017-11-04 c6  'm1' 1 6
    2017-11-04 c4  'm1' 1 6

    >> I tried dense_rank and then pick the max rankId it did not work. <<

    You're going to have to use a windowed function, with rows and partitioning. This is valid ANSI/ISO standard SQL but I'm not sure if Microsoft supports a COUNT(DISTINCT) option yet

    SELECT sale_date, cust_id, prod_name, prod_version_nbr,
           COUNT(DISTINCT cust_id)
           OVER (PARTITION BY cust_id
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
           AS unique_cust_cnt_running_tot
      FROM Foobar_Sales;

    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    • Marked as answer by Niel1 Thursday, December 7, 2017 7:16 AM
    • Unmarked as answer by pituachMVP, Moderator Monday, February 5, 2018 11:10 PM
    Thursday, December 7, 2017 3:16 AM
  • Hi Niel,

    Sum() Over() is the best choice to achieve your requirement. You can also use DENSE_RANK() before it.

    create table #test 
    (
    [Date] date,
    CustId varchar(10),
    ProdName varchar(10),
    ProdVersion int
    )
    
    insert into #test values
    ('2017-11-02','c1','m1',1),
    ('2017-11-02','c2','m1',1),
    ('2017-11-02','c3','m1',1),
    ('2017-11-03','c6','m1',1),
    ('2017-11-03','c4','m1',1),
    ('2017-11-03','c5','m1',1),
    ('2017-11-04','c6','m1',1),
    ('2017-11-05','c4','m1',1),
    ('2017-11-06','c7','m1',1),
    ('2017-11-07','c7','m1',1)
    
    ;with cte as(
    select Date,CustId,ProdName,ProdVersion,
    case when DENSE_RANK()Over(Partition By CustId Order By [Date])=1 then 1 else 0 end AS rk
    from #test)
    
    select Date,CustId,ProdName,ProdVersion,Sum(rk)Over(Order By [Date],CustId) as Total
    from cte

    Reference: Window Function Examples for SQL Server 

    Thanks,
    Xi Jin.


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    • Marked as answer by Niel1 Thursday, December 7, 2017 7:15 AM
    Thursday, December 7, 2017 6:29 AM