none
Cumulative sum partition , more efficient way? RRS feed

  • Question

  • I want to do a cumulative total partitioned by Hosp , i am struggling with filtering out the Numerator and retaining the correct denominator 

    This is what I am thinking , is there a more efficient way?

              

    With CTE AS (
    Select 
    F.Hosp,
    Case When TIME_TO_SURG BETWEEN 0 AND 720 Then '<12'End As TimeToSurgery,
    SUM(COUNT(MRN_ENC))OVER(Partition by F.Hosp,Case When TIME_TO_SURG BETWEEN 0 AND 720 Then '<12'End) As Numerator, 
            SUM(COUNT(MRN_ENC))OVER(Partition by F.Hosp) As Denominator
    From [IHFD].[FactPatientJourneyApr2018]F
    Where
    Adm_Operation >0 
    Group by F.Hosp,  Case When TIME_TO_SURG BETWEEN 0 AND 720 Then '<12'End
    Union All 
    Select 
    F.Hosp, 
    Case When TIME_TO_SURG BETWEEN 0 AND 1440 Then '<24'End As TimeToSurgery,
    SUM(COUNT(MRN_ENC))OVER(Partition by F.Hosp,Case When TIME_TO_SURG BETWEEN 0 AND 1440 Then '<24' End) As Numerator,
            SUM(COUNT(MRN_ENC))OVER(Partition by F.Hosp) As Denominator

    From [IHFD].[FactPatientJourneyApr2018]F
    Where
    Adm_Operation >0 
    Group by   F.Hosp,Case When TIME_TO_SURG BETWEEN 0 AND 1440 Then '<24'End
    )

    Select * From CTE Where TimeToSurgery Is not null

    • Edited by Carlo L Tuesday, August 13, 2019 2:41 PM
    Tuesday, August 13, 2019 2:30 PM

Answers

  • Hi Carlo L,

     

    Thank you for your issue.

     

    It's hard to say which method is better. It might be difficult for me to know your indexes or some other environment factors, but according to the data you provided, so far your first solution works very well. Of the above methods, the first one works well.

     

    I think you need to experiment with your actual data and you might find something new.

     

    By the way, please try to compare their performance against the actual execution plan. Also, you can 'SET STATISTICS IO ON; ' to display information regarding the amount of disk activity generated. Please compare the messages and you will see the difference. For more details , you can refer to it: https://docs.microsoft.com/en-us/sql/t-sql/statements/set-statistics-io-transact-sql?view=sql-server-2017

     

    Best Regards,

    Rachel


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    • Marked as answer by Carlo L Wednesday, August 14, 2019 9:40 AM
    Wednesday, August 14, 2019 8:58 AM

All replies

  • To get a correct query, you need to mark up a sample table and data to represent your question with DDL and inserts.

    Tell us what your are expecting from your sample table and data with the query.

    We don't have these tables from your question so it is hard to help.

    Tuesday, August 13, 2019 3:43 PM
    Moderator
  •  

    Wondering if there is a better way of doing this

    Numerator -- This illustrates the Sum of Surgeries within <12 hr period by Hosptial

    Denominator -- this illustrates the Sum of Surgeries by Hospital

    Issue:

    If I use a filter in the where clause to limit my result set to <12 hrs , this will capture the correct numerator but not the denominator 

    If I use a case statement to identify the Patients within 12 hr period , this will also return all the patients regardless of time to surgery , (this is a cumulative sum) This will get me my Numerator and denominator but will also return Null for patients outside the 12 hrs 

    Why not just use a case statement to define each patients time to surgery , this is a cumulative sum;

    Solutions

     1) Use a CTE table (as below) and just filter out where the Numerator is not NULL

     2) Use a filter in the where clause to limit my result set to <12 hrs, and join a CTE to define the Denominator 

    Tables

    CREATE TABLE [dbo].[TheTable](
    [Hosp] [nvarchar](255) NULL,
    [Adm_Operation] [int] NULL,
    [TIME_TO_SURG] [int] NOT NULL
    ) ON [PRIMARY]
    GO
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0726', 10, 599)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0922', 10, 804)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0910', 4, 508)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0600', 10, 701)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0923', 10, 816)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0500', 1, 797)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0724', 10, 702)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0500', 10, 745)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0724', 3, 488)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0203', 12, 227)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'1270', 10, 754)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'1270', 1, 865)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0108', 10, 857)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0724', 10, 847)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0600', 10, 730)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0910', 3, 838)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0908', 4, 624)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0923', 4, 296)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0910', 4, 703)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0108', 4, 694)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0724', 10, 858)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0724', 10, 827)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0908', 3, 691)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'1270', 10, 651)
    INSERT [dbo].[TheTable] ([Hosp], [Adm_Operation], [TIME_TO_SURG]) VALUES (N'0910', 2, 417)

    Solution 1) 

    With CTE AS (
    Select 
    F.Hosp,
    Case When TIME_TO_SURG BETWEEN 0 AND 720 Then '<12'End As TimeToSurgery,
    SUM(COUNT(1))OVER(Partition by F.Hosp,Case When TIME_TO_SURG BETWEEN 0 AND 720 Then '<12'End) As Numerator, 
            SUM(COUNT(1))OVER(Partition by F.Hosp) As Denominator
    From dbo.TheTable f
    Where
    Adm_Operation >0 
    Group by F.Hosp,  Case When TIME_TO_SURG BETWEEN 0 AND 720 Then '<12'End  
    )

    Select * From CTE Where TimeToSurgery Is not null --Remove not NULL to see what I mean

    Solution 2) 

    With CTE AS (

      Select Hosp,
      Count(*)as HospitalTotal 
      From [dbo].[TheTable] 
      Group By
      Hosp
            )
     
    Select 
    F.Hosp,
    '<12'As TimeToSurgery,
    SUM(COUNT(1))OVER(Partition by F.Hosp) As Numerator, 
                   (Select C.HospitalTotal From CTE C where C.hosp = F.hosp) As Denominator
    From dbo.TheTable f
    Where
    Adm_Operation >0 and
    TIME_TO_SURG BETWEEN 0 AND 720
    Group by F.Hosp

    Wednesday, August 14, 2019 7:50 AM
  • I look two query execution plan. Sql server return 1 solution best without any index. After index may be performance changes. I think following solution best,but compare your first solution cost was big my solution. Finally in this sitution your first solution best.

    my solution

    ;With CTE AS (
    
      Select Hosp,
      Count(*)as HospitalTotal 
      From [dbo].[TheTable] 
      Group By
      Hosp
            )
     ,ctea
     as
     (
    Select 
    F.Hosp,
    '<12'As TimeToSurgery,
    COUNT(1) As Numerator
    From dbo.TheTable f
    Where
    Adm_Operation >0 and
    TIME_TO_SURG BETWEEN 0 AND 720
    Group by F.Hosp)
    
    
    
    select * from ctea a left join cte b on a.hosp=b.hosp
    

    Wednesday, August 14, 2019 8:38 AM
  • Hi Carlo L,

     

    Thank you for your issue.

     

    It's hard to say which method is better. It might be difficult for me to know your indexes or some other environment factors, but according to the data you provided, so far your first solution works very well. Of the above methods, the first one works well.

     

    I think you need to experiment with your actual data and you might find something new.

     

    By the way, please try to compare their performance against the actual execution plan. Also, you can 'SET STATISTICS IO ON; ' to display information regarding the amount of disk activity generated. Please compare the messages and you will see the difference. For more details , you can refer to it: https://docs.microsoft.com/en-us/sql/t-sql/statements/set-statistics-io-transact-sql?view=sql-server-2017

     

    Best Regards,

    Rachel


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    • Marked as answer by Carlo L Wednesday, August 14, 2019 9:40 AM
    Wednesday, August 14, 2019 8:58 AM
  • Thanks!
    Wednesday, August 14, 2019 9:40 AM