none
Looking for best way to continuously aggregate data from a log table and update totals RRS feed

  • Question

  • I have a log table that is written to by multiple field devices and continuously being updated.  What I want to do is query that table, produce an aggregate of hourly activity and write the aggregates to another table.  At first glance this is simple enough but the data does not come into the log table sequentially due to connectivity lag for devices in the field, meaning I could run aggregation now for everything I have, but in a couple of hours receive new records for say this morning.  I would need to be able to go back and update those previously aggregated records and add in the new records received.

    The aggregation is pretty simple.  It is just count of the # of log records received by hour (derived from a datetime column)  per the device and content columns in the raw log records.  

    Below is a sample script of tables.  I can't really change these columns too much, but I can add additional columns and introduce new columns, temp tables, CTE's etc. to come up with the best way to process this.

    --Raw log table

    CREATE TABLE [dbo].[ActivityLog](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [EventDateTime] [datetime] NULL,
    [DeviceName] [nvarchar](22) NULL,
    [ContentName] [nvarchar](240) NULL,
    )

    --Aggregated Log Records by hour per device and content

    CREATE TABLE [dbo].[AggregatedHrlyActivity](
    [DeviceName] [nvarchar](22) NULL,
    [ContentName] [nvarchar](240) NULL,
    [ActivityDate] [Date] NULL, -- Date Derived from [EventDateTime] of ActivityLog table
    [ActivityDay] [int] NULL,   -- Day Of The Month 0-31 Derived from [EventDateTime] of ActivityLog table
    [ActivityHour] [Date] NULL, -- Hour Of The Day 1-24 Derived from [EventDateTime] of ActivityLog table
    [NumPlays] [int] NULL,


    --Insert some sample log data into table
    INSERT INTO [dbo].[ActivityLog]
               ([EventDateTime]
               ,[DeviceName]
               ,[ContentName])
         VALUES
                (DATEADD(hh,-7,GETDATE()) ,'DeviceA', 'Content1')
               ,(DATEADD(hh,-7,GETDATE()) ,'DeviceB', 'Content1')
       ,(DATEADD(hh,-7,GETDATE()) ,'DeviceC', 'Content1')
       ,(DATEADD(hh,-7,GETDATE()) ,'DeviceA', 'Content2')
               ,(DATEADD(hh,-7,GETDATE()) ,'DeviceB', 'Content2')
       ,(DATEADD(hh,-6,GETDATE()) ,'DeviceA', 'Content1')
               ,(DATEADD(hh,-6,GETDATE()) ,'DeviceB', 'Content1')
       ,(DATEADD(hh,-5,GETDATE()) ,'DeviceC', 'Content1')
       ,(DATEADD(hh,-5,GETDATE()) ,'DeviceA', 'Content2')
       ,(DATEADD(hh,-5,GETDATE()) ,'DeviceC', 'Content1')
       ,(DATEADD(hh,-5,GETDATE()) ,'DeviceA', 'Content2')
               ,(DATEADD(hh,-4,GETDATE()) ,'DeviceB', 'Content2')
       ,(DATEADD(D,-6,GETDATE()) ,'DeviceA', 'Content2')
               ,(DATEADD(D,-6,GETDATE()) ,'DeviceB', 'Content2')
       ,(DATEADD(D,-6,GETDATE()) ,'DeviceC', 'Content3')
    GO

    --Sample Aggregation Output To Store in [AggregatedHrlyActivity] table. 

    --  **Note this method only works to write the data once and doesn't address updating existing aggregates as new "old" records come into the log table.
    SELECT
    [DeviceName]
    ,[ContentName]
    ,CAST([EventDateTime] as DATE) as [ActivityDate]
    ,DATEPART(D, [EventDateTime]) as [ActivityDay]
    ,DATEPART(hh, [EventDateTime]) as [ActivityHour]
    ,COUNT(*)
    FROM
      [dbo].[ActivityLog]
    GROUP BY
    DeviceName
    ,ContentName
    ,CAST([EventDateTime] as DATE)
        ,DATEPART(D, [EventDateTime]) 
        ,DATEPART(hh, [EventDateTime]) 
    ORDER BY
    DeviceName
    ,ContentName
    ,CAST([EventDateTime] as DATE)
        ,DATEPART(D, [EventDateTime]) 
        ,DATEPART(hh, [EventDateTime]) 
    GO


    Tuesday, October 30, 2018 6:48 PM

Answers

All replies

  • See if this helps

    ---writes the data
    INSERT INTO AggregatedHrlyActivity
    SELECT
    [DeviceName]
    ,[ContentName]
    ,CAST([EventDateTime] as DATE) as [ActivityDate]
    ,DATEPART(D, [EventDateTime]) as [ActivityDay]
    ,DATEPART(hh, [EventDateTime]) as [ActivityHour]
    ,COUNT(*)
    FROM
      [dbo].[ActivityLog]
    GROUP BY
    DeviceName
    ,ContentName
    ,CAST([EventDateTime] as DATE)
        ,DATEPART(D, [EventDateTime]) 
        ,DATEPART(hh, [EventDateTime]) 
    ORDER BY
    DeviceName
    ,ContentName
    ,CAST([EventDateTime] as DATE)
        ,DATEPART(D, [EventDateTime]) 
        ,DATEPART(hh, [EventDateTime]) 
    GO
    ------updates the existing data

    WITH cte
    AS
    (
    SELECT
    [DeviceName]
    ,[ContentName]
    ,CAST([EventDateTime] as DATE) as [ActivityDate]
    ,DATEPART(D, [EventDateTime]) as [ActivityDay]
    ,DATEPART(hh, [EventDateTime]) as [ActivityHour]
    ,COUNT(*) cnt
    FROM
      [dbo].[ActivityLog]
    GROUP BY
    DeviceName
    ,ContentName
    ,CAST([EventDateTime] as DATE)
        ,DATEPART(D, [EventDateTime]) 
        ,DATEPART(hh, [EventDateTime]) 

    )
     UPDATE AggregatedHrlyActivity SET DeviceName=[DeviceName],
     ...........
     FROM AggregatedHrlyActivity JOIN  cte ON 

     AggregatedHrlyActivity.[DeviceName]=cte.[DeviceName] 
     AND 
      
     AggregatedHrlyActivity.[ContentName]=cte.[ContentName] 
     WHERE............


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence


    Wednesday, October 31, 2018 6:47 AM
    Moderator
  • Hi JasonDWilson77,

     

    According to your description, my understanding is that the records were inserted into ActivityLog table disorderly because of the connectivity lag and you want to continuously aggregate data from ActivityLog. If anything is misunderstood, please tell me.

     

    Could you please try to create a view.

     

    CREATE VIEW view2 AS

    SELECT

    [DeviceName]

    ,[ContentName]

    ,CAST([EventDateTime] as DATE) as [ActivityDate]

    ,DATEPART(D, [EventDateTime]) as [ActivityDay]

    ,DATEPART(hh, [EventDateTime]) as [ActivityHour]

    ,COUNT(*) as count

    FROM

      [dbo].[ActivityLog]

    GROUP BY

    DeviceName

    ,ContentName

    ,CAST([EventDateTime] as DATE)

    ,DATEPART(D, [EventDateTime])

    ,DATEPART(hh, [EventDateTime])

     

    When you select the view, the results will be based on the latest records in ActivityLog table.

     

    Best Regards,

    Emily


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com

    Wednesday, October 31, 2018 7:01 AM
  • Hi JasonDWilson77,

     

    According to your description, my understanding is that the records were inserted into ActivityLog table disorderly because of the connectivity lag and you want to continuously aggregate data from ActivityLog. If anything is misunderstood, please tell me.

     

    Could you please try to create a view.

     

    CREATE VIEW view2 AS

    SELECT

    [DeviceName]

    ,[ContentName]

    ,CAST([EventDateTime] as DATE) as [ActivityDate]

    ,DATEPART(D, [EventDateTime]) as [ActivityDay]

    ,DATEPART(hh, [EventDateTime]) as [ActivityHour]

    ,COUNT(*) as count

    FROM

      [dbo].[ActivityLog]

    GROUP BY

    DeviceName

    ,ContentName

    ,CAST([EventDateTime] as DATE)

    ,DATEPART(D, [EventDateTime])

    ,DATEPART(hh, [EventDateTime])

     

    When you select the view, the results will be based on the latest records in ActivityLog table.

     

    Best Regards,

    Emily


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com

    Thanks Emily, there a couple of reason the view option won't work, as it was my first thought as well.  The log volume is continuously written (aka locked) and the data volume is HUGE, hence the need to physically aggregate to a new table for reporting performance purposes. Much appreciate the reply though!
    Wednesday, October 31, 2018 12:20 PM
  • See if this helps

    ---writes the data
    INSERT INTO AggregatedHrlyActivity
    SELECT
    [DeviceName]
    ,[ContentName]
    ,CAST([EventDateTime] as DATE) as [ActivityDate]
    ,DATEPART(D, [EventDateTime]) as [ActivityDay]
    ,DATEPART(hh, [EventDateTime]) as [ActivityHour]
    ,COUNT(*)
    FROM
      [dbo].[ActivityLog]
    GROUP BY
    DeviceName
    ,ContentName
    ,CAST([EventDateTime] as DATE)
        ,DATEPART(D, [EventDateTime]) 
        ,DATEPART(hh, [EventDateTime]) 
    ORDER BY
    DeviceName
    ,ContentName
    ,CAST([EventDateTime] as DATE)
        ,DATEPART(D, [EventDateTime]) 
        ,DATEPART(hh, [EventDateTime]) 
    GO
    ------updates the existing data

    WITH cte
    AS
    (
    SELECT
    [DeviceName]
    ,[ContentName]
    ,CAST([EventDateTime] as DATE) as [ActivityDate]
    ,DATEPART(D, [EventDateTime]) as [ActivityDay]
    ,DATEPART(hh, [EventDateTime]) as [ActivityHour]
    ,COUNT(*) cnt
    FROM
      [dbo].[ActivityLog]
    GROUP BY
    DeviceName
    ,ContentName
    ,CAST([EventDateTime] as DATE)
        ,DATEPART(D, [EventDateTime]) 
        ,DATEPART(hh, [EventDateTime]) 

    )
     UPDATE AggregatedHrlyActivity SET DeviceName=[DeviceName],
     ...........
     FROM AggregatedHrlyActivity JOIN  cte ON 

     AggregatedHrlyActivity.[DeviceName]=cte.[DeviceName] 
     AND 
      
     AggregatedHrlyActivity.[ContentName]=cte.[ContentName] 
     WHERE............


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence


    Uri, thanks for the reply as this is quite interesting!   My only concern is that this data is constantly being added to the log table and the aggregated table is constantly being accessed for reporting purposes.  Will the separate insert and updates allow report users to access "in flight" aggregations?  I need to avoid that, so wasn't sure if I should add some kind of flag field(s) that the report key off of or a temp table or something to minimize this. It seems like you would only want to insert rows where the group by fields do not already exist in the aggregate table, then perform updates on the rows where they already exist, but not sure how best to do this. Thoughts?
    Wednesday, October 31, 2018 12:26 PM
  • Well, you can use MERGE command

    MERGE  AggregatedHrlyActivity as trg
    USING (SELECT
    [DeviceName]
    ,[ContentName]
    ,CAST([EventDateTime] as DATE) as [ActivityDate]
    ,DATEPART(D, [EventDateTime]) as [ActivityDay]
    ,DATEPART(hh, [EventDateTime]) as [ActivityHour]
    ,COUNT(*)
    FROM
      [dbo].[ActivityLog]
    GROUP BY
    DeviceName
    ,ContentName
    ,CAST([EventDateTime] as DATE)
        ,DATEPART(D, [EventDateTime]) 
        ,DATEPART(hh, [EventDateTime]) 
    ) AS src  ON trg.[DeviceName]=src.[DeviceName] 
     AND  trg.[ContentName]=src.[ContentName] 
     WHEN MATCHED
        THEN UPDATE SET  DeviceName=src[DeviceName],
    ................
    WHEN NOT MATCHED
        THEN INSERT VALUES(src.[DeviceName],src... );


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Wednesday, October 31, 2018 12:37 PM
    Moderator
  • That sounds exactly like what I am looking for!  I have read about before (and forgot...), but have not used in a production environment.  Thoughts?  Are those typically performant in high volume?  
    Wednesday, October 31, 2018 3:39 PM
  • Yep, there are some issues , you can read here

    https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/

    I have been using it many times, works just fine


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, November 1, 2018 6:13 AM
    Moderator