none
Building a Time Machine for Data RRS feed

  • Question

  • Hi All,

    I'm looking for some general direction on a query that I need to build for a report that I have been asked to write.

    The report is basedon oilfield data relating to pumps that get pump into wells although the general concept could be applied to many other things.

    I have been asked to write a line graph that shows the average time an oil pump was installed in a well.

    The General data structure looks something like this

    CustomerID PumpID WellName InstallDate PullDate Runtime
    ABC123 1 A 5/15/2018 8/21/2019 463
    ABC123 2 B 6/21/2019 8/21/2019 61
    ABC123 1 B 8/22/2019
    ABC123 2 A 8/22/2019

    The average must include the runtime of pumps that have been pulled from the well in addition to the runtime of the pumps current installed in the well but as you can see, the runtime is not stored in the database table until the pump has has been removed from the well.

    As soon as you start to plot data points on the graph going backwards in time, you have to completely re-evaluate all of the data rows because the data is time sensitive and pumps that are now pulled were not pulled in the previous month and would have different runtimes at that point in time.

    So does anyone have any adcie on how I should go about this?

     


    Friday, August 23, 2019 3:24 PM

Answers

  • Then you want to use a calendar table and recalculate the runtime.

    https://weblogs.sqlteam.com/dang/2010/07/19/calendar-table-and-datetime-functions/

    DECLARE @pumpdata TABLE (CustomerID VARCHAR(100), PumpID INT, WellName VARCHAR(100), InstallDate DATE, PullDate DATE, RunTime INT);
    
    INSERT INTO @pumpdata VALUES
    ('ABC123',1,'A','2018-05-15','2019-08-21',463),
    ('ABC123',2,'B','2018-06-21','2019-08-21',61),
    ('ABC123',1,'B','2019-08-22',NULL,NULL),
    ('ABC123',2,'A','2019-08-22',NULL,NULL);
    
    SELECT 
        c.CalendarDate,
        pd.CustomerID,
        pd.PumpID,
        SUM(DATEDIFF(DAY, InstallDate, c.CalendarDate)) as RunTime
    
    FROM dbo.Calendar c
        inner join @pumpdata pd
        on c.CalendarDate between pd.InstallDate and coalesce(pd.PullDate,GETDATE())
    GROUP BY pd.CustomerID, pd.PumpID, c.CalendarDate
    ORDER BY pd.CustomerID, pd.PumpID, c.CalendarDate;
    

    Friday, August 23, 2019 6:53 PM
    Moderator

All replies

  • DECLARE @pumpdata TABLE (CustomerID VARCHAR(100), PumpID INT, WellName VARCHAR(100), InstallDate DATE, PullDate DATE, RunTime INT);
    
    INSERT INTO @pumpdata VALUES
    ('ABC123',1,'A','2018-05-15','2019-08-21',463),
    ('ABC123',2,'B','2018-06-21','2019-08-21',61),
    ('ABC123',1,'B','2019-08-22',NULL,NULL),
    ('ABC123',2,'A','2019-08-22',NULL,NULL);
    
    SELECT 
        *,
        CASE WHEN RunTime IS NULL THEN DATEDIFF(DAY, InstallDate, GETDATE()) ELSE RunTime END as RunTimeCalc
    
    FROM @pumpdata
    

    Friday, August 23, 2019 4:35 PM
    Moderator
  • Tom,

    Thanks a lot for helping me out but here is the issue.  The results only Show the RunTimeCalc for the current date, it doesn't retroactively go back and calculate those same records for July 2019, June 2019 May 2019 etc.

    That's the problem I'm having, I have to recreate the average runtime as if I had run the query in those months.

    The purpose of the graph is to show our cuystomers that buy using our company, their pumps are lasting longer in the well before they fail.

    Generally, we should see an upward trend.

    We have to show the graph for all pumps that were pulled in that month and pumps still running in the well for that month (based on the install and pull date information)

    Friday, August 23, 2019 6:35 PM
  • Then you want to use a calendar table and recalculate the runtime.

    https://weblogs.sqlteam.com/dang/2010/07/19/calendar-table-and-datetime-functions/

    DECLARE @pumpdata TABLE (CustomerID VARCHAR(100), PumpID INT, WellName VARCHAR(100), InstallDate DATE, PullDate DATE, RunTime INT);
    
    INSERT INTO @pumpdata VALUES
    ('ABC123',1,'A','2018-05-15','2019-08-21',463),
    ('ABC123',2,'B','2018-06-21','2019-08-21',61),
    ('ABC123',1,'B','2019-08-22',NULL,NULL),
    ('ABC123',2,'A','2019-08-22',NULL,NULL);
    
    SELECT 
        c.CalendarDate,
        pd.CustomerID,
        pd.PumpID,
        SUM(DATEDIFF(DAY, InstallDate, c.CalendarDate)) as RunTime
    
    FROM dbo.Calendar c
        inner join @pumpdata pd
        on c.CalendarDate between pd.InstallDate and coalesce(pd.PullDate,GETDATE())
    GROUP BY pd.CustomerID, pd.PumpID, c.CalendarDate
    ORDER BY pd.CustomerID, pd.PumpID, c.CalendarDate;
    

    Friday, August 23, 2019 6:53 PM
    Moderator
  • Awesome thank you.

    I hadn't thought about using a cleanar table approach but I see how this would work now.

    Friday, August 23, 2019 8:28 PM
  • We are glad to hear that they are helpful to you. If you have solved your issue ,please kindly mark the helpful replies as answers. By doing so, it will benefit all community members who are having this similar issue.  Your contribution is highly appreciated.

    Sabrina


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, August 27, 2019 1:22 AM