locked
Calculating Date Difference for Multiple Events per Person RRS feed

  • Question

  • Hi, I would appreciate suggestions on how to approach the following.<o:p></o:p>

    I have two tables of data; table A and Table B. These are snapshots of the data in these tables:

    <o:p></o:p>

    I need to calculate the time difference, in days, between an event that a person participated in (date of event is recorded in Table A) and a follow-up event (recorded in table B) associated with the person’s event from Table A. The dates of the follow-up events are in Table B. <o:p></o:p>

    The time span of the data is Jan 1, 2010 – Dec 31, 2014. During this time, each person participated in one or multiple events (I labeled them e1, e2, e3, etc), and has zero, one, or multiple follow-up/future events. <o:p></o:p>

    The requirement is that for each person, I go sequentially, from the beginning of the time period, Jan 1, 2010, and once I encounter the first event he/she participated in (e1) in table A, look in Table B for a follow-up event that occurred between the e1 date and this person’s next event’s (e2) date, and calculate the time difference (which I will call wait_time) between e1 and the follow-up event(s). <o:p></o:p>

    Then continue with the next event, e2, for the same person, and look in Table B for follow-up events between e2 and e3 (or any time in the future after e2 if there is no e3) and calculate the wait_time between e2 and the follow-up event(s), and so on until his/her last event in the dataset (that would be dated before Dec 31, 2014). Once done calculating the wait time for all of a person’s events, then I need to move on to the next person. <o:p></o:p>

    An event in Table A does not always have a corresponding follow-up event in Table B before a person’s next event’s date; and sometimes there might be several follow-up event dates in Table B in between sequential e events in Table A. If no follow-up between events e1 and e2 (or e3 – e4… etc), then wait_time will be equal to 0. If several follow-ups exist between a person’s two sequential e events, then I will take the minimum time difference as wait_time [wait_time =min(datediff(date1, date2, …)] <o:p></o:p>

    And so on until all persons in the dataset are processed.<o:p></o:p>

    This would be a simple project if for person x there would be only one event, e, and one or multiple follow-up events, as I’d use datediff or min(datediff)<o:p></o:p>

    But as each person in my dataset has one or multiple events, and there may be multiple wait times to calculate per person, I need to figure out how to look for follow-up events in between event dates, and use these follow-up events in the wait_time calculation for an event, and then start over once the next event is reached. And then start over with the next person once all of his/her wait times are calculated. <o:p></o:p>

    I was thinking that perhaps using cursors would make this feasible, but my event tables are large 100,000+ rows each, so using cursors would result in very long processing times…<o:p></o:p>

    Any ideas on how to approach this would be appreciated. Thanks very much!<o:p></o:p>

    Wednesday, December 23, 2015 12:42 PM

Answers

  • You would need to use OUTER APPLY here, e.g. something like:

    ;with PersonEvents as (select t.PersonId, t.EventLabel, t.EventDate as StartDate, ISNULL(t1.EventDate, '30000101') as EndDate 

    from TableA t OUTER APPLY (select top (1) EventDate from TableA t1 where t1.PersonId = t.PersonId and t1.EventDate > t.EventDate order by EventDate) t1)

    select PE.*, datediff(day, PE.StartDate, FE.EventDate) as WaitInDays 

    from PersonEvents PE OUTER APPLY (select top (1) * from TableB t2 where t2.PersonId = PE.PersonID

    and t2.EventDate between PE.StartDate and PE.EndDate order by t2.EventDate -- first following event

    ) FE


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    • Marked as answer by JSM5683 Wednesday, December 23, 2015 8:10 PM
    Wednesday, December 23, 2015 1:36 PM
  • Based on some limited test date I put together, give this a go:

    DECLARE @tableA TABLE (personID INT, eventDate DATE, eventLabel VARCHAR(10))
    INSERT INTO @tableA (personID, eventDate, eventLabel) VALUES
    (1, '2010-07-17', 'e1'),(1, '2011-04-05', 'e2'),(1, '2012-01-11', 'e3')
    
    DECLARE @tableB TABLE (personID INT, followUpEventDate DATE)
    INSERT INTO @tableB ( personID, followUpEventDate ) VALUES  
    (1, '2010-09-09'),(1, '2012-03-19'),(1, '2012-06-01')
    
    ;WITH baseA AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY personID ORDER BY eventDate) AS seq
      FROM @tableA
    ), baseB AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY personID ORDER BY followUpEventDate) AS seq
      FROM @tableB
    )
    
    SELECT b1.personID, b1.eventDate, b2.eventDate AS nextEventDate, b1.eventLabel, bb.followUpEventDate, COALESCE(DATEDIFF(DAY,b1.eventDate, bb.followUpEventDate),0) AS waitDays
      FROM baseA b1
        LEFT OUTER JOIN baseA b2
    	  ON b1.personID = b2.personID
    	  AND b1.seq + 1 = b2.seq 
    	LEFT OUTER JOIN baseB bb
    	  ON b1.personID = bb.personID
    	  AND bb.seq = (
    	                SELECT MIN(seq) 
    	                   FROM baseB 
    	                  WHERE baseB.followUpEventDate BETWEEN b1.eventDate AND COALESCE(b2.eventDate,CURRENT_TIMESTAMP)
    				   )
    


    Don't forget to mark helpful posts, and answers. It helps others to find relevant posts to the same question. Enjoyed my post? Hit the up arrow (left)
    Really enjoyed it? See my profile!
    My Tech Net Articles.

    • Marked as answer by JSM5683 Wednesday, December 23, 2015 8:10 PM
    Wednesday, December 23, 2015 3:06 PM
  • create table tableA (PersonID int, EventDate date, EventLabel varchar(10))
    
    Insert into tableA values(1,'2010-07-17','e1'),
    (1,'2011-02-05','e2'),
    (1,'2012-01-11','e3'),
    
    (7,'2009-05-13','e1'),
    (7,'2009-09-18','e2'),
    (7,'2010-01-14','e3'),
    (7,'2010-05-07','e4'),
    (7,'2011-02-23','e5'),
    (7,'2012-05-15','e6'),
    (7,'2013-05-08','e7'),
    (7,'2013-06-15','e8'),
    (7,'2013-06-22','e9'),
    (19,'2013-02-12','e1'),
    (20,'2011-12-05','e1'),
    (29,'2009-05-16','e1'),
    (37,'2009-05-11','e1'),
    (37,'2009-05-18','e2'), 
    (61,'2009-07-13','e1'),
    (84,'2013-09-14','e1'),
    (84,'2009-09-24','e2') 
    
    create table tableB (PersonID int, FollowupDate date)
    Insert into tableB values(1,'2010-09-09')
    ,(1,'2012-03-19')
    ,(1,'2012-06-01')
    ,(7,'2009-10-10')
    ,(7,'2009-11-18')
    ,(7,'2010-01-18')
    ,(7,'2011-02-28')
    ,(7,'2013-06-19')
    ,(7,'2013-06-26')
    ,(7,'2013-06-28')
    ,(19,'2013-04-11')
    ,(20,'2012-02-01')
    ,(37,'2009-07-03')
    ,(37,'2009-08-05')
    ,(61,'2009-12-12')
    ,(84,'2009-09-17')
    ,(84,'2009-09-29')
    
    
    ;With mycte as (
    select PersonID, EventDate, EventLabel , 1 as grp from tableA
    Union all
    select PersonID, FollowupDate, 
    'e'+ Cast(row_number() Over(Partition by PersonID Order by FollowupDate) as varchar(10)) +'-followup' as EventLabel 
    , 0 as grp from tableB
    )
    
    ,mycte1 as (
    Select *, sum(grp) Over(Partition by PersonID Order by EventDate) grp2 from mycte )
    
    ,mycte2 as (
    Select PersonID, min(EventDate) Over(Partition by PersonID, grp2) as EventDate
    --, min(EventDate) Over(Partition by PersonID, grp2,grp) minFollowupDate 
    ,datediff(day, min(EventDate) Over(Partition by PersonID, grp2),min(EventDate) Over(Partition by PersonID, grp2,grp)) waitDate
    ,row_number() Over(Partition by PersonID, grp2 Order by grp) rn
    From mycte1
    )
    
    Select PersonID,EventDate, waitDate  from mycte2
    Where rn=1
    order by PersonID,EventDate 
    
    
    
    
    Drop table   tableA, tableB
    

    • Marked as answer by JSM5683 Wednesday, December 23, 2015 8:10 PM
    Wednesday, December 23, 2015 4:42 PM

All replies

  • Hi,

    What I would do is join table a to table b and then join table a (we'll call it table c) again.  Select the minimum date from table b where the date is greater than table a and where not exists a smaller date in table c.  That would ensure you're not using a follow up date for a different event.  Let me know if that makes sense or if you have any questions.

    Wednesday, December 23, 2015 12:54 PM
  • You would need to use OUTER APPLY here, e.g. something like:

    ;with PersonEvents as (select t.PersonId, t.EventLabel, t.EventDate as StartDate, ISNULL(t1.EventDate, '30000101') as EndDate 

    from TableA t OUTER APPLY (select top (1) EventDate from TableA t1 where t1.PersonId = t.PersonId and t1.EventDate > t.EventDate order by EventDate) t1)

    select PE.*, datediff(day, PE.StartDate, FE.EventDate) as WaitInDays 

    from PersonEvents PE OUTER APPLY (select top (1) * from TableB t2 where t2.PersonId = PE.PersonID

    and t2.EventDate between PE.StartDate and PE.EndDate order by t2.EventDate -- first following event

    ) FE


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    • Marked as answer by JSM5683 Wednesday, December 23, 2015 8:10 PM
    Wednesday, December 23, 2015 1:36 PM
  • So for each event (table A) you need to know 2 facts.  One is obvious - the event date.  The other requires a join of some time - the next event <after> the current event's event date.  Once you have that, the joining to table B is trivial. I will point out an assumption in your description - something you should verify before you go further.  An event will have no more than one follow-up event.  Can an event have no follow-up event? 

    And one last suggestion.  Posting your data as an image does you and everyone else a disservice. Post your schema and data as a script that everyone can run and experiment with. And don't try to make your text "beautiful" - posting will generally mangle anything you attempt.  Pasting any content from a MS product will generally cause odd things to appear - as it seems to be the case here.  

    Wednesday, December 23, 2015 1:50 PM
  • If your provide proper example data and DDL I'll take a look at this.

    Images are completely useless to us. Here's an example of how to post it:

    DECLARE @tableA TABLE (personID INT, eventDate DATE, eventLabel VARCHAR(10))
    INSERT INTO @tableA (personID, eventDate, eventLabel) VALUES
    (1, '2010-07-17', 'e1'),(1, '2011-04-05', 'e2')
    You don't have to use table variables, feel free to use #temp tables, or regular tables. I prefer table variables because there's no clean up required :)


    Don't forget to mark helpful posts, and answers. It helps others to find relevant posts to the same question. Enjoyed my post? Hit the up arrow (left)
    Really enjoyed it? See my profile!
    My Tech Net Articles.

    Wednesday, December 23, 2015 2:52 PM
  • Thanks so much everyone for the suggestions. I will try them when putting together the code for this solution. Sorry about the odd characters in the text I posted - I did paste from Word on Mac and that's what probably caused it. Yes, posting the data as an image is probably not a good idea; will remember next time. (I am new to this forum). scott-morris-ga, to answer your question, yes, an event can have no follow up event. I will have to account for that as well in the code.

    Thanks again!

    Wednesday, December 23, 2015 2:52 PM
  • Based on some limited test date I put together, give this a go:

    DECLARE @tableA TABLE (personID INT, eventDate DATE, eventLabel VARCHAR(10))
    INSERT INTO @tableA (personID, eventDate, eventLabel) VALUES
    (1, '2010-07-17', 'e1'),(1, '2011-04-05', 'e2'),(1, '2012-01-11', 'e3')
    
    DECLARE @tableB TABLE (personID INT, followUpEventDate DATE)
    INSERT INTO @tableB ( personID, followUpEventDate ) VALUES  
    (1, '2010-09-09'),(1, '2012-03-19'),(1, '2012-06-01')
    
    ;WITH baseA AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY personID ORDER BY eventDate) AS seq
      FROM @tableA
    ), baseB AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY personID ORDER BY followUpEventDate) AS seq
      FROM @tableB
    )
    
    SELECT b1.personID, b1.eventDate, b2.eventDate AS nextEventDate, b1.eventLabel, bb.followUpEventDate, COALESCE(DATEDIFF(DAY,b1.eventDate, bb.followUpEventDate),0) AS waitDays
      FROM baseA b1
        LEFT OUTER JOIN baseA b2
    	  ON b1.personID = b2.personID
    	  AND b1.seq + 1 = b2.seq 
    	LEFT OUTER JOIN baseB bb
    	  ON b1.personID = bb.personID
    	  AND bb.seq = (
    	                SELECT MIN(seq) 
    	                   FROM baseB 
    	                  WHERE baseB.followUpEventDate BETWEEN b1.eventDate AND COALESCE(b2.eventDate,CURRENT_TIMESTAMP)
    				   )
    


    Don't forget to mark helpful posts, and answers. It helps others to find relevant posts to the same question. Enjoyed my post? Hit the up arrow (left)
    Really enjoyed it? See my profile!
    My Tech Net Articles.

    • Marked as answer by JSM5683 Wednesday, December 23, 2015 8:10 PM
    Wednesday, December 23, 2015 3:06 PM
  • create table tableA (PersonID int, EventDate date, EventLabel varchar(10))
    
    Insert into tableA values(1,'2010-07-17','e1'),
    (1,'2011-02-05','e2'),
    (1,'2012-01-11','e3'),
    
    (7,'2009-05-13','e1'),
    (7,'2009-09-18','e2'),
    (7,'2010-01-14','e3'),
    (7,'2010-05-07','e4'),
    (7,'2011-02-23','e5'),
    (7,'2012-05-15','e6'),
    (7,'2013-05-08','e7'),
    (7,'2013-06-15','e8'),
    (7,'2013-06-22','e9'),
    (19,'2013-02-12','e1'),
    (20,'2011-12-05','e1'),
    (29,'2009-05-16','e1'),
    (37,'2009-05-11','e1'),
    (37,'2009-05-18','e2'), 
    (61,'2009-07-13','e1'),
    (84,'2013-09-14','e1'),
    (84,'2009-09-24','e2') 
    
    create table tableB (PersonID int, FollowupDate date)
    Insert into tableB values(1,'2010-09-09')
    ,(1,'2012-03-19')
    ,(1,'2012-06-01')
    ,(7,'2009-10-10')
    ,(7,'2009-11-18')
    ,(7,'2010-01-18')
    ,(7,'2011-02-28')
    ,(7,'2013-06-19')
    ,(7,'2013-06-26')
    ,(7,'2013-06-28')
    ,(19,'2013-04-11')
    ,(20,'2012-02-01')
    ,(37,'2009-07-03')
    ,(37,'2009-08-05')
    ,(61,'2009-12-12')
    ,(84,'2009-09-17')
    ,(84,'2009-09-29')
    
    
    ;With mycte as (
    select PersonID, EventDate, EventLabel , 1 as grp from tableA
    Union all
    select PersonID, FollowupDate, 
    'e'+ Cast(row_number() Over(Partition by PersonID Order by FollowupDate) as varchar(10)) +'-followup' as EventLabel 
    , 0 as grp from tableB
    )
    
    ,mycte1 as (
    Select *, sum(grp) Over(Partition by PersonID Order by EventDate) grp2 from mycte )
    
    ,mycte2 as (
    Select PersonID, min(EventDate) Over(Partition by PersonID, grp2) as EventDate
    --, min(EventDate) Over(Partition by PersonID, grp2,grp) minFollowupDate 
    ,datediff(day, min(EventDate) Over(Partition by PersonID, grp2),min(EventDate) Over(Partition by PersonID, grp2,grp)) waitDate
    ,row_number() Over(Partition by PersonID, grp2 Order by grp) rn
    From mycte1
    )
    
    Select PersonID,EventDate, waitDate  from mycte2
    Where rn=1
    order by PersonID,EventDate 
    
    
    
    
    Drop table   tableA, tableB
    

    • Marked as answer by JSM5683 Wednesday, December 23, 2015 8:10 PM
    Wednesday, December 23, 2015 4:42 PM
  • Hi,

    Using ROW_NUMBER and the PARTITION BY clause worked for me. Not too familiar with OUTER APPLY - will need to do some reading on this to learn more about it, but a solution using OUTER APPLY would have probably worked as well. So I'll mark all 3 responses as answers.

    Thanks!  This helped tremendously. Happy holidays!

    Wednesday, December 23, 2015 8:10 PM
  • Please follow basic Netiquette and post the DDL we need to answer this. Follow industry and ANSI/ISO standards in your data. You should follow ISO-11179 rules for naming data elements. You should follow ISO-8601 rules for displaying temporal data (https://xkcd.com/1179/). Why are you using Oracle dialect here? We need to know the data types, keys and constraints on the table. Avoid dialect in favor of ANSI/ISO Standard SQL. 

    I really have come to hate being the typist for a poster on SQL forums. I am an old man with twisted fingers, bad eyes and a prostate  problem; why am I expected to provide free labor for the under 70 crowd? :( 

    >>  I need to calculate the time difference, in days, between an event that a person participated in (date of event is recorded in Table A) and a follow-up event (recorded in table B) associated with the person’s event from Table A. The dates of the follow-up events are in Table B. <<

    NO! This is a serious design error called attribute splitting. An event is a single entity in a data model. It has state changes, shown by status attributes. If you saw a schema with “Male_Personnel” and “Female_Personnel” tables, you would know you wanted a “Personnel” table with a sex_code column in it. 

    >> The time span of the data is 2010-01-01 thru 2014-12–31. During this time, each person participated in one or multiple events (I labeled them e1, e2, e3, etc), and has zero, one, or multiple follow-up/future events. <<

    Where is the Events table? It should have the series of events, shown with (prior_event_id, current_event_id) pairs for “control flow”and the  (prior_event_date, current_event_date) dates. Read an old article of mine at:

    https://www.simple-talk.com/sql/t-sql-programming/state-transition-constraints/

    Your mindset is still back in paper forms, and not RDBMS. Your (poorly named even for a skeleton) tables are what would be captured on a clip board paper form. But that is not RDBMS. 

    >> The requirement is that for each person, I go sequentially, from the beginning of the time period, 2010-01-01, and once I encounter the first event he/she participated in (e1) in table A, look in Table B for a follow-up event that occurred between the e1 date and this person’s next event’s (e2) date, and calculate the time difference (which I will call wait_time) between e1 and the follow-up event(s). <<

    ARRRGH! SQL is a set based language and it is declarative. We have no concept of looping or iteration. Saying that you are going to process people as if they were on magnetic tape is like going to a vegan restaurant and ordering a fried baby. So wrong, so many ways.

    >> I was thinking that perhaps using cursors would make this feasible, but my event tables are large 100,000+ rows each, so using cursors would result in very long processing times… <<

    Cursors are literally how you mimic an IBM 1960s mad tape filesystem in SQL. All of the commands for cursors were taken directly from the old IBM tape system utilities. 

    Also 100,000 rows is nothing. Today we think in terms of terabytes for small systems. Now if you get a petabyte or a yattobyre of data. Then we will worry about it :-)

    Playing with the article I gave you then follow the forum rules and post some DDL. Once we have that we can do some design work with your schema and write code..

    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Wednesday, December 23, 2015 9:10 PM