Date SQL Puzzle - Consolidate date ranges

已答覆 Date SQL Puzzle - Consolidate date ranges

  • Tuesday, September 22, 2009 2:51 PM
     
     
    Hi,

    I've been thinking about this issue quite some time now and I can't seem to get my head around it. I have multiple sources that provide me date ranges, and these ranges will or will not overlap, some will be consecutive to another, others will not. To make my datawarehouse loading process the most efficient, I would like to consolidate all these ranges into the most simple representation of date ranges possible from the source data. Let me explain using an example:

    Say I have the following query results from several sources of date ranges (Date ranges are considered inclusive):
    ID  FromDate    ToDate
    1    20090801    20090803
    2    20090802    20090809
    3    20090805    20090806
    4    20090812    20090813
    5    20090811    20090812
    6    20090802    20090802

    What I would want is a query that will result in the following ranges:
    ID  FromDate    ToDate
    1    20090801    20090809
    2    20090811    20090813


    I don't really care how these ranges are calculated, although I do have some restictions:
    - I cannot afford to use a temporary table, nor a cursor or stored procedure
    - I would prefer not to use recursion as usually you would like to limit the recursion depth, and this would limit the complexity of date-range combinations the query could handle.
    - I would prefer not to use heavy calculations as the date-range sets that would be fed to the query are quite large. (Up to approx. 50.000 ranges at a time)

     I've thought up a solution, although it seems cumbersome:
    - Select all items as a subselect, splitting them in 1 day chunks
    - Select a distinct startdate, sort on startdate, while issueing a rownumber as a subselect
    - Join the same result twice, but with T1.rownumber = T1.rownumber + 1 for a comparison between the row an it's following row
    - Calculate the date-distance in days, all items with distance greater than 1 day are starting points for a result row
    - Select the starting points from the previous results, and select all rows between this and the next starting point, calculate a min and max date on this set

    As I said...cumbersome.

    Is there any way to consolidate these date ranges on a fairly easy way? Thanks for your effort in advance!


All Replies

  • Tuesday, September 22, 2009 3:29 PM
     
     Answered

    SET NOCOUNT ON

    DECLARE @T TABLE(ID INT,FromDate DATETIME, ToDate DATETIME)

    INSERT INTO @T(ID,FromDate,ToDate)
    SELECT 1,'20090801','20090803' UNION ALL
    SELECT 2,'20090802','20090809' UNION ALL
    SELECT 3,'20090805','20090806' UNION ALL
    SELECT 4,'20090812','20090813' UNION ALL
    SELECT 5,'20090811','20090812' UNION ALL
    SELECT 6,'20090802','20090802'


    SELECT ROW_NUMBER() OVER(ORDER BY s1.FromDate) AS ID,
           s1.FromDate,
           MIN(t1.ToDate) AS ToDate
    FROM @T s1
    INNER JOIN @T t1 ON s1.FromDate <= t1.ToDate
      AND NOT EXISTS(SELECT * FROM @T t2
                     WHERE t1.ToDate >= t2.FromDate
                       AND t1.ToDate < t2.ToDate)
    WHERE NOT EXISTS(SELECT * FROM @T s2
                     WHERE s1.FromDate > s2.FromDate
                       AND s1.FromDate <= s2.ToDate)
    GROUP BY s1.FromDate
    ORDER BY s1.FromDate

    • Marked As Answer by tss68nl Tuesday, September 22, 2009 8:46 PM
    •