none
How do I get the shortest intervals from a table?

    Question

  • I have a table like this:

    CREATE TABLE [dbo].[Logs](

        [logDate] [datetime] NULL,
        [logType] [int] NULL,
        [logCacher] [varchar](128) NULL,
    ) ON [PRIMARY]

     

    Some sample data:

    INSERT INTO Logs VALUES ( '21/11/2007 00:00', 1, 'geocacher1' )
    INSERT INTO Logs VALUES ( '21/11/2007 00:00', 1, 'geocacher1' )
    INSERT INTO Logs VALUES ( '22/11/2007 00:00', 1, 'geocacher1' )
    INSERT INTO Logs VALUES ( '22/11/2007 00:00', 1, 'geocacher1' )
    INSERT INTO Logs VALUES ( '23/11/2007 00:00', 1, 'geocacher1' )

    INSERT INTO Logs VALUES ( '23/11/2007 00:00', 1, 'geocacher2' )
    INSERT INTO Logs VALUES ( '24/11/2007 00:00', 1, 'geocacher2' )
    INSERT INTO Logs VALUES ( '25/11/2007 00:00', 1, 'geocacher2' )
    INSERT INTO Logs VALUES ( '25/11/2007 00:00', 1, 'geocacher2' )
    INSERT INTO Logs VALUES ( '27/11/2007 00:00', 1, 'geocacher2' )
    INSERT INTO Logs VALUES ( '29/11/2007 00:00', 1, 'geocacher2' )
    INSERT INTO Logs VALUES ( '29/11/2007 00:00', 1, 'geocacher2' )
    INSERT INTO Logs VALUES ( '30/11/2007 00:00', 1, 'geocacher2' )
    INSERT INTO Logs VALUES ( '30/11/2007 00:00', 1, 'geocacher2' )

    INSERT INTO Logs VALUES ( '21/11/2007 00:00', 1, 'geocacher3' )
    INSERT INTO Logs VALUES ( '21/11/2007 00:00', 1, 'geocacher3' )
    INSERT INTO Logs VALUES ( '21/11/2007 00:00', 1, 'geocacher3' )
    INSERT INTO Logs VALUES ( '21/11/2007 00:00', 1, 'geocacher3' )
    INSERT INTO Logs VALUES ( '25/11/2007 00:00', 1, 'geocacher3' )
    INSERT INTO Logs VALUES ( '25/11/2007 00:00', 1, 'geocacher3' )
    INSERT INTO Logs VALUES ( '25/11/2007 00:00', 1, 'geocacher3' )
    INSERT INTO Logs VALUES ( '25/11/2007 00:00', 1, 'geocacher3' )
    INSERT INTO Logs VALUES ( '26/11/2007 00:00', 1, 'geocacher3' )
    INSERT INTO Logs VALUES ( '26/11/2007 00:00', 1, 'geocacher3' )
     


    What I want to get is a TOP which gives me the smallest time interval needed for the geocachers to get 5 logs

     

    With the sample data, the result should be something like this:

    geocacher3 - 2 days - (25/11/2007 - 26/11/2007)
    geocacher1 - 3 days - (21/11/2007 - 23/11/2007)
    geocacher2 - 4 days - (27/11/2007 - 30/11/2007)

     

    Is this possible using SQL? I am using SQL Server 2008 Express.

    Friday, October 08, 2010 2:46 PM

Answers

  • ;with cte as
    (
    	select logdate,logcacher,ROW_NUMBER() over (partition by logcacher order by logdate) as rn from Logs 
    ),
    cte1 as
    (
    	select c.logcacher,c.logDate,c1.logDate as NextDate,ROW_NUMBER() over (partition by c.logcacher order by c1.logDate - c.logDate ) as rn from cte c
    	inner join cte c1 on c1.rn = c.rn + 4 and c.logCacher = c1.logCacher
    )
    select * from cte1 where rn = 1	
    
    
    Friday, October 08, 2010 3:04 PM
  • With OrderedDates As
    (Select logDate, logCacher,
     Row_Number() Over(Partition By logCacher Order By logDate) As rn
    From Logs),
    OrderedRanges As
    (Select o1.logCacher, o1.logDate As logDate1, o2.logDate As logDate2,
     Row_Number() Over(Partition By o1.logCacher Order By DateDiff(day, o1.logDate, o2.logDate)) As rn2
    From OrderedDates o1 
    Inner Join OrderedDates o2 On o1.logCacher = o2.logCacher And o1.rn + 4 = o2.rn)
    Select logCacher, logDate1, logDate2, DateDiff(day, logDate1, logDate2) + 1 As NumberDays
    From OrderedRanges
    Where rn2 = 1
    Order By DateDiff(day, logDate1, logDate2), logCacher;
    Tom
    Friday, October 08, 2010 3:21 PM

All replies

  • ;with cte as
    (
    	select logdate,logcacher,ROW_NUMBER() over (partition by logcacher order by logdate) as rn from Logs 
    ),
    cte1 as
    (
    	select c.logcacher,c.logDate,c1.logDate as NextDate,ROW_NUMBER() over (partition by c.logcacher order by c1.logDate - c.logDate ) as rn from cte c
    	inner join cte c1 on c1.rn = c.rn + 4 and c.logCacher = c1.logCacher
    )
    select * from cte1 where rn = 1	
    
    
    Friday, October 08, 2010 3:04 PM
  • With OrderedDates As
    (Select logDate, logCacher,
     Row_Number() Over(Partition By logCacher Order By logDate) As rn
    From Logs),
    OrderedRanges As
    (Select o1.logCacher, o1.logDate As logDate1, o2.logDate As logDate2,
     Row_Number() Over(Partition By o1.logCacher Order By DateDiff(day, o1.logDate, o2.logDate)) As rn2
    From OrderedDates o1 
    Inner Join OrderedDates o2 On o1.logCacher = o2.logCacher And o1.rn + 4 = o2.rn)
    Select logCacher, logDate1, logDate2, DateDiff(day, logDate1, logDate2) + 1 As NumberDays
    From OrderedRanges
    Where rn2 = 1
    Order By DateDiff(day, logDate1, logDate2), logCacher;
    Tom
    Friday, October 08, 2010 3:21 PM
  • Thanks a lot to both! Your answers solved my problem! Didn't know about the row_number over... is this new from sql server?
    Friday, October 08, 2010 3:54 PM
  • Yes, row_number() function was introduced in SQL Server 2005. You may want to read these 3 articles by Kevin Goff starting from this

    New capabilities of SQL Server

    New capabilities - Part 2

    New capabilities - Part 3


    Premature optimization is the root of all evil in programming. (c) by Donald Knuth

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Friday, October 08, 2010 3:56 PM
    Moderator
  • I was trying to accomplish the opposite now, a TOP which gives me the longest time interval needed for the geocachers to get 5 logs... I am having problems with this, as I still didn't fully understood how this row_number thing works... Can someone help me again please? Thanks in advance!

    Monday, October 11, 2010 3:20 PM
  • I was trying to accomplish the opposite now, a TOP which gives me the longest time interval needed for the geocachers to get 5 logs... I am having problems with this, as I still didn't fully understood how this row_number thing works... Can someone help me again please? Thanks in advance!


    Just add a DESCENDING (or DESC) option to the order by in the OVER() clause, so

    With OrderedDates As
    (Select logDate, logCacher,
     Row_Number() Over(Partition By logCacher Order By logDate) As rn
    From Logs),
    OrderedRanges As
    (Select o1.logCacher, o1.logDate As logDate1, o2.logDate As logDate2,
     Row_Number() Over(Partition By o1.logCacher Order By DateDiff(day, o1.logDate, o2.logDate) Desc) As rn2
    From OrderedDates o1 
    Inner Join OrderedDates o2 On o1.logCacher = o2.logCacher And o1.rn + 4 = o2.rn)
    Select logCacher, logDate1, logDate2, DateDiff(day, logDate1, logDate2) + 1 As NumberDays
    From OrderedRanges
    Where rn2 = 1
    Order By DateDiff(day, logDate1, logDate2), logCacher;
    
    Tom

    Monday, October 11, 2010 3:46 PM
  • that was it... I think I've already tried that, but I must have missed something, as it didn't gave me the correct results, as with your query now! Thanks once again!
    Monday, October 11, 2010 4:10 PM
  • Here is another way. 

    CREATE TABLE Logs 
    (log_date DATE DEFAULT CURRENT_dateSTAMP NOT NULL,
     log_type INTEGER NOT NULL,
     log_cacher VARCHAR(128) NULL);

     WITH DailyCounts (log_cacher, log_date, log_cnt)
    AS
    (SELECT log_cacher, in_log_date, COUNT(*) 
      FROM Logs
     GROUP BY log_cacher, in_log_date),
     
     RangeCounts (log_cacher, in_log_date, out_log_date, range_size, range_log_cnt) 
     AS
     (SELECT D1.log_cacher, D1.log_date, D2.log_date, DATEDIFF(DD, D1.log_date, D2.log_date),
            SUM(D3.log_cnt)
       FROM DailyCounts AS D1, DailyCounts AS D2, DailyCounts AS D3
      WHERE D1.log_cacher = D2.log_cacher
        AND D1.log_cacher = D3.log_cacher
         AND D3.log_date BETWEEN D1.log_date AND D2.log_date
     GROUP BY D1.log_cacher, D1.log_date, D2.log_date
     HAVING SUM(D3.log_cnt) >= 5),
     
     RangeSizes (log_cacher, in_log_date, out_log_date, range_size, range_log_cnt)
     AS
     (SELECT log_cacher, in_log_date, out_log_date, range_size, range_log_cnt
       FROM RangeCounts AS C1
      WHERE range_size
            = (SELECT MIN(C2.range_size) 
                 FROM RangeCounts AS C2
                WHERE C1.log_cacher = C2.log_cacher))
                
    SELECT log_cacher, in_log_date, out_log_date, range_size
      FROM RangeSizes;
     


    --CELKO-- Please post DDL, use ISO-11179 naming rules and format code so we can read it
    Monday, October 11, 2010 10:39 PM