none
Find date closest to getDate()

    Question

  • All,

    Below is my table

    DECLARE @Test TABLE
    (
       ID INT,
       Date DATETIME
    )
    
    INSERT INTO @Test (ID, Date) VALUES (1, '2019-09-01')
    INSERT INTO @Test (ID, Date) VALUES (2, '2018-09-02')
    INSERT INTO @Test (ID, Date) VALUES (3, '2017-09-04')
    INSERT INTO @Test (ID, Date) VALUES (4, '2019-09-22')
    INSERT INTO @Test (ID, Date) VALUES (5, '2018-09-08')
    INSERT INTO @Test (ID, Date) VALUES (6, '2019-09-15')

    I want to find the date and time that is closest to getdate(). Any help would be greatly appreciated

    Friday, January 11, 2019 6:30 PM

All replies

  • DECLARE @Test TABLE ( ID INT, Date DATETIME ) INSERT INTO @Test (ID, Date) VALUES (1, '2019-09-01') INSERT INTO @Test (ID, Date) VALUES (2, '2018-09-02') INSERT INTO @Test (ID, Date) VALUES (3, '2017-09-04') INSERT INTO @Test (ID, Date) VALUES (4, '2019-09-22') INSERT INTO @Test (ID, Date) VALUES (5, '2018-09-08') INSERT INTO @Test (ID, Date) VALUES (6, '2019-09-15') Select ID, Date from ( Select *, row_number() Over(Order by abs(datediff(minute,[Date],getdate() ))) rn from @Test ) t WHERE rn=1

    --or

    Select ID, Date from (
    Select *, row_number() Over(Order by abs(datediff(day,[Date],getdate() ))) rn 
    from @Test
    ) t
    WHERE rn=1



    Friday, January 11, 2019 7:00 PM
    Moderator
  • Hello Sammy,

    try it with 

    SELECT TOP 1 *
    FROM @Test
    ORDER BY ABS(DATEDIFF(day, GETDATE(), Date))


    Olaf Helper

    [ Blog] [ Xing] [ MVP]

    Friday, January 11, 2019 7:04 PM
  • ;with cte as(
    select date, datediff(dd, getdate(), date) as diff from @Test
    )
    select top 1 date from cte order by abs(diff)
    Saturday, January 12, 2019 9:24 AM
  • All the previous responses can all return you the row that you need.

    Can you be more specific to your question when you say your want to find the closest to getdate().

    - how many rows you need to be returned.

    - does it need to be less than the getdate() value or it doesn't matter if less than or greater than as long as it is close.

    Providing more specific and detailed requirement is required to create the correct SQL Statement. 

    Saturday, January 12, 2019 1:50 PM
  • Good day Sammy,

    The disadvantage of all the solutions above (until this time) is that they cannot use the power of index and I want to assume that you have an index on the columns [Date] which you want to sort the data by it.

    Here is a small trick which bin first glance looks much complex, but we do not care about how complex the query is, but about the performance of the query! and using this trick the Server can use your index and the execution plan will use twice Index Seek (which is the fastest operation) instead of a single Table Scan (which considered the worse case where index is not used at all) or Index Scan (which again need to go over all the index)

    ;With MyCTE1 as (
    	Select TOP 1 [Date], RN = row_number() Over(Order by [Date])  
    	from TestTbl
    	WHERE [Date] < GETDATE()
    	Order by [Date]
    ),
    MyCTE2 as (
    	Select TOP 1 [Date], RN = row_number() Over(Order by [Date] DESC) 
    	from TestTbl
    	WHERE [Date] < GETDATE()
    	Order by [Date] DESC
    ),
    MyCTE3 as(
    	SELECT I = ABS(DATEDIFF(DAY, [Date], GETDATE())) FROM MyCTE1
    	UNION ALL
    	SELECT I = ABS(DATEDIFF(DAY, [Date], GETDATE())) FROM MyCTE2
    )
    SELECT MAX(I) FROM MyCTE3
    GO

     

    Note! As more data you have in the table the differences between my solution and other options will be much more extreme!

    Since in other solutions you will need to scan more data (as they scan the entire index or table) while in my solution you use Index Seek, which pretty much says that we go (almost) directly to the single row (twice) that we need (almost no impact on the number of rows in the table).

     

    Let's examine a simple case and check the execution plan of all options presented so far:

    This is the DDL+DML which I examine:

    DROP TABLE IF EXISTS TestTbl
    GO
    CREATE TABLE TestTbl (
       ID INT,
       Date DATETIME
    )
    GO
    INSERT INTO TestTbl (ID, Date) VALUES 
    	(1, '2019-09-01'),
    	(2, '2018-09-02'),
    	(3, '2017-09-04'),
    	(4, '2019-09-22'),
    	(5, '2018-09-08'),
    	(6, '2019-09-15')
    GO
    
    CREATE NONCLUSTERED INDEX Ix_TestTbl_Date
    	ON TestTbl (Date)
    GO

    And here is the performance comparing according the Execution Plan:

    You can clearly see that all other solutions used 22% resources while my solution uses only 10%

    Here is the Execution of my solution:

    -- Ronen Ariely 1:
    ;With MyCTE1 as (
    	Select TOP 1 [Date], RN = row_number() Over(Order by [Date])  
    	from TestTbl
    	WHERE [Date] < GETDATE()
    	Order by [Date]
    ),
    MyCTE2 as (
    	Select TOP 1 [Date], RN = row_number() Over(Order by [Date] DESC) 
    	from TestTbl
    	WHERE [Date] < GETDATE()
    	Order by [Date] DESC
    ),
    MyCTE3 as(
    	SELECT [Date],I = ABS(DATEDIFF(DAY, [Date], GETDATE())) FROM MyCTE1
    	UNION ALL
    	SELECT [Date],I = ABS(DATEDIFF(DAY, [Date], GETDATE())) FROM MyCTE2
    )
    SELECT TOP 1 [Date]
    FROM MyCTE3
    order by I
    GO

     

    NOTE! If you know that all dates are before today, then in my solution you can even save 50% more of the resources since we will need only a single "Index Seek"


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]
    Saturday, January 12, 2019 2:52 PM
    Moderator
  • Hi Sammy,

    As you said that 'find the date and time that is closest to getdate()', are time intervals measured in minutes ,seconds or other more precise units of time? I think you can choose the units according to your actual situation.

     

    For example, the getdate() is '2019-09-08 00:00:01.000' and you have two rows which are '2019-09-07' and '2019-09-09'. If you would like to query both two rows , please use WINDOW FUNCTION dense_rank()over(). If you only would like to query '2019-09-09' please change the time unit.

     

    Please try following script.

    DECLARE @Test TABLE
    (
       ID INT,
       Date DATETIME
    )
    
    INSERT INTO @Test (ID, Date) VALUES (1, '2019-09-01')
    INSERT INTO @Test (ID, Date) VALUES (2, '2018-09-02')
    INSERT INTO @Test (ID, Date) VALUES (3, '2017-09-04')
    INSERT INTO @Test (ID, Date) VALUES (4, '2019-09-22')
    INSERT INTO @Test (ID, Date) VALUES (5, '2018-09-08')
    INSERT INTO @Test (ID, Date) VALUES (6, '2019-09-15')
    
    SELECT TOP 1 *
    FROM @Test
    ORDER BY ABS(DATEDIFF(mi, GETDATE(), Date))
    /*
    ID          Date
    ----------- -----------------------
    5           2018-09-08 00:00:00.000
    */
    
    -----The time interval between the two records is the same (minute)
    INSERT INTO @Test (ID, Date) VALUES (5, '2019-09-07')
    INSERT INTO @Test (ID, Date) VALUES (6, '2019-09-09')
    declare @date datetime ='2019-09-08 00:00:01.000'
    SELECT *, ABS(DATEDIFF(mi, @date, Date))
    FROM @Test
    ORDER BY ABS(DATEDIFF(mi,@date, Date))
    /*
    ID          Date                    
    ----------- ----------------------- -----------
    5           2019-09-07 00:00:00.000 1440
    6           2019-09-09 00:00:00.000 1440
    6           2019-09-15 00:00:00.000 10080
    1           2019-09-01 00:00:00.000 10080
    4           2019-09-22 00:00:00.000 20160
    5           2018-09-08 00:00:00.000 525600
    2           2018-09-02 00:00:00.000 534240
    3           2017-09-04 00:00:00.000 1056960
    */
    
    Select ID, Date from (
    Select *, dense_rank() Over(Order by abs(datediff(minute,[Date],@date ))) rn 
    from @Test
    ) t
    WHERE rn=1
    /*
    ID          Date
    ----------- -----------------------
    5           2019-09-07 00:00:00.000
    6           2019-09-09 00:00:00.000
    */
    
    -----(second)
    Select ID, Date from (
    Select *, dense_rank() Over(Order by abs(datediff(ss,[Date],@date ))) rn 
    from @Test
    ) t
    WHERE rn=1
    /*
    ID          Date
    ----------- -----------------------
    6           2019-09-09 00:00:00.000
    */
    
    


     

    Best Regards,

    Rachel


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Monday, January 14, 2019 7:19 AM
  • Good day Rachel,

    I HIGHLY recommend to read my solution and explanation. Check your Execution Plan which include "Index Scan" and if you are not sure I am right that "Index Seek" in my solution will provide much better performance then compare IO using several hundred thousand rows or better several hundred million rows :-)

    It is highly recommended to avoid using functions on the column which we want to use during the filter (where) since this prevents the server from using "INDEX SEEK" <= the server must scan all the INDEX (or Table) and check the result of the function before it can choose the right rows

    I start to understand that I need to write a long blog on the topic...

    Here, again a simple comparing of Exaction Plans (above solution is my query and the second is yours):

    -- Ronen Ariely 1:
    ;With MyCTE1 as (
    	Select TOP 1 [Date], RN = row_number() Over(Order by [Date])  
    	from TestTbl
    	WHERE [Date] < GETDATE()
    	Order by [Date]
    ),
    MyCTE2 as (
    	Select TOP 1 [Date], RN = row_number() Over(Order by [Date] DESC) 
    	from TestTbl
    	WHERE [Date] < GETDATE()
    	Order by [Date] DESC
    ),
    MyCTE3 as(
    	SELECT [Date],I = ABS(DATEDIFF(DAY, [Date], GETDATE())) FROM MyCTE1
    	UNION ALL
    	SELECT [Date],I = ABS(DATEDIFF(DAY, [Date], GETDATE())) FROM MyCTE2
    )
    SELECT TOP 1 [Date]
    FROM MyCTE3
    order by I
    GO
    
    -- Rachel_Wang: 
    Select [Date] from (
    Select *, dense_rank() Over(Order by abs(datediff(ss,[Date],GETDATE() ))) rn 
    from TestTbl
    ) t
    WHERE rn=1


    The point is in the filter! In my solution I am using simple filter "WHERE [Date] < GETDATE()", which allow the serer to use "INDEX SEEK", while in other solutions you have a filter like "WHERE rn=1" where the rn is actually a function on the column "dense_rank() Over(Order by abs(datediff(ss,[Date],GETDATE() )))" and this prevent the server from using Seek and it uses Scan


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]




    Monday, January 14, 2019 9:02 AM
    Moderator
  • Ronen,

    What about this:

     --need to adjust for further requirement
    select distinct Date from (
    Select  * ,  (min(abs(datediff(day,[Date],getdate() ))) Over() )  diff,  datediff(day,[Date],getdate() )  flg
    from TestTbl
    cross apply (values( abs(datediff(day,[Date],getdate() )) )  ) d(val)
    ) t
    where diff=val  --and flg>0

    Monday, January 14, 2019 3:03 PM
    Moderator
  • Ronen,

    What about this:

     --need to adjust for further requirement
    select distinct Date from (
    Select  * ,  (min(abs(datediff(day,[Date],getdate() ))) Over() )  diff,  datediff(day,[Date],getdate() )  flg
    from TestTbl
    cross apply (values( abs(datediff(day,[Date],getdate() )) )  ) d(val)
    ) t
    where diff=val  --and flg>0

    AWESOME!!!

    It's the first time in the thread that We Actually Start a Discuss on Interesting Stuff :-)

    Comparing your solution to my is a great (different discussion) which include several aspect in additional to one I mentioned. OK, I will write blog with all the points since there are a lot of points related to these solution and why they behave the way they do. Here I will try to summarize something including the main important points and the final conclusion (hint: IT IS TO USE MY SOLUTION)

    Step 1: Check your Exaction Plan compare to my when we use 6 rows

    WOW... it seems like Jingyang's solution is much better in first glance. According to the Execution Plan it uses only 28% vs 72%

    Well, in first glance we do not see in the query a filter which uses function on the column which we want to use... is this query uses the INDEX well (meaning using INDEX SCAN), and brings another advantage to the story?!? 

    No!in second look you can noticed that we are using filter "where diff=val" and these two values are calculated dynamically.. and we do remember that Ronen told us that using filter on calculated function in the filter will probably prevent using the INDEX well... so... what is the catch here ?

    let's confirm that the Execution Plan uses Index Scan as expected:

    Yes :-(
    This query as expected uses "Index Scan"
    Moreover, we can notice 2 new elements which considered as potentially problematic! "Table Spool"

    So what we have here and can it be that "Index Scan" will uses less resources than "Index Seek"? Well using 6 rows in a table we cannot really estimated how good an INDEX is, since all the rows can be read in a single IO, but what will happen if we will add rows to the table?!?

    INDEX comes to our help when the number of elements is high since it allows the user to find the right target without the need to scan all the values (just like a book's index that gives us the power to go directly to specific page without the need to read all the page each time we search for specific chapter)

    So... We must take it to the next level and add some rows to the table :-)

    Obviously in real case scenario in SQL Server we might deal with Hundreds of millions of records, but for the sake of the forum and for examine this case, I will add only 100k rows. 

    The affect that we are going to see here will be multiplied exponentially when the number of rows will be higher (since we using tree index), but I think that 100k rows will be enough to show the story.

    TRUNCATE TABLE TestTbl
    GO
    INSERT INTO TestTbl (ID, Date)  
    SELECT top 100000
    	ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
    	DATEADD(DAY, 0, ABS(CHECKSUM(NEWID()))%100000) d -- random values from 1900-01-01 to 2173-10-16
    FROM sys.all_objects t1
    CROSS JOIN sys.all_objects t2
    GO
    
    SELECT * FROM TestTbl
    GO

     

    Let's test the Execution Plan again?

    In fact, Execution Plan only tell us how the Server is executing the query behind the scene, but the more important parameters which we need to check are the IO, CPU, and TIME!

    NOTE! In a lot of cases Execution Plan shows less resources, while in reality the resources used in the Execution are opposite! The percentages of the "relative to the batch" Execution Plan is not something to count on! The important was the fact that we saw in the Execution Plan that there is a use of "Index Scan" instead of "Index Seek" for example.

    Let's take a look at the numbers and examine the IO:

    SET STATISTICS IO ON
    GO

    My Query returns:

    Table 'TestTbl'. Scan count 2, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Jingyang solution:

    Table 'Worktable'. Scan count 3, logical reads 202510, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'TestTbl'. Scan count 1, logical reads 274, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    AND THIS IS THE STORY BEHIND THE SCENES :-)

    OK, there is much more we can discuss here, but I think it is pretty clear that we should try to avoid using functions on the column(s) which we filter, and that in this specific case with the information that we have, we should probably choose to use my solution.


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]


    Monday, January 14, 2019 6:41 PM
    Moderator
  • Ronen,

    Thanks for diving into this question. 

    Here is another version you can compare:

    Select top 1  d1.Date1  
    from TestTbl  
    CROSS APPLY ( Select TOP 1 [Date]  
    	from TestTbl
    	WHERE [Date] < GETDATE()  
    	Order by [Date] DESC)   d1(Date1)
    CROSS APPLY ( Select TOP 1 [Date]  
    	from TestTbl
    	WHERE [Date] > GETDATE() 
    	Order by [Date]  )   d2(Date2)
    Where [Date] > GETDATE()
    Order by [Date]
     
        

    Monday, January 14, 2019 8:03 PM
    Moderator
  • Ronen,

    Thanks for diving into this question. 

    Here is another version you can compare:

    ------------------------------------------ (expect to see one Index Seek)
    Select top 1 d1.Date1 from TestTbl
    ------------------------------------------ Group 1: dates before today (expect to see one Index Seek) CROSS APPLY ( Select TOP 1 [Date] from TestTbl WHERE [Date] < GETDATE() Order by [Date] DESC) d1(Date1)
    ------------------------------------------ Group 2: dates After today(expect to see one Index Seek) CROSS APPLY ( Select TOP 1 [Date] from TestTbl WHERE [Date] > GETDATE() Order by [Date] ) d2(Date2) Where [Date] > GETDATE() Order by [Date]

    Hi Jingyang,

    This query uses my trick which is working on two groups instead of working on the entire SET as one: (1) The rows which have dates less then today, (2) the rows which have dates more then today. 

    This allows us to use simple filter "WHERE [Date] < GETDATE()", and using this filter the server able to use simple Index Seek.

    In your last post you select the data and then you use twice CROSS APPLY to implement the two groups, but by using this approach you will have three Index Seek (the server need to compare each CROSS APPLY with the first select = 3), while in my query we have only two (can be seen in the IO statistics "Scan count 2"). This mean that this query will probably use 3/2 times more IO then in y solution. I did not execute it right now, but I am pretty sure it will show "Scan count 3" and since each IO Scan needed 2 logical reads, I expect to see that this query will use 6 logical reads while my solution used 4.

    So... this is much better then your previous solution (again using the trick to work with two groups of dates is the main point) but it will probably be slightly less then my solution (might even be the same or almost the same percentage in the EP, but the IO should show a bit difference).

    * I am pretty sure that if we will continue, we might find even a solution which is better then the one I published. After all, I wrote my solution in about 10 minutes, while I spent on this thread at least 10 hours to explain it (it is not bad and I will be willing to spend as much time as needed - I just put it in perspective). My main point was the trick to select the rows in two groups which let us use simple filter, which let the server use Index Seek :-)

    At this time, the solution I provided still seems like best option in our case.


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]

    Tuesday, January 15, 2019 7:45 AM
    Moderator
  • Ronen,

    Thanks for your effort to get the here. Your point for the index seek is the key.

    I did revisit this query and come up another one:

      --Thanks for Ronen for the test table mark up
    CREATE TABLE TestTbl (
       ID INT primary key,
       Date DATETIME
    )
    
    
    INSERT INTO TestTbl (ID, Date)  
    SELECT top 100000
    	ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
    	DATEADD(DAY, 0, ABS(CHECKSUM(NEWID()))%100000) d -- random values from 1900-01-01 to 2173-10-16
    FROM sys.all_objects t1
    CROSS JOIN sys.all_objects t2
    
     CREATE NONCLUSTERED INDEX idx_Date
    ON [dbo].[TestTbl] ([Date])
    
    Set statistics io on
    Set statistics time on
    
    
     ---Solution 3
    print 'Jingyang Solution 3'
    Select 
    Case when abs(datediff(dd, getdate(), dt1))<abs(datediff(dd, getdate(), dt2)) 
    then dt1 else dt2 end dt 
    from (Select 1 as n ) src 
    cross apply (
    Select TOP 1 [Date] 
    	from TestTbl
    	WHERE [Date] < GETDATE()
    	Order by [Date] DESC) d1(dt1)
    cross apply (
    	Select TOP 1 [Date]  
    	from TestTbl
    	WHERE [Date] > GETDATE()
    	Order by [Date]) d2(dt2)
    
    	 
    
     drop table TestTbl 
     
     
     
     
    
    Set statistics io off
    Set statistics time off

    Tuesday, January 15, 2019 2:42 PM
    Moderator
  • :-)

    Yes, this solution works awesome using the same IO as in my solution.

    By the way, we need to keep track on the versions for the sake of the discussion and my blog which I will publish on the weekend probably. This is Jingyang 4 :-)

    You had 2 queries in the first response which use Index Scan, another one which used 3 Index Seek, and this one which fix the issue of the extra Index Seek. The reason this solution fixed the previous one is that now it uses "from (Select 1 as n ) src" in the first part of the query, and this part does not need any data so it eliminated the extra Index Seek from solution 3. Well Done :-)

    This solution will use the same IO as in my solution
    Using this DDL+DML both should return something like:

    Table 'TestTbl'. Scan count 2, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.



    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]

    Tuesday, January 15, 2019 10:41 PM
    Moderator
  • Select * , GetDate(),Abs(DateDiff(hh,Date,Getdate())) from @test
    Order by Abs(DateDiff(hh,Date,Getdate())) 

    mohammad waheed

    Tuesday, January 15, 2019 11:01 PM