locked
Row estimations number is miscalculated when using dateadd ( sysdatetime() ) as predicate RRS feed

  • Question

  • Hi everyone,

    I came across this TSQL riddle and couldn't find a conclusive answer.

    Here goes: (Generate Data script)

     

    Use TempDB 
    GO
    
    CREATE TABLE
     Orders
     (
     Id   	INT    NOT NULL IDENTITY(1,1) ,
     DateAndTime 	DATETIME2(7)	NOT NULL ,
     OrderStatusId TINYINT  	NOT NULL ,
    CONSTRAINT
     pk_Orders_nc_Id PRIMARY KEY NONCLUSTERED
     (Id ASC)
     )
     GO
    CREATE NONCLUSTERED INDEX
     ix_Orders_nc_nu_DateAndTime
     ON
     Orders (DateAndTime ASC);
     GO
    INSERT INTO
     Orders WITH (TABLOCK)
     (
     DateAndTime,
     OrderStatusId
     )
     SELECT TOP (10000)
     DateAndTime 	= DATEADD (MINUTE , - ABS (CHECKSUM (NEWID ())) % (60 * 24 * 365 * 5) , SYSDATETIME ()),
     OrderStatusId = ABS (CHECKSUM (NEWID ())) % 8 + 1
     FROM
     sys.all_columns t1
     CROSS JOIN
     sys.all_columns t2
     ORDER BY
     NEWID () ASC;
     GO
    
    
    

     


    Now run these two statements and have the actual execution plan displayed.

     

    SELECT
     Orders.Id ,
     Orders.DateAndTime ,
     Orders.OrderStatusId
     FROM
    Orders
     WHERE
     Orders.DateAndTime > DATEADD (YEAR , -2 , SYSDATETIME ());
     GO
    
    SELECT
     Orders.Id ,
     Orders.DateAndTime ,
     Orders.OrderStatusId
     FROM
    Orders
     WHERE
     Orders.DateAndTime > DATEADD (YEAR , -2 , GETDATE ());
     GO
    

     

    As you can notice, the optimizer estimates the first statement to return only one row while the second statement is estimated more accurately.

    question is, why is this difference ? why only 1 row for the first query ?

    (BTW, this code ran on a SQL Server 2008 (RTM) - 10.0.1600.22 (Build 7601: Service Pack 1) )

     

    I've read a few opinions on the matter ("stackoverflow" forum and such) 

    http://stackoverflow.com/questions/4679563/how-that-happen-sp-sql-server

    and found a few bugs on MS connect

    https://connect.microsoft.com/SQLServer/feedback/details/630583/incorrect-estimate-with-condition-that-includes-datediff

    that resemble this issue, but no conclusive reason for this specific scenario.

    The prize for solving this was a free TSQL course, but since that course is starting tomorrow, I'll settle with knowing the answer.

     


    • Edited by John - G Friday, August 26, 2011 8:10 PM Added more details about testing environment
    Friday, August 26, 2011 1:56 PM

Answers

  • This is indeed a bug. It seems that the optimizer fails to consider the dateadd, but makes its estimate from the value of sysdatetime() alone.

    Here is a repro:

    CREATE TABLE #t (d datetime2(7) NOT NULL PRIMARY KEY)
    go
    INSERT #t (d)
       VALUES(dateadd(YEAR, -5, sysdatetime())),
             (dateadd(YEAR, -4, sysdatetime())),
             (dateadd(YEAR, -3, sysdatetime())),
             (dateadd(YEAR, -2, sysdatetime())),
             (dateadd(YEAR, -1, sysdatetime())),
             (dateadd(YEAR,  0, sysdatetime())),
             (dateadd(YEAR,  1, sysdatetime())),
             (dateadd(YEAR,  2, sysdatetime())),
             (dateadd(YEAR,  3, sysdatetime())),
             (dateadd(YEAR,  4, sysdatetime())),
             (dateadd(YEAR,  5, sysdatetime()))
    go
    SELECT d FROM #t WHERE d > dateadd(YEAR, -10, getdate())
    SELECT d FROM #t WHERE d > dateadd(YEAR, -10, sysdatetime())
    go
    drop table #t

    The estimate for the first query is 11 which is correct, but for the second query it's 6.

    I've reported this as
    https://connect.microsoft.com/SQLServer/feedback/details/685903/incorrect-estimate-when-sysdatetime-appear-in-a-dateadd-expression


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by John - G Saturday, August 27, 2011 7:05 PM
    Saturday, August 27, 2011 9:58 AM

All replies

  • I think the reason may lay in implicit conversion. I haven't tried your script yet, but what will happen if

    SELECT
     Orders.Id ,
     Orders.DateAndTime ,
     Orders.OrderStatusId
     FROM
    Orders
     WHERE
     Orders.DateAndTime > DATEADD (YEAR , -2 , SYSDATETIME ());
     GO
    
    SELECT
     Orders.Id ,
     Orders.DateAndTime ,
     Orders.OrderStatusId
     FROM
    Orders
     WHERE
     Orders.DateAndTime > convert(datetime2(7),DATEADD (YEAR , -2 , GETDATE ()));
     GO
    
    

     


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog
    Friday, August 26, 2011 2:05 PM
  • Naomi,

    The data type datetime2 has greater precedence than datetime, so I expect the implicit conversion to flow exactly as your explicit one.

     


    AMB

    Some guidelines for posting questions...

    Friday, August 26, 2011 2:20 PM
  • True, but better to be explicit anyway. I suspect the problem may be somewhere else, though. Most likely an internal bug.
    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog
    Friday, August 26, 2011 2:51 PM
  • John,

    I do not have an answer about where exactly the problem is or why is the QO behaving like this, but I am going to ask in the private group later on and look for help.

    For some reason that I am not aware of, the QO seems not to be using the histogram when we use SYSDATETIME, but it does when we use GETDATE.

     


    AMB

    Some guidelines for posting questions...

    Friday, August 26, 2011 3:00 PM
  • This is what I got as the execution plan (e.g. the first version - no conversion - behaved much better):


     

    The change I suggested made no difference, I still got 16% vs. 84% - index seek and table scan. However, I vaguely remember we did discuss a similar case here several months back. Can you try searching this forum?

    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    Friday, August 26, 2011 3:22 PM
  • Tnx for the quick responses, Naomi and hunchback.

    I'll wait for Hunchback's group answer and search the forum just like Naomi suggested.

     

    Friday, August 26, 2011 7:57 PM
  • Hi,

    It seems related to this bug Erland Sommarskog reported:


    That's marked as fixed in Denali, but I see no change in CTP 3 with TF 4199 on.  Perhaps it's a post-CTP 3 fix.  Anyway, a workaround is to put the expression in a variable and use RECOMPILE to allow the exact value to be used for the estimate:
    DECLARE @d DATETIME2(7) = 
    	DATEADD([year], -10, SYSDATETIME())
    
    SELECT 
    	SalesOrderID, 
    	CustomerID, 
    	OrderDate
    FROM #T
    WHERE 
    	OrderDate > @d
    OPTION (RECOMPILE)
    
    
    The 'estimated plan' is hopelessly still wrong of course, but the 'actual plan'  shows much better estimates, if you see what I mean.  I would expect any recompilation that makes the value available would work too (such as putting the thing in a procedure where @d is a parameter, sp_executesql, and so on.
    e.g.
    DECLARE @d DATETIME2(7) = 
    	DATEADD([year], -10, SYSDATETIME())
    
    EXECUTE sp_executesql N'
    	SELECT 
    		SalesOrderID, 
    		CustomerID, 
    		OrderDate
    	FROM #T
    	WHERE 
    		OrderDate > @d
    	OPTION (RECOMPILE)',
    	N'@d DATETIME2(7)',
    	@d = @d
    

    Cheers,
    Paul

    Saturday, August 27, 2011 5:06 AM
  • This is indeed a bug. It seems that the optimizer fails to consider the dateadd, but makes its estimate from the value of sysdatetime() alone.

    Here is a repro:

    CREATE TABLE #t (d datetime2(7) NOT NULL PRIMARY KEY)
    go
    INSERT #t (d)
       VALUES(dateadd(YEAR, -5, sysdatetime())),
             (dateadd(YEAR, -4, sysdatetime())),
             (dateadd(YEAR, -3, sysdatetime())),
             (dateadd(YEAR, -2, sysdatetime())),
             (dateadd(YEAR, -1, sysdatetime())),
             (dateadd(YEAR,  0, sysdatetime())),
             (dateadd(YEAR,  1, sysdatetime())),
             (dateadd(YEAR,  2, sysdatetime())),
             (dateadd(YEAR,  3, sysdatetime())),
             (dateadd(YEAR,  4, sysdatetime())),
             (dateadd(YEAR,  5, sysdatetime()))
    go
    SELECT d FROM #t WHERE d > dateadd(YEAR, -10, getdate())
    SELECT d FROM #t WHERE d > dateadd(YEAR, -10, sysdatetime())
    go
    drop table #t

    The estimate for the first query is 11 which is correct, but for the second query it's 6.

    I've reported this as
    https://connect.microsoft.com/SQLServer/feedback/details/685903/incorrect-estimate-when-sysdatetime-appear-in-a-dateadd-expression


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by John - G Saturday, August 27, 2011 7:05 PM
    Saturday, August 27, 2011 9:58 AM
  • Thank you, Erland, for the quick answer and submitting the bug to MS.

     

    I'd like to point out that this bug was originally found and posted by Noam Brezis with the assistance of Guy Glantser from Madeira Israel and Haim Fishner.

     

    Also, with your help, Erland, I got that free TSQL course. 

     

     

    Sunday, August 28, 2011 8:35 PM