none
OPTION FAST 'n' , how does it work?

    Question

  • OK I'm reading the docs on this but something doesn't make sense.  I know this makes the server return the first n rows as fast as possible.  But why wouldn't it always try to do this?  What specifically is it doing differently if you use this hint?  Thx adv.
    Thursday, June 11, 2009 12:01 PM

Answers

  • That's not a very good example.  Retrieving the first few sorted records via index is much faster than sorting 1M rows, especially since you can't return **any** rows in sorted order until you've completed the entire sort.

    Here's a real example.  Assume you're joining two tables.  A Hashed Join is faster overall than a Nested Loop Join -- but a Hash Join can't return *anything* until the entire hash table is built.  Once that table is built, the incremental cost of returning rows is very low...but if the user isn't going to look at most of those rows anyway, you can't amortize out the cost of that table build.  In this case, using the Nested Loop join is faster at returning the first few rows, since it can begin returning immediately.
    Michael Asher
    • Marked as answer by JoeDamon Thursday, June 11, 2009 12:37 PM
    Thursday, June 11, 2009 12:11 PM
  • When you use OPTION(FAST n) the query optimizer may pick a plan that is
    targeting to return a few rows quickly, rather than a plan that will
    minimize the total result set retrieval. There are cases where the first
    few rows may come back fast but the whole result set will be slower,
    compared to using a plan that targets optimal plan for the full result set.

    Here is one example:
    http://blogs.msdn.com/queryoptteam/archive/2006/03/30/564912.aspx

    --
    Plamen Ratchev
    http://www.SQLStudio.com
    • Marked as answer by JoeDamon Thursday, June 11, 2009 12:37 PM
    Thursday, June 11, 2009 12:30 PM

All replies

  • Using the hints doesn't *neccesarily* mean anything different.

    But imagine you have ORDER BY col, and that you have a non-clustered index on col. Also, say that this non-clustered index do not cover your query. If SQL Server were to use that index to drive the sort, then SQL server read the first row in the index and then visit the datapage, return the row to the user. Then the same thing for the next row. and the next. Etc. Imagine 1 mill rows. That is 1 mill page accesses.

    Now, it is probably smarter to first sort all rows in some temp structure and then read that temp structure to return the rows - now in sorted order. But SQL Server will not be able to return any rows until *all* rows have been sorted.
    Tibor Karaszi, SQL Server MVP http://www.karaszi.com/sqlserver/default.asp http://sqlblog.com/blogs/tibor_karaszi
    Thursday, June 11, 2009 12:05 PM
  • That's not a very good example.  Retrieving the first few sorted records via index is much faster than sorting 1M rows, especially since you can't return **any** rows in sorted order until you've completed the entire sort.

    Here's a real example.  Assume you're joining two tables.  A Hashed Join is faster overall than a Nested Loop Join -- but a Hash Join can't return *anything* until the entire hash table is built.  Once that table is built, the incremental cost of returning rows is very low...but if the user isn't going to look at most of those rows anyway, you can't amortize out the cost of that table build.  In this case, using the Nested Loop join is faster at returning the first few rows, since it can begin returning immediately.
    Michael Asher
    • Marked as answer by JoeDamon Thursday, June 11, 2009 12:37 PM
    Thursday, June 11, 2009 12:11 PM
  • When you use OPTION(FAST n) the query optimizer may pick a plan that is
    targeting to return a few rows quickly, rather than a plan that will
    minimize the total result set retrieval. There are cases where the first
    few rows may come back fast but the whole result set will be slower,
    compared to using a plan that targets optimal plan for the full result set.

    Here is one example:
    http://blogs.msdn.com/queryoptteam/archive/2006/03/30/564912.aspx

    --
    Plamen Ratchev
    http://www.SQLStudio.com
    • Marked as answer by JoeDamon Thursday, June 11, 2009 12:37 PM
    Thursday, June 11, 2009 12:30 PM
  • Makes sense, thanks!
    Thursday, June 11, 2009 12:37 PM
  • <<That's not a very good example.  Retrieving the first few sorted records via index is much faster than sorting 1M rows, especially since you can't return **any** rows in sorted order until you've completed the entire sort.

    Here's a real example. ...>>

    You might want to tone down that nonchalant tone, Michael. My example was real, something I encountered at a customer site.

    Not always can we expect the data to fit in cache, and this was such an example. They had a table which didn't fit in cache and a query which usually was rather quick suddently was very very slow and also slowed down other work on the machine. Some developer had added a FAST hint to the query in belief that it was "always good". They didn't look at anything but response times and when data volumes outgrew cache the logical I/O turned into physical I/O. By only removing that FAST hint the query went from a few minutes to a matter of seconds. I also did other improvements, but that is beside the point. If that FAST hint weren't there in the first place, they wouldn't have had those slowdowns.

    Anyhow, here's an example where the sort aspect by itself can show a significant slowdown when using FAST hint:
    Without FAST:    3 seconds,       766 CPU,      20,206 I/O
    With FAST:     101 seconds,  20,656 CPU,  5,516,521 I/O

    EXEC sp_configure 'max server memory', 500
    RECONFIGURE
    GO
    
    USE master
    GO
    IF DB_ID('TestDb') IS NOT NULL DROP DATABASE TestDb
    GO
    CREATE DATABASE [TestDb] ON  PRIMARY 
    ( NAME = N'TestDb', FILENAME = N'C:\TestDb.mdf' , SIZE = 100MB , MAXSIZE = 200MB , FILEGROWTH = 30MB )
     LOG ON 
    ( NAME = N'TestDb_log', FILENAME = N'C:\TestDb_log.ldf' , SIZE = 200MB , MAXSIZE = 500MB , FILEGROWTH = 20MB )
    GO
    
    USE testDb
    
    CREATE TABLE t(c1 int identity PRIMARY KEY CLUSTERED, c2 int, c3 int, filler char(150))
    
    INSERT INTO t (c2, c3, filler)
    SELECT TOP(800000) 1, 1, 'hi'
    FROM sys.columns AS a 
     CROSS JOIN sys.columns AS b
     CROSS JOIN sys.columns AS c
    
    UPDATE t SET c2 = c1 % 20, c3 = c1 % 200
    
    CREATE NONCLUSTERED INDEX x ON t(c2)
    
    --Size of table and indexes
    EXEC sp_indexinfo t 
    --Can be found at http://www.karaszi.com/SQLServer/util_sp_indexinfo.asp
    
    
    EXEC sp_configure 'max server memory', 50
    RECONFIGURE
    GO
    
    CHECKPOINT
    DBCC DROPCLEANBUFFERS
    GO
    DECLARE @dt datetime SET @dt = GETDATE()
    SELECT * FROM t
    WHERE c3 = 16
    ORDER BY c2
    SELECT DATEDIFF(ms, @dt, GETDATE())
    GO
    
    CHECKPOINT
    DBCC DROPCLEANBUFFERS
    GO
    DECLARE @dt datetime SET @dt = GETDATE()
    SELECT * FROM t
    WHERE c3 = 16
    ORDER BY c2
    OPTION(FAST 20)
    SELECT DATEDIFF(ms, @dt, GETDATE())
    
    

    Tibor Karaszi, SQL Server MVP http://www.karaszi.com/sqlserver/default.asp http://sqlblog.com/blogs/tibor_karaszi
    Friday, June 12, 2009 1:10 PM
  • > "You might want to tone down that nonchalant tone, Michael. My example was real, something I encountered at a customer site"

    Tibor, did you even read my post?  You may have encountered a case where removing the FAST n clause sped up the total query time -- that's not surprising, that's exactly the behaviour one would expect.  But your explanation of the underlying behavior that caused this wasn't correct.  FAST n typically controls the join type, it favors the (slower overall, but quicker to build) nested-loop join, rather than a hashed join.   There's never going to be a case where sorting 1M rows is faster overall than utilizing an index already containing that sort order, whether or not that index is covering.


    Michael Asher
    Friday, June 12, 2009 4:22 PM
  • >FAST n typically controls the join type, it favors the (slower overall, but quicker to build) nested-loop join, rather than a hashed join. Guys, Correct me if I’m wrong… I always thought merge join is the “cheapest”|lightest one and (if possible!) will be used instead of NL.
    • Edited by Newbie x2 Friday, June 12, 2009 4:45 PM sss
    Friday, June 12, 2009 4:44 PM
  • > "Correct me if I’m wrong… I always thought merge join is the “cheapest”|lightest one"

    There is no single join type that is always fastest.  There are even cases where a Nested Loop outperforms other types, say for instance on some 1:1 joins, or when one of the two tables being joined is trivially small and the other is sorted.

    Comparing a merge to a hash, a hash tends to outperform a merge when the two table sizes differ considerably, and the values are unsorted.  Otherwise a merge will probably be faster.

    Complicating the situation is that some join types aren't valid for all join conditions.  For instance, a hash can only process equalities and inequalities in join conditions, a merge can only process equalities, and a NL can process any type of join condition.
    Michael Asher
    Friday, June 12, 2009 5:11 PM
  • Michael,

    >Comparing a merge to a hash, a hash tends to outperform a merge when the two table sizes differ considerably, and the values are unsorted. Otherwise a merge will probably be faster

    My bad – I didn’t specify all preconditions. ;-)

    Indeed comparison is not obvious between

    "sort + merge join" & "hash join" for equalities

     

    I meant “merge against hash join” on sorted streams already.

    I guess, QO will never select hash join in this case, unless we force it with hint (can we?)


    • Edited by Newbie x2 Friday, June 12, 2009 6:59 PM 1
    Friday, June 12, 2009 6:53 PM
  • Something is wrong with IE8... On reply it opens reply windows for every post - then report an error on submit...
    • Edited by Newbie x2 Friday, June 12, 2009 7:00 PM 1
    Friday, June 12, 2009 6:59 PM
  • Test reply, threading seems messed up.
    Tibor Karaszi, SQL Server MVP http://www.karaszi.com/sqlserver/default.asp http://sqlblog.com/blogs/tibor_karaszi
    Sunday, June 14, 2009 7:51 AM
  • (Need to reply to wrong post because of messed up threading)

    Michael,

    <<Tibor, did you even read my post?>>

    Yes I did. Did you read my? Did you read the TSQL code I posted? Did you run that TSQL code?

    << You may have encountered a case where removing the FAST n clause sped up the total query time -- that's not surprising, that's exactly the behaviour one would expect.  But your explanation of the underlying behavior that caused this wasn't correct.  FAST n typically controls the join type, it favors the (slower overall, but quicker to build) nested-loop join, rather than a hashed join.   There's never going to be a case where sorting 1M rows is faster overall than utilizing an index already containing that sort order, whether or not that index is covering.>>

    Please do run the TSQL code I posted, that is proof, and it also serves as an example of the case I had at my customer site. There's no join involved. There's a nonclustered index on the sort column, non-covering. Having that FAST option cause the query to run in 101 seconds instead of 3 seconds without FAST (where SQL Server does a sort first). I thought that by posting TSQL code I could convince you, not sure what more I can do.


    Tibor Karaszi, SQL Server MVP http://www.karaszi.com/sqlserver/default.asp http://sqlblog.com/blogs/tibor_karaszi
    Sunday, June 14, 2009 7:53 AM
  • > "Did you read the TSQL code I posted? Did you run that TSQL code?  There's no join involved.

    I did.  Did you look at the plans it generates?   (BTW there are joins involved in your example -- I'll get back to that in a second.)

    Yes SQL code exists that will run slower overall with FAST n.  If the hint is even used, that's the very behavior we would expect.   Nothing surprising there.

    The issue is the explanation of why this happens.  Your original comment was,    "Now, it is probably smarter to first sort all rows in some temp structure and then read that temp structure to return the row "  But is that what's happening in your code?  No.

    In Case 2 (the hint), the nonclustered index is being scanned, and used as left input.  The right input is a clustered index seek (not scan); the two are being JOINed (there's the join) via a Nested Loop, then a final filter stage applied.

    Now what's happening in Case 1?  An index is still being used -- the clustered index.  It's being scanned to return just the results of the where clause -- 4000 out of 800,000 rows.  Then just those rows are being sorted.  All the rows are NOT being sorted into a temp structure, only 1/2 of 1% of them are.   And yes, sorting 4K rows in memory is much faster than a Nested-Loop join involving 800,000 lookups, especially when you're reconfigured Sql Server's memory so the 4K rows fit in memory, but the 800K do not. 

    However, the FAST n hint returns rows faster, as a NL begins streaming immediately, whereas a sort operation does not.  That's why the FAST n hint applies it.

    The important points here are that a) an index is being used in both cases, and b) there will never be a case where its faster to sort an entire table, when a preexisting index exists on that order.  There are, though, plenty of cases where sorting some small fraction of that table may be faster than using the index.

    (BTW, if you remove the where clause from your example, so that the query does have to sort the entire table, you'll see the query with FAST n actually run faster overall than the one without, at least on one of my dev boxes.  On a heavily loaded server that's i/o bound, SQL Server's choice here probably makes more sense)

    Michael Asher
    • Edited by masher2 Sunday, June 14, 2009 3:22 PM
    Sunday, June 14, 2009 3:07 PM
  • Michael,

    I will do my last post here, since I feel that you now are only defending your earlier attack on my example. I started a technical answer explaining how you always see a join in an exec plan when you go from an nc index to fetch a row, etc., but I doubt that carrying this thread forward is productive.

    You attacked an example I had, saying it was "not a very good example", and then provided "a real example". I felt that the tone of your reply was a little condescending so I replied with TSQL code following my earlier example showing the big difference in execution time, resource consumption etc.

    Unless you can show that my TSQL code example is either invalid, or doesn't correspond to the example which you knocked, I'll stop here.

    (And hopefully we can have productive discussions in other threads :-) ).
    Tibor Karaszi, SQL Server MVP http://www.karaszi.com/sqlserver/default.asp http://sqlblog.com/blogs/tibor_karaszi
    Sunday, June 14, 2009 5:27 PM
  • Tibor,

    I'm sorry if I came across as condescending; transmitting tone accurately over a message board is always problematic.    Again, your code sample is fine, it's the explanation of it that I'm objecting to.    The QO isn't sorting the entire table, but rather only a tiny fraction of it -- and if you remove the where clause so that it does sort the entire table, the hint-based version is actually faster.

    But I suppose we should let slumbering dogs lie at this point :o)
    Michael Asher
    Sunday, June 14, 2009 5:58 PM
  • Michael,

    Ah, I see now where we didn't sync. The number of rows to be returned in the end (whether or not there's a restriction), is an important factor of the puzzle. Yes, indeed. I see now I was rather fuzzy on that part in my first post.

    (Since I now had a script with a bunch of timings, I made a blog post out of it, FWIW... http://sqlblog.com/blogs/tibor_karaszi/archive/2009/06/15/response-time-vs-resource-consumption.aspx )


    <<But I suppose we should let slumbering dogs lie at this point :o)>>

    :-)
    Tibor Karaszi, SQL Server MVP http://www.karaszi.com/sqlserver/default.asp http://sqlblog.com/blogs/tibor_karaszi
    Monday, June 15, 2009 3:42 PM