Can someone help? TOP 100 is significantly faster than TOP 101

    Question

  • Any help appreciated

    --fast
    select top 100 * from test where c1 < 30000 order by c2

    --slow
    select top 101 * from test where c1 < 30000 order by c2

    In Webcast 1 (www.sqlworkshops.com/webcasts) they actually show how to make TOP 101 faster in SQL Server 2008, but they hide their solution for SQL Server 2005. Can someone post a way to make TOP 101 faster in SQL Server 2005? That would help me.

    BobB


    create table test (c1 int primary key clustered, c2 int, c3 char(2000))
    declare @count int
    set @count = 1
    while @count <= 50000
    begin
    insert into test values (@count, @count, 'a')
    set @count = @count + 1
    end
    • Edited by Robert W Beck Monday, December 14, 2009 6:04 AM additional info
    Friday, December 11, 2009 7:13 PM

All replies

  • Think about what it is doing...

    It finds (quickly, via an Index Seek) the 30,000 rows (where c1<30000).  But since you specified that you want ALL columns (i.e. SELECT *), it pulls in the two integer columns (c1 and c2) and the 2000-character-long c3 column.  So that's 30,000 rows * 2008 bytes = roughly 60 million bytes that it has to (physically) sort.

    According to the documentation for the "Top N Sort" operator (http://technet.microsoft.com/en-us/library/ms189054(SQL.90).aspx), it says that for "small values of N" it does the sort in memory.

    I'm guessing that it determined that 100 is considered a "small value" for the ~60 million bytes you needed to sort, and that 101 was NOT a "small value", so it had to resort to a more generic method of sorting that probably involved some disk work.

    If you change your SELECT * to just SELECT c2 then you won't see a difference between 100 and 101.

    It all has to do with the size of the rows... and I guess you found the threshold.

    That's my guess... unless someone else has some input.
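
    For instance, here's a quick way to check that theory (just a sketch, using the test table from the original post):

    SET STATISTICS TIME ON
    --wide ~2008-byte rows: the sort has to handle roughly 60 million bytes
    select top 101 * from test where c1 < 30000 order by c2
    --narrow 4-byte rows: TOP 100 and TOP 101 should behave the same
    select top 101 c2 from test where c1 < 30000 order by c2
    SET STATISTICS TIME OFF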

    --Brad (My Blog)
    Friday, December 11, 2009 8:01 PM
  • I would agree with Brad. I looked at the execution plans for both, and the only significant difference I found is CompileTime and CompileCPU.
    TOP 100  CompileTime="2" CompileCPU="2"
    TOP 101 CompileTime="63" CompileCPU="29"


    Abdallah, PMP, ITIL, MCTS
    Friday, December 11, 2009 8:09 PM
  • One other thing...

    Because of the size of the rows you need to sort... you can re-work the query like so and it will go lightning fast for TOP values over 100:

    SELECT A.c1,A.c2,B.c3
    FROM (SELECT TOP 101 c1,c2 FROM Test WHERE c1<30000 ORDER BY c2) A
    JOIN Test B ON A.c1=B.c1

    This way, it's only sorting small 8-byte rows and not 2008-byte rows.

    --Brad (My Blog)
    Friday, December 11, 2009 8:10 PM
  • Hi
    It's a very good point, and I agree that there is some kind of a threshold.
    However, when you switch IO statistics on, there is no difference: there is no reference to "Worktables" or any other sort of temporary tables, and the execution plan is exactly the same.
    I also tried replacing TOP N with TOP N PERCENT, and then the results (99 and 102, 200, 50 rows) come back not as fast as with TOP 100, but not as slow as with 101.
    There is one more interesting point: the response time is equal for any number of returned rows when you use TOP N PERCENT.
    Friday, December 11, 2009 8:13 PM
  • I looked into it a little more, and here's what I concluded.

    The TOP N PERCENT query will sort EVERYTHING and THEN will perform the TOP operation.

    The TOP N query will use the special TOP N SORT operator.

    I'm guessing that the TOP N SORT operator is very efficient for values of N<=100.  For values >100, it is a dog.

    The traditional SORT-only operator is very efficient.  It doesn't take long for it to sort the 30000 records and then feed the results to the TOP operator.

    The following two queries, which both select 101 records, go very fast:

    SELECT TOP 0.336667 PERCENT * FROM Test WHERE c1<30000 ORDER BY c2

    SELECT TOP (101.0/(SELECT COUNT(*) FROM Test WHERE c1<30000)*100) PERCENT *
    FROM Test
    WHERE c1<30000
    ORDER BY c2

    But, as we have all found, the following is slow as molasses:

    SELECT TOP 101 * FROM Test WHERE c1<30000 ORDER BY c2

    I think that 100 is an arbitrary tipping point for the TOP N SORT operator.  In just doing some testing with 99, 100, and 101 on a query against a large table in AdventureWorks, you can see a difference.

    SELECT TOP N * FROM AdventureWorks.Sales.SalesOrderDetail ORDER BY CarrierTrackingNumber

    For N = 99 or 100, the Reads are 1246 and the Duration is around 500.  Once N=101 or greater, the Reads shoot up to 2240 and the Duration to 1600 or higher.


    --Brad (My Blog)
    Friday, December 11, 2009 8:33 PM
  • If you were looking for the ALL/ANY blog entry, it's here:  http://bradsruminations.blogspot.com/2009/08/all-any-and-some-three-stooges.html

    I try to attach keywords to the blog posts, so you can type "ALL" or "ANY" in the "Search This Blog" Google Search box in the upper-right-hand corner of any blog page.

    You can also go thru the Blog Archive (also on the right-hand side under my "About Me" section) and poke through the various months.


    --Brad (My Blog)
    Friday, December 11, 2009 9:04 PM
  • I found this subject to be really interesting, and I thought I'd blog about it.

    But I figured before I do that...

    I wrote an e-mail to Conor Cunningham (SQL Server Query Optimization Development Lead at MSFT) about this subject and my suspicions posted above.

    He said the person who is responsible for this area of the optimizer is on vacation until January and he'd get back to me then.

    I'll try to follow up here once I find out anything definitive.

    --Brad (My Blog)
    Saturday, December 12, 2009 8:38 PM
  • Robert & All,

    Certainly an attention-getting thread title!

    Consider this: if we rebuild the index after table population and eliminate buffer reads, it appears there is no significant difference, or at least not as dramatic as without the "equalizers". Can you confirm? The following test is on SQL Server 2008.

    Related link: Measuring Query Execution Time

    Thanks.

    SET NOCOUNT ON
    create table test (c1 int primary key clustered, c2 int, c3 char(2000))
    declare @count int
    set @count = 1
    while @count <= 50000
    begin
    insert into test values (@count, @count, 'a')
    set @count = @count + 1
    end
    GO
    ALTER INDEX ALL ON [dbo].[test] REBUILD 
    
    --fast
    SET STATISTICS IO ON
    DBCC DROPCLEANBUFFERS
    select top 100 * from test where c1 < 30000 order by c2
    /*
    Table 'test'. Scan count 2, logical reads 4086, physical reads 24, 
    read-ahead reads 7503, lob logical reads 0, lob physical reads 0, 
    lob read-ahead reads 0.
    */
    --slow
    DBCC DROPCLEANBUFFERS
    select top 101 * from test where c1 < 30000 order by c2
    /*
    Table 'test'. Scan count 2, logical reads 4455, physical reads 11, 
    read-ahead reads 7503, lob logical reads 0, lob physical reads 0, 
    lob read-ahead reads 0.
    */
    SET STATISTICS IO OFF
    GO
    DROP TABLE test

    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Saturday, December 12, 2009 9:22 PM
  • hi,
    This is one of the best posts I have seen. Thanks to Brad, because he explains this scenario nicely. I came up with this solution; it performs much better than the plain TOP 101 operation.

    ----------------------------
    SET ROWCOUNT 101
    SELECT * FROM TEST WHERE C1 < 30000 ORDER BY C2
    SET ROWCOUNT 0
    ----------------------------

    Thanks
    Tharindu Dhaneenja

    Tharindu Dhaneenja (http://spaces.msn.com/dhaneenja)
    Sunday, December 13, 2009 3:42 PM
  • Brad - I did a run with STATISTICS TIME ON. Look how much the "fast" top 100 slowed down after index rebuild and dbcc dropcleanbuffers.

    Tharindu - your method is a little bit slower than the TOP function in the following test.
     
    USE tempdb;
    SET NOCOUNT ON
    GO
    create table test (c1 int primary key clustered, c2 int, c3 char(2000))
    declare @count int
    set @count = 1
    while @count <= 50000
    begin
    insert into test values (@count, @count, 'a')
    set @count = @count + 1
    end
    GO
    
    SET STATISTICS TIME ON
    select top 100 * from test where c1 < 30000 order by c2 
    --      CPU time = 62 ms,  elapsed time = 85 ms.
    SET STATISTICS TIME OFF
    ALTER INDEX ALL ON [dbo].[test] REBUILD 
    GO
    DBCC DROPCLEANBUFFERS
    SET STATISTICS TIME ON
    select top 100 * from test where c1 < 30000 order by c2 
    --     CPU time = 110 ms,  elapsed time = 1458 ms.
    SET STATISTICS TIME OFF
    DBCC DROPCLEANBUFFERS
    SET STATISTICS TIME ON
    select top 101 * from test where c1 < 30000 order by c2 
    --       CPU time = 437 ms,  elapsed time = 2583 ms.
    SET STATISTICS TIME OFF
    DBCC DROPCLEANBUFFERS
    SET ROWCOUNT 101
    SET STATISTICS TIME ON
    SELECT * FROM TEST WHERE C1 < 30000 ORDER BY C2
    --      CPU time = 469 ms,  elapsed time = 3049 ms.
    SET STATISTICS TIME OFF
    SET ROWCOUNT 0
    GO
    DROP TABLE test

    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Sunday, December 13, 2009 7:44 PM
  • Naomi/Brad,

    Following Naomi's proposal to select c1 only, TOP 100, TOP 101 and SET ROWCOUNT are basically the same. As expected.

    A good general operating practice for DBAs is to REBUILD indexes every weekend. Otherwise query performance may suffer.

    USE tempdb;
    SET NOCOUNT ON
    GO
    create table test (c1 int primary key clustered, c2 int, c3 char(2000))
    declare @count int
    set @count = 1
    while @count <= 50000
    begin
    insert into test values (@count, @count, 'a')
    set @count = @count + 1
    end
    GO
    
    SET STATISTICS TIME ON
    select top 100 c1 from test where c1 < 30000 order by c2 
    --       CPU time = 31 ms,  elapsed time = 33 ms.
    SET STATISTICS TIME OFF
    ALTER INDEX ALL ON [dbo].[test] REBUILD 
    GO
    DBCC DROPCLEANBUFFERS
    SET STATISTICS TIME ON
    select top 100 c1 from test where c1 < 30000 order by c2 
    --     CPU time = 48 ms,  elapsed time = 1305 ms.
    SET STATISTICS TIME OFF
    DBCC DROPCLEANBUFFERS
    SET STATISTICS TIME ON
    select top 101 c1 from test where c1 < 30000 order by c2 
    --    CPU time = 31 ms,  elapsed time = 1291 ms.
    SET STATISTICS TIME OFF
    DBCC DROPCLEANBUFFERS
    SET ROWCOUNT 101
    SET STATISTICS TIME ON
    SELECT c1 FROM TEST WHERE C1 < 30000 ORDER BY C2
    --     CPU time = 0 ms,  elapsed time = 1497 ms.
    SET STATISTICS TIME OFF
    SET ROWCOUNT 0
    GO
    DROP TABLE test

    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Sunday, December 13, 2009 8:51 PM
  • I'm going to agree with Brad - it's the result of the Top N Sort operator using TempDB for sorting rather than in memory.

    By adding in calls to fn_virtualfilestats for TempDB on a server with no activity, you can see that the first query has no increase in the BytesWritten to TempDB, whereas the second (slow) query does (about 6 MB for the tempdb.mdf and 600KB for the log). 
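
    For anyone who wants to repeat that check, here is a minimal sketch (assuming a quiet server and the test table from the original post; sys.fn_virtualfilestats is available in both SQL 2005 and 2008):

    --snapshot tempdb write activity before the query
    SELECT FileId, BytesWritten
    INTO #before
    FROM sys.fn_virtualfilestats(DB_ID('tempdb'), NULL)

    select top 101 * from test where c1 < 30000 order by c2

    --any growth in BytesWritten means the sort spilled to tempdb
    SELECT a.FileId, a.BytesWritten - b.BytesWritten AS BytesWrittenDelta
    FROM sys.fn_virtualfilestats(DB_ID('tempdb'), NULL) AS a
    JOIN #before AS b ON a.FileId = b.FileId

    DROP TABLE #before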
    Monday, December 14, 2009 12:14 AM
  • I'm going to agree with Brad - it's the result of the Top N Sort operator using TempDB for sorting rather than in memory.

    By adding in calls to fn_virtualfilestats for TempDB on a server with no activity, you can see that the first query has no increase in the BytesWritten to TempDB, whereas the second (slow) query does (about 6 MB for the tempdb.mdf and 600KB for the log). 
    Agreed.

    If we rebuild the clustered index and cut out the dead-weight 2K column, as suggested by Naomi/Brad, all 3 queries perform the same. Demo follows.
    USE tempdb;
    SET NOCOUNT ON
    GO
    create table test (c1 int primary key clustered, c2 int, c3 char(2000))
    declare @count int
    set @count = 1
    while @count <= 50000
    begin
    insert into test values (@count, @count, 'a')
    set @count = @count + 1
    end
    GO
    -- REBUILD PRIMARY KEY clustered index
    ALTER INDEX ALL ON [dbo].[test] REBUILD 
    GO
    DBCC DROPCLEANBUFFERS
    SET STATISTICS TIME ON
    select top 100 c1 from test where c1 < 30000 order by c2 
    --      CPU time = 47 ms,  elapsed time = 1329 ms.
    SET STATISTICS TIME OFF
    DBCC DROPCLEANBUFFERS
    SET STATISTICS TIME ON
    select top 101 c1 from test where c1 < 30000 order by c2 
    --     CPU time = 32 ms,  elapsed time = 1282 ms.
    SET STATISTICS TIME OFF
    DBCC DROPCLEANBUFFERS
    SET ROWCOUNT 101
    SET STATISTICS TIME ON
    SELECT c1 FROM TEST WHERE C1 < 30000 ORDER BY C2
    --      CPU time = 31 ms,  elapsed time = 1313 ms.
    SET STATISTICS TIME OFF
    SET ROWCOUNT 0
    GO
    DROP TABLE test


    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Monday, December 14, 2009 12:26 AM
  • Interesting thread.  My observation is (maybe I am wrong) that it has more to do with ROW SIZE than with any particular number (i.e. 100 being the threshold or something like that).  If you make column c3 VARCHAR, then even TOP 1000 and TOP 1001 return in the same time.


    Madhu


    MCITP, MCTS, MCDBA,MCP-- Blog : http://experiencing-sql-server-2008.blogspot.com/
    Monday, December 14, 2009 2:16 AM
  • Using SQL Profiler, this is what I got (with a DROPCLEANBUFFERS and FREEPROCCACHE between each test):

    TOP 100 (Before Index Rebuild):  Reads=7532, CPU=104, Duration=114
    TOP 100 (After Index Rebuild):  Reads=7517, CPU=104, Duration=108
    TOP 101:  Reads=13199, CPU=542, Duration=10769
    ROWCOUNT 101:  Reads=13176, CPU=370, Duration=1341

    Difference before/after index rebuild is negligible.

    Again, note the spike in Reads and CPU and Duration between TOP 100 and TOP 101.  The #Reads alone indicates that something different is happening (a different algorithm being employed I think) between TOP 100 and TOP 101.


    --Brad (My Blog)
    Monday, December 14, 2009 5:39 PM
  • Brad...hmm... Why are my results so different? Can you post the scripts? Thanks.
    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Monday, December 14, 2009 5:48 PM
  • Interesting thread.  My observation is (maybe I am wrong) that it has more to do with ROW SIZE than with any particular number (i.e. 100 being the threshold or something like that).  If you make column c3 VARCHAR, then even TOP 1000 and TOP 1001 return in the same time.


    Madhu


    MCITP, MCTS, MCDBA,MCP-- Blog : http://experiencing-sql-server-2008.blogspot.com/

    It's true that if you change the c3 column to a VARCHAR, you won't notice any difference (with your own eyes) in time between doing a TOP 100 or TOP 101.

    But there is something there... If you use SQL Profiler, you will see the following differences:

    TOP 100:  Reads=86, Duration=25ms   (Same measurements occur for TOP 99 and TOP 98 and so on downwards)
    TOP 101:  Reads=165, Duration=55ms  (Same measurements occur for TOP 102 and TOP 103 and so on upwards)

    This indicates to me that a different algorithm is being employed between <=100 and >=101.


    --Brad (My Blog)
    Monday, December 14, 2009 5:51 PM
  • Brad...hmm... Why are my results so different? Can you post the scripts? Thanks.
    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com

    Here you go...  Again, the numbers come from Profiler, and I set Query Results to "Discard" so there is no time taken up by outputting the results to the results window.

    Also, I don't typically run the entire script all at once when I'm doing this sort of thing.  I'll create the table, and then I'll just highlight the "Test #1" portion and execute it 5 times or so and take the average measurements.  Then I'll do the same for "Test #2" and so on.

    USE tempdb;
    SET NOCOUNT ON
    GO
    create table test (c1 int primary key clustered, c2 int, c3 char(2000))
    declare @count int
    set @count = 1
    while @count <= 50000
    begin
    insert into test values (@count, @count, 'a')
    set @count = @count + 1
    end
    GO
    
    
    --Test #1
    DBCC DROPCLEANBUFFERS
    dbcc freeproccache
    go
    select top 100 * from test where c1 < 30000 order by c2 
    --Reads=7532, CPU=104, Duration=114
    go
    
    
    ALTER INDEX ALL ON [dbo].[test] REBUILD 
    
    
    --Test #2
    DBCC DROPCLEANBUFFERS
    dbcc freeproccache
    go
    select top 100 * from test where c1 < 30000 order by c2 
    --Reads=7517, CPU=104, Duration=108
    go
    
    
    --Test #3
    DBCC DROPCLEANBUFFERS
    dbcc freeproccache
    go
    select top 101 * from test where c1 < 30000 order by c2 
    --Reads=13199, CPU=542, Duration=10769
    go
    
    
    --Test #4
    DBCC DROPCLEANBUFFERS
    dbcc freeproccache
    go
    SET ROWCOUNT 101
    SELECT * FROM TEST WHERE C1 < 30000 ORDER BY C2
    --Reads=13176, CPU=370, Duration=1341
    go
    
    
    SET ROWCOUNT 0
    DROP TABLE test
    GO
    



    --Brad (My Blog)
    Monday, December 14, 2009 6:05 PM
  • Interesting thread.  My observation is (maybe I am wrong) that it has more to do with ROW SIZE than with any particular number (i.e. 100 being the threshold or something like that).  If you make column c3 VARCHAR, then even TOP 1000 and TOP 1001 return in the same time.


    Madhu


    MCITP, MCTS, MCDBA,MCP-- Blog : http://experiencing-sql-server-2008.blogspot.com/

    It's true that if you change the c3 column to a VARCHAR, you won't notice any difference (with your own eyes) in time between doing a TOP 100 or TOP 101.

    But there is something there... If you use SQL Profiler, you will see the following differences:

    TOP 100:  Reads=86, Duration=25ms   (Same measurements occur for TOP 99 and TOP 98 and so on downwards)
    TOP 101:  Reads=165, Duration=55ms  (Same measurements occur for TOP 102 and TOP 103 and so on upwards)

    This indicates to me that a different algorithm is being employed between <=100 and >=101.


    --Brad (My Blog)

    I agree that it seems like the TOP operator has a performance tipping point.  I look forward to seeing what the MS Dev team comes back with.
    http://jahaines.blogspot.com/
    Monday, December 14, 2009 6:09 PM
  • >
    >I agree that it seems like the TOP operator has a performance tipping point. 
    >I look forward to seeing what the MS Dev team comes back with.
    >

    Conor did tell me that he believed that 100 is a "magic number" and that they've talked about improving this but have just not done it yet.

    He wanted to confirm it, though, with the person who's on vacation.

    --Brad (My Blog)
    Monday, December 14, 2009 6:16 PM
  • Thanks Brad. Really exciting stuff. Here is my run on SQL Server 2008 with Profiler data.  When Duration is way out of proportion to Reads, it usually means blocking. 

    Query                           Reads  CPU  DurationMsec
    select top 100 * from test whe  10801  109   110
    select top 100 c1 from test wh   8268   31    20
    select top 100 * from test whe   8236  172  1684
    select top 100 c1 from test wh   8236   48    25
    select top 101 * from test whe  13893  375  3023
    select top 101 c1 from test wh   8236   15    26
    SELECT * FROM TEST WHERE C1 <   13890  391  2417
    SELECT c1 FROM TEST WHERE C1 <   8236   32    18


    -- TOP 100 / TOP 101 anomaly test
    USE tempdb;
    SET NOCOUNT ON
    GO
    create table test (c1 int primary key clustered, c2 int, c3 char(2000))
    declare @count int
    set @count = 1
    while @count <= 50000
    begin
    insert into test values (@count, @count, 'a')
    set @count = @count + 1
    end
    GO
    
    
    --Test #1
    DBCC DROPCLEANBUFFERS
    dbcc freeproccache
    go
    select top 100 * from test where c1 < 30000 order by c2 
    go
    select top 100 c1 from test where c1 < 30000 order by c2 
    go
    
    
    ALTER INDEX ALL ON [dbo].[test] REBUILD 
    
    
    --Test #2
    DBCC DROPCLEANBUFFERS
    dbcc freeproccache
    go
    select top 100 * from test where c1 < 30000 order by c2
    go
    select top 100 c1 from test where c1 < 30000 order by c2 
    go
    
    
    --Test #3
    DBCC DROPCLEANBUFFERS
    dbcc freeproccache
    go
    select top 101 * from test where c1 < 30000 order by c2
    go
    select top 101 c1 from test where c1 < 30000 order by c2 
    go
    
    
    --Test #4
    DBCC DROPCLEANBUFFERS
    dbcc freeproccache
    go
    SET ROWCOUNT 101
    GO
    SELECT * FROM TEST WHERE C1 < 30000 ORDER BY C2
    go
    SELECT c1 FROM TEST WHERE C1 < 30000 ORDER BY C2
    go
    
    
    SET ROWCOUNT 0
    DROP TABLE test
    GO
    
    

    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Monday, December 14, 2009 7:19 PM
  • Little bit more excitement... 1 million rows instead of 50K, same script as in my previous post...the table turns....

    Query                           Reads  CPU  DurationMsec
    select top 100 * from test whe  16960  563  9002
    select top 100 c1 from test wh   8505   47    20
    select top 100 * from test whe   8235   62  2025
    select top 100 c1 from test wh   8235   32    17
    select top 101 * from test whe  13889  297  3644
    select top 101 c1 from test wh   8235   32    19
    SELECT * FROM TEST WHERE C1 <   13888  344  4905
    SELECT c1 FROM TEST WHERE C1 <   8235   31    20

    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Monday, December 14, 2009 7:41 PM
  • Little bit more excitement... 1 million rows instead of 50K, same script as in my previous post...the table turns....

    Query                           Reads  CPU  DurationMsec
    select top 100 * from test whe  16960  563  9002
    select top 100 c1 from test wh   8505   47    20
    select top 100 * from test whe   8235   62  2025
    select top 100 c1 from test wh   8235   32    17
    select top 101 * from test whe  13889  297  3644
    select top 101 c1 from test wh   8235   32    19
    SELECT * FROM TEST WHERE C1 <   13888  344  4905
    SELECT c1 FROM TEST WHERE C1 <   8235   31    20

    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com


    Are those figures from pushing the execute button, walking away, and then coming back and looking at all the figures?  What if you change the order of your tests?  The first test that gets run could be more inflated.

    I did the million-row test also, and it doesn't come out much different from my previous figures on 50,000 rows (except more duration time to traverse the bigger clustered index to find the 30000 recs)... but again, I INDIVIDUALLY EXECUTE each test at least 5 times and take the average... I don't execute the entire thing all at once:

    Test #1 (TOP 100 before index rebuild):
    SELECT *: Reads=7533, CPU=83, Duration=2060
    SELECT c1: Reads=7533, CPU=68, Duration=1985

    Test #2 (TOP 100 after index rebuild):
    SELECT *: Reads=7517, CPU=99, Duration=2043
    SELECT c1: Reads=7517, CPU=63, Duration=1947

    Test #3 (TOP 101):
    SELECT *: Reads=14683, CPU=552, Duration=16750
    SELECT c1: Reads=7575, CPU=46, Duration=2088

    Test #4 (ROWCOUNT 101):
    SELECT *: Reads=13167, CPU=480, Duration=3967
    SELECT c1: Reads=7517, CPU=47, Duration=5704

    Also, your SELECT c1 tests were waaaaaay lower in terms of duration than your SELECT * tests because you didn't do the DBCC commands before them... you executed them immediately after the SELECT * tests and they participated in the same caching and buffering... thus you got duration times of 20ms or less.


    --Brad (My Blog)
    Monday, December 14, 2009 8:29 PM
  • Brad,

    Valid point about the missing DBCC DROPCLEANBUFFERS. I will do a new run.

    I am tracing event SQL: BatchCompleted into table TOP100 in Monitor DB.

    I use the following query to get the results:

    SELECT   QUERY = LEFT(convert(VARCHAR,TextData),50), 
             Reads, 
             CPU, 
             DurationMsec = Duration / 1000 
    FROM     TOP100 
    WHERE    LEFT(convert(VARCHAR,TextData),6) = 'select' 
    ORDER BY StartTime 

    In my experience I have found "reads" a better metric for performance than "duration". In your case, I don't understand why duration is so much out of proportion for TOP 101. If it happens once, it can be blocking. If it is systematic, it can be a tempdb issue (sorting the 2K dead-weight column).


    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Monday, December 14, 2009 8:46 PM
  • New run for 1 million. DBCC DROPCLEANBUFFERS is used before each SELECT.

    Query                                                Reads  CPU  DurationMsec
    select top 100 * from test where c1 < 30000 order   16960  156  14332
    select top 100 c1 from test where c1 < 30000 order   8505   62   1482
    select top 100 * from test where c1 < 30000 order    7561   78   2399
    select top 100 c1 from test where c1 < 30000 order   7525   32   2147
    select top 101 * from test where c1 < 30000 order   13893  328   3884
    select top 101 c1 from test where c1 < 30000 order   8277   63   2200
    SELECT * FROM TEST WHERE C1 < 30000 ORDER BY C2     12028  266   2978
    SELECT c1 FROM TEST WHERE C1 < 30000 ORDER BY C2     8241   46   1559

    -- TOP 100 / TOP 101 anomaly test
    USE tempdb;
    SET NOCOUNT ON
    GO
    create table test (c1 int primary key clustered, c2 int, c3 char(2000))
    declare @count int
    set @count = 1
    while @count <= 1000000
    begin
    insert into test values (@count, @count, 'a')
    set @count = @count + 1
    end
    GO
    
    
    --Test #1
    DBCC DROPCLEANBUFFERS
    dbcc freeproccache
    go
    select top 100 * from test where c1 < 30000 order by c2 
    go
    DBCC DROPCLEANBUFFERS
    go
    select top 100 c1 from test where c1 < 30000 order by c2 
    go
    
    
    ALTER INDEX ALL ON [dbo].[test] REBUILD 
    
    
    --Test #2
    DBCC DROPCLEANBUFFERS
    dbcc freeproccache
    go
    select top 100 * from test where c1 < 30000 order by c2
    go
    DBCC DROPCLEANBUFFERS
    go
    select top 100 c1 from test where c1 < 30000 order by c2 
    go
    
    
    --Test #3
    DBCC DROPCLEANBUFFERS
    dbcc freeproccache
    go
    select top 101 * from test where c1 < 30000 order by c2
    go
    DBCC DROPCLEANBUFFERS
    go
    select top 101 c1 from test where c1 < 30000 order by c2 
    go
    
    
    --Test #4
    DBCC DROPCLEANBUFFERS
    dbcc freeproccache
    go
    SET ROWCOUNT 101
    GO
    SELECT * FROM TEST WHERE C1 < 30000 ORDER BY C2
    go
    DBCC DROPCLEANBUFFERS
    go
    SELECT c1 FROM TEST WHERE C1 < 30000 ORDER BY C2
    go
    
    
    SET ROWCOUNT 0
    DROP TABLE test
    GO
    
    

    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Monday, December 14, 2009 8:51 PM
  • >
    > In your case, I don't understand why duration is so much out of proportion for TOP 101.
    >

    Apparently the original poster revised his message to include a link to a set of SQL Workshop Webcasts where this TOP 100 and TOP 101 phenomenon is illustrated... that's apparently where he got his script.

    If you look at Webcast 3 of 3, you will see that SQL2005 is much more inefficient than SQL2008 in the difference between TOP 100 and TOP 101.

    I'm using 2005 and you're using 2008... and that's why my duration on the TOP 101 is so much higher.

    --Brad
    --Brad (My Blog)
    Monday, December 14, 2009 9:34 PM

  • If you look at Webcast 3 of 3, you will see that SQL2005 is much more inefficient than SQL2008 in the difference between TOP 100 and TOP 101.

    I'm using 2005 and you're using 2008... and that's why my duration on the TOP 101 is so much higher.

    --Brad

    DBAs / Developers / IT managers take notice! Another reason to upgrade to SQL Server 2008.

    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Monday, December 14, 2009 10:23 PM
  • >

    > In the Webcast 1 (www.sqlworkshops.com/webcasts) they actually show how to make TOP 101
    > faster in SQL Server 2008, but they hide their solution for SQL Server 2005. Can someone post a
    > way to make TOP 101 faster in SQL Server 2005, this will help me.
    >

    Okay, Bob, I found a way... this might be how he did it:

    Do the following:

    SELECT TOP 101 c1,c2,c3=(SELECT c3 FROM Test B WHERE c1=Test.c1)
    FROM Test
    WHERE c1<30000
    ORDER BY c2

    As R Meyyappan in the video suggested, I did not change the TOP 101 or the FROM or the WHERE or the ORDER BY.

    In fact, in executing the above, I got the query down to only 60ms in duration (whereas R Meyyappan in the video was excited to get it down to 244ms... and he had 8x the memory I have on my machine).  That's even FASTER than the normal SELECT TOP 100.

    Hah!  Take that!

    The solution above is similar to what I mentioned last Friday as an alternative solution using JOINs.


    --Brad (My Blog)
    Monday, December 14, 2009 10:34 PM
  • Brad, I think those results are highly dependent on the number of rows in the outer table.  As you may know, the inner table has to be seeked/scanned for every row in the outer table, unless of course the optimizer decides to use a merge join.  I would imagine the above query will result in a nested loop join, which means the outer query is filtered for rows where the id is less than 30000 and then a nested loop join is used to filter on each individual id.  This means you will have more IO, and the results should vary depending on the number of rows returned by the outer query, which is a decent amount here but may be a lot more in other cases.
    http://jahaines.blogspot.com/
    Monday, December 14, 2009 10:44 PM
  • What's interesting about the query plan for that query is that it does the SEEK into the clustered index (i.e. the WHERE c1<30000) and gets the C1 and C2 values.  Then it SORTs those 29999 rows by C2... BUT... it ALSO pulls out the TOP 101 at the same time (even though it's only a SORT operator and not a TOP N SORT operator)... That's the key to why it goes so fast.

    Then, for each of those 101 rows (Nested Loops), it does the SEEK to get the C3 column.

    Then, for whatever reason, it does the TOP 101 again via a TOP operator.  I have no idea why it feels this is necessary to do.




    --Brad (My Blog)
    Monday, December 14, 2009 10:53 PM
  • A completely tongue-in-cheek alternative, just to make it even faster is to create the following index and use the original query.

    CREATE NONCLUSTERED INDEX ncixc2 ON Test(c2, c1) INCLUDE (c3)


    However, it makes little sense to add 2 GB of index just to save 30ms :)

     

     

    Monday, December 14, 2009 10:58 PM
  • Brad, did you get a TOP operator within the outer portion of the nested loop join?  I didn't get that when I just tried it.  I got an estimated number of rows that said 101 were being moved, but the actual number was 820.  That means the optimizer did 820 index seeks to satisfy the query.  Then once the results were joined together, the optimizer used a single TOP operator on the 820 rows.
    http://jahaines.blogspot.com/
    Monday, December 14, 2009 11:07 PM
  • I uploaded the .SQLPLAN (actual query plan) and a JPG of the plan up to my SkyDrive:
    http://cid-7ba463f684be1dde.skydrive.live.com/browse.aspx/.Public?uc=2

    The SORT operator receives 29999 rows, but outputs only 101 (that's BOTH estimated and actual).

    Then that gets fed to Nested Loops, which does the lookup for the C3 values.

    And then those 101 rows get fed to a (seemingly redundant) TOP operator, which, of course, receives 101 rows and outputs 101 rows.


    Again, this was in SQL2005.  I know you work in SQL2008... and perhaps that's why you got a different result.


    --Brad (My Blog)
    Monday, December 14, 2009 11:17 PM
  • I tested with SQL Server 2008

    TOP 100 -> Cpu 93, Duration 133 and Reads 11018
    TOP 101 -> Cpu 234, Duration 2081 and Reads 16672
    TOP 101 (Brad's with correlated subquery) -> Cpu 47, Duration 185 and Reads 11672.

    So the correlated subquery needs less CPU than seeking the table only once!

    There must be a way to force the optimizer to do a scan (cheaper for the optimizer), and there is. See this rewrite:

    select top 100 * from test where 0+c1 < 30000 order by c2
    select top 101 * from test where 0+c1 < 30000 order by c2

     

     

    Top 100 (Peso) -> Cpu 94, Duration 145 and Reads 18361.
    Top 101 (Peso) -> Cpu 140, Duration 170 and Reads 18361.

    Both are slightly faster than Brad's, but use more CPU. The queries use a clustered index scan.


    As you can see, Reads is NOT a good measure of performance. My rewrite uses 1/10th of the time of the original TOP 101 query, with significantly more Reads.
    • Edited by SwePesoMVP Monday, December 14, 2009 11:45 PM
    Monday, December 14, 2009 11:31 PM
  • Really interesting, Peso...

    When I run your WHERE 0+C1<30000 on my SQL2005 box, I get the absolute worst performance of all.  It takes a full 20 seconds; whereas, the normal vanilla TOP 101 query (i.e. just WHERE C1<30000) took 15-16 seconds.

    Again, the difference between SQL2005 and 2008, I guess.  But what a stark difference!


    --Brad (My Blog)
    Monday, December 14, 2009 11:43 PM
  • I am testing this on 2008 R2 Nov CTP 10.50.1352.12. When you ran the query, did you also get an index scan on 2005, or a table scan?

    I posted the results as an input to your forthcoming blog post. The "0+" is only there to get rid of any index use.
    Monday, December 14, 2009 11:47 PM
  • What is the query plan for your clustered index scan method?

    It doesn't seem to make sense to me...  You reported this, right?:

    SELECT TOP 101 * FROM Test WHERE C1<30000 ORDER BY C2
    --CPU=234, Duration=2081

    SELECT TOP 101 * FROM Test WHERE 0+C1<30000 ORDER BY C2
    --CPU=140, Duration=170

    In SQL2005, the above queries create the same query plan, except the first one does a Clustered Index SEEK and the second one does a Clustered Index SCAN.  Both of them feed the rows to the Top N Sort operator, which sorts and pulls out the TOP 101 rows.

    In other words, they both feed 29999 rows to the Top N Sort operator, and that's supposedly the bottleneck.

    How come your duration is so incredibly small in the SCAN?  Something ridiculously much more efficient in 2008?

    --Brad (My Blog)
    Monday, December 14, 2009 11:55 PM
  • >
    >When you ran the query, did you also get a index scan on 2005, or a table scan?
    >

    It's the same thing, isn't it?  Because the table has a PRIMARY KEY CLUSTERED INDEX on C1... so the index IS the table.



    --Brad (My Blog)
    Tuesday, December 15, 2009 12:09 AM
  • Brad,

    Are you supplying the maxdop 1 hint?  I get the query plan with the top filter in the sort if and only if i use the maxdop hint.  If I do not use the maxdop hint, the number of rows is greater than the specified top amount.
    http://jahaines.blogspot.com/
    Tuesday, December 15, 2009 12:21 AM
  • Brad,

    Are you supplying the maxdop 1 hint?  I get the query plan with the top filter in the sort if and only if i use the maxdop hint.  If I do not use the maxdop hint, the number of rows is greater than the specified top amount.
    http://jahaines.blogspot.com/

    Interesting question.

    I am NOT using the MAXDOP hint; however, I did have my Server Properties set to Max Degree of Parallelism = 1. 

    When I changed it to 0 (and I have 2 processors), the query plan institutes parallelism and the SORT operator now receives 29999 rows, estimates 101, but ACTUALly produced 186.  Not as high as the 820 you got, but perhaps you have more processors... or maybe faster ones?

    My Nested Loops operator feeds those 186 rows to the Parallelism/GatherStreams operator, and that pares it down to 101 rows, which then gets fed to the (seemingly redundant) TOP operator.
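
    For anyone reproducing this, the per-query equivalent of that server setting is the MAXDOP hint Adam mentioned; here is a sketch of the same correlated-subquery solution forced to a serial plan:

    SELECT TOP 101 c1, c2, c3 = (SELECT c3 FROM Test B WHERE c1 = Test.c1)
    FROM Test
    WHERE c1 < 30000
    ORDER BY c2
    OPTION (MAXDOP 1)  --serial plan: the Sort outputs exactly the TOP 101 rows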


    --Brad (My Blog)
    Tuesday, December 15, 2009 12:35 AM

  • As you can see, Reads is NOT a good measure for performance. My rewrite uses 1/10th of the time of original 101 problem, and significantly more Reads.

    Hah.... script pls...hope you are not forgetting DBCC DROPCLEANBUFFERS....

    SQL Server 2008 run, 1 million rows. As we can see, reads is a good metric.

    Query                                                       Reads   CPU   DurationMsec
    select top 100 * from test where c1 < 30000 order by c2     16960    374   19369
    select top 100 c1 from test where c1 < 30000 order by c2     8505      0    1432
    select top 100 * from test where c1 < 30000 order by c2      8235     62    1744
    select top 100 c1 from test where c1 < 30000 order by c2     8235     15    1684
    select top 100 * from test where 0+c1 < 30000 order by c2  250792   3281   54936
    select top 101 * from test where c1 < 30000 order by c2     13887    328    3683
    select top 101 * from test where 0+c1 < 30000 order by c2  250764   1375   47836
    select top 101 c1 from test where c1 < 30000 order by c2     8235     47    1475
    SELECT * FROM TEST WHERE C1 < 30000 ORDER BY C2              11482    156    2617
    SELECT c1 FROM TEST WHERE C1 < 30000 ORDER BY C2              8235     16    1689
    SELECT * FROM TEST WHERE 0+C1 < 30000 ORDER BY C2            250780   1437   47891




    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Tuesday, December 15, 2009 1:37 AM
  • Why should I do DBCC DROPCLEANBUFFERS on a production machine?
    That would be like letting the engine cool off every time I start it again.
    Tuesday, December 15, 2009 5:06 AM
  • I just tested on SQL Server Express 2008, and both queries take almost the same time (< 1 sec). A clustered index seek and a Sort operation are being done.

    select top 100 * from test where c1 < 30000 order by c2
    select top 101 * from test where c1 < 30000 order by c2

    I also ran them after
    DBCC DROPCLEANBUFFERS
    dbcc freeproccache
    go

    And TOP 100 took 4 seconds, TOP 101 took around 5 seconds. I couldn't get the CPU and the READS since I don't have Profiler.

    The following is doing a Clustered Index Scan, with almost the same time (again, I couldn't get the actual duration, reads, and CPU):

    select top 100 * from test where 0+c1 < 30000 order by c2
    select top 101 * from test where 0+c1 < 30000 order by c2




    Abdallah El-Chal, PMP, ITIL, MCTS
    Tuesday, December 15, 2009 5:38 AM
  • I just tested on SQL Server Express 2008, and both queries are taking almost the same time( < 1 sec). A clustered index seek is being  done and a Sort operation.


    Abdallah El-Chal, PMP, ITIL, MCTS

    Precisely. I don't think there is any issue; interesting thread, though. Nobody ever promised that SQL Server would behave exactly linearly as far as performance is concerned. Fragmented indexes, execution plan variation and sorting operations can cause non-linear behavior.

    Development Performance Measurement 101 (a template sketch follows the list):

    1. Use non-production machine
    2. REBUILD index(s) prior to measurement
    3. Use DBCC DROPCLEANBUFFERS before queries
    4. Use dbcc freeproccache if compile time is significant before queries
    5. SET STATISTICS IO ON/OFF and SET STATISTICS TIME ON / OFF will yield basic performance data
    6. Profiler SQL:BatchCompleted event saved to  a table will yield reads, writes, CPU and duration data
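
    A minimal template of the checklist (a sketch; substitute your own table, index and query):

    ALTER INDEX ALL ON dbo.test REBUILD   -- step 2
    GO
    DBCC DROPCLEANBUFFERS                 -- step 3
    DBCC FREEPROCCACHE                    -- step 4
    GO
    SET STATISTICS IO ON
    SET STATISTICS TIME ON                -- step 5
    select top 101 * from test where c1 < 30000 order by c2
    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF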


    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Tuesday, December 15, 2009 7:34 AM
  • DBCC DROPCLEANBUFFERS will tell you more about how fast your disk system is than about the performance of the code.
    Disks are much slower than physical RAM. I verified this by adding a RAM drive and moving the test database there.
    Now I get the same fast speed (with DBCC DROPCLEANBUFFERS, on the RAM disk) as the code before (without DBCC DROPCLEANBUFFERS, on conventional disks).

    Does it mean the code is fast? No, it only means the disk system is really fast (as it should be, because it is a RAM drive).

    Also, STATISTICS IO has issues (bugs) with certain join operations/conditions and will not report accurate timings (if any at all). You can see this quite often when worktables are involved. How often do you see 0 as the metric in STATISTICS IO for worktables?

    Tuesday, December 15, 2009 7:52 AM
  • What a coincidence! I am working on the same issue just now.
    I was considering implementing an algorithm like this:
    First populate N rows into a table variable (with an index on the sort column), then iterate through all remaining rows, adding a row to the table variable if it is bigger than the table's minimum, else discarding it. This could be done either in SQL or in a CLR aggregate function.
    Then I thought maybe MS had already done it in the TOP N stuff, so I started to run a test against it.

    CREATE TABLE [dbo].[NUM]
    ([n] int NOT NULL, s varchar(128) NULL, PRIMARY KEY CLUSTERED([n] ASC)) 
    go
    -- populate data
    set nocount on 
    declare @n int, @i int 
    set @n=1000000 
    set @i = 0 
    while @n>0 begin 
     if @i = 0 begin tran 
     insert into dbo.NUM 
     select @n, convert(varchar,@n + @i * 2) 
     set @n=@n-1 
     set @i = (@i + 1) % 1000 
     if @i = 0 commit 
    end 
    GO 
    -- test 1
    select  top ( XX ) cast(s as int), n from dbo.num
    order by cast(s as int) desc
    go
    -- test 2
    set rowcount XX
    select cast(s as int), n from dbo.num
    order by cast(s as int) desc
    go


    For test 1, the duration is < 1s for any XX <= 100, and about 12s for any XX > 100.

    For test 2, the duration is fixed at 4s for XX from 10 to 100,000.

    The showplan shows that test 1 uses the Top N Sort operator, while test 2 uses the Sort operator.
    OK, I don't care about the Sort operator. The only thing I care about is whether MS has correctly implemented the Top N Sort.
    MSDN stated about "Top N sort": 
    "Top N Sort is similar to the Sort iterator, except that only the first N rows are needed, and not the entire result set. For small values of N, the SQL Server query execution engine attempts to perform the entire sort operation in memory. For large values of N, the query execution engine resorts to the more generic method of sorting to which N is not a parameter."

    As you can see, this statement sounds like the algorithm I was intending to write myself. But the latter part mentions a "more generic method of sorting to which N is not a parameter"; that explains why, no matter how XX changes for test 1 once it goes beyond 100, the duration is always the same.  Test 2 is also insensitive to N.
    So MS seems to use 3 algorithms, of which two are used for TOP N and one for SET ROWCOUNT.

    I do not think that performing the sort in memory or not would by itself cause such a big difference. It's mainly that only one algorithm (the fastest one) keeps just the top N rows and evicts low-ranking items when they fall below the N window.
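
    To make that concrete, here is a minimal T-SQL sketch of the "keep N and evict" idea (purely illustrative and far slower than a built-in operator would be; it assumes distinct sort keys so the PRIMARY KEY on the table variable holds, and it reuses the dbo.NUM table from the script above):

    declare @n int, @key int, @val int, @cnt int
    set @n = 101
    set @cnt = 0
    declare @top table (sortkey int primary key, n int)
    declare c cursor fast_forward for
        select cast(s as int), n from dbo.NUM
    open c
    fetch next from c into @key, @val
    while @@fetch_status = 0
    begin
        if @cnt < @n
        begin
            insert @top values (@key, @val)   -- still filling the N window
            set @cnt = @cnt + 1
        end
        else if @key > (select min(sortkey) from @top)   -- candidate outranks current minimum
        begin
            delete from @top where sortkey = (select min(sortkey) from @top)   -- evict the low-ranking row
            insert @top values (@key, @val)
        end
        fetch next from c into @key, @val
    end
    close c
    deallocate c
    select sortkey, n from @top order by sortkey desc   -- same rows as TOP (@n) ... DESC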

    Tuesday, December 15, 2009 11:08 PM
  • Johnx,

    I added an index REBUILD and DBCC DROPCLEANBUFFERS to your script, and the results are strikingly similar to previous results, without any drama or excitement.

    Index REBUILD ensures that you are not hitting fragmented segments of the B-Tree.

    DBCC DROPCLEANBUFFERS ensures that you are not comparing disk reads (slow) with buffer reads (fast).

    TOP (100)
       CPU time = 641 ms,  elapsed time = 896 ms.
    TOP (101)
       CPU time = 828 ms,  elapsed time = 1322 ms.
    ROWCOUNT 101
       CPU time = 734 ms,  elapsed time = 1208 ms.
    


    Script executed on SQL Server 2008:
    USE tempdb;
    GO
    CREATE TABLE [dbo].[NUM]
    ([n] int NOT NULL, s varchar(128) NULL, PRIMARY KEY CLUSTERED([n] ASC)) 
    go
    -- populate data
    set nocount on 
    declare @n int, @i int 
    set @n=1000000 
    set @i = 0 
    while @n>0 begin 
     if @i = 0 begin tran 
     insert into dbo.NUM 
     select @n, convert(varchar,@n + @i * 2) 
     set @n=@n-1 
     set @i = (@i + 1) % 1000 
     if @i = 0 commit 
    end 
    GO 
    ALTER INDEX ALL ON Num REBUILD
    GO
    PRINT 'TOP (100)'
    DBCC DROPCLEANBUFFERS
    SET STATISTICS TIME ON
    select  top ( 100 ) cast(s as int), n from dbo.num
    order by cast(s as int) desc
    SET STATISTICS TIME OFF
    go
    PRINT 'TOP (101)'
    DBCC DROPCLEANBUFFERS
    SET STATISTICS TIME ON
    select  top ( 101 ) cast(s as int), n from dbo.num
    order by cast(s as int) desc
    SET STATISTICS TIME OFF
    go
    PRINT 'ROWCOUNT 101'
    set rowcount 101
    DBCC DROPCLEANBUFFERS
    SET STATISTICS TIME ON
    select cast(s as int), n from dbo.num
    order by cast(s as int) desc
    SET STATISTICS TIME OFF
    go
    DROP TABLE [dbo].[NUM]
    
    

    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Tuesday, December 15, 2009 11:49 PM
  • I am using a sql 2005.

    I also tested the "select top (@n)" variation. The result shows that "select top (@n)" is similar to "set rowcount...".

    The reason I tested the "select top (@n)" variation is that I was wondering if we could use plan forcing to make it use the faster Top N Sort. However, it seems that "select top (@n)" is quite different from "select top (xx)" where xx is a constant, and is instead similar to "set rowcount; ...". I guess it will not work, so I will not test whether plan forcing can do the job.

    Just curious why MS chose not to always use the Top N Sort algorithm, and instead chose such a complex arrangement (i.e. some queries get Top N Sort, some get Sort-then-Top).  I think Top N Sort should always be used; can someone point out to me a case where Sort-then-Top will perform better than Top N Sort?
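
    A quick side-by-side of the two forms (a sketch against the dbo.NUM table from the earlier script):

    declare @n int
    set @n = 101
    SET STATISTICS TIME ON
    --constant N: compiled with the Top N Sort operator
    select top (101) cast(s as int), n from dbo.num order by cast(s as int) desc
    --variable N: behaves like SET ROWCOUNT (generic Sort, then Top)
    select top (@n) cast(s as int), n from dbo.num order by cast(s as int) desc
    SET STATISTICS TIME OFF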
    Wednesday, December 16, 2009 1:58 AM
  • SET ROWCOUNT x

    is a deprecated feature, and it is well documented in Books Online that it will be removed in a future version of SQL Server.
    Don't build new business logic around this feature.
    Wednesday, December 16, 2009 7:45 AM
  • Hi fellows,

    I don't know if it helps, but I wrote about that some time ago...

    http://translate.google.com.br/translate?u=http%3A%2F%2Ffabianosqlserver.spaces.live.com%2FBlog%2Fcns%2152EFF7477E74CAA6%211756.entry&sl=pt&tl=en&hl=&ie=UTF-8

    Cheers

    http://www.simple-talk.com/author/fabiano-amorim/


    Fabiano Neves Amorim - MCTS / MCP - SQLServer - http://fabianosqlserver.spaces.live.com/
    Friday, May 14, 2010 7:54 PM
  • " Quando uso TOP 100 o SQL utiliza 550mb de memória, 9875 de CPU e retorna a consulta em 40 segundos, quando uso TOP 101 o SQL utiliza 798mb de memória, 42745 de CPU e retorna a consulta em 1 minuto e 20 segundos. When I use the SQL TOP 100 uses 550MB of memory, CPU 9875 and the query returns in 40 seconds, when I use the SQL TOP 101 uses 798mb of memory, 42 745 CPU and returns the query in 1 minute and 20 seconds. "

    SQL Server uses the available resources according to its (secret) internal algorithm. It can be imagined that the second operation may involve tempdb as opposed to memory.


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Friday, May 14, 2010 8:27 PM
  • Hi fellows,

    I don't know if it helps, but I wrote about that some time ago...

    http://translate.google.com.br/translate?u=http%3A%2F%2Ffabianosqlserver.spaces.live.com%2FBlog%2Fcns%2152EFF7477E74CAA6%211756.entry&sl=pt&tl=en&hl=&ie=UTF-8

    Cheers

    http://www.simple-talk.com/author/fabiano-amorim/


    Fabiano Neves Amorim - MCTS / MCP - SQLServer - http://fabianosqlserver.spaces.live.com/


    Hi Fabiano...

    I like the blog entry.  Did you see Adam's blog on the subject as well?:  http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will_17.html

    How did you hear about the TOP 101 phenomenon that drove you to write about it in February?  Did you stumble upon it yourself?  Or did you see Ramesh Meyappan's webcasts?

    (By the way, I'm enjoying your blog posts at simple-talk.com about the Showplan Operators of the Week).

     

     


    --Brad (My Blog)
    Friday, May 14, 2010 8:28 PM
  • Fabiano,

    Can you post a query which we all can test? Thanks.


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Friday, May 14, 2010 8:54 PM
  • Brad, I hadn't seen Adam's blog on the subject... I took a look now and it seems to be a very interesting point of view… I wrote about it because I found this problem at one of our customers, and then I started to research it.

     

    By the way, Adam’s blog is now on my RSS feeds…

     

    I hate when we need to do this kind of stuff to “fix” some SQL behaviors… well, especially because we only do it when somebody finds these problems…

     

    What a great discovery about the left join order… well… that is a very, very good argument to have with Conor. Congrats for that. I was one of the guys who used to say, “The order doesn’t matter”; now things have changed a little bit…

     

    I’m glad you are enjoying the “Operator of the Week” series; many more very interesting things will come :)

     

    Tks.

     


    Fabiano Neves Amorim - MCTS / MCP - SQLServer - http://fabianosqlserver.spaces.live.com/
    Friday, May 14, 2010 9:07 PM
  • Sorry SQLUSA I'm afraid I cannot post the original query.

     

    But I think Adam’s blog could be useful for doing the tests, don’t you think?

    Thanks


    Fabiano Neves Amorim - MCTS / MCP - SQLServer - http://fabianosqlserver.spaces.live.com/
    Friday, May 14, 2010 9:18 PM
  • Thanks Fabiano. I did exactly that: took Adam's code and added a few tricks from my own optimization wizard hat.  Results follow. Not exactly linear, but whoever promised that B-tree indexing would be linear? Whoever said that queries on fragmented indexes would yield linear results? The execution plans are identical for 100 & 101. If you do SELECT TOP (@top) ..., the results are about the same.

    BTW - I did Sybase performance tuning for a week or so in São Paulo - huge city! Loved Rio!

    TEST 17 *******************************************************
    
     SQL Server Execution Times:
     CPU time = 156 ms, elapsed time = 1679 ms.
    
    TEST 100 *******************************************************
    
     SQL Server Execution Times:
     CPU time = 203 ms, elapsed time = 1736 ms.
    
    TEST 101 *******************************************************
    
     SQL Server Execution Times:
     CPU time = 281 ms, elapsed time = 2395 ms.
    
    TEST 100 *******************************************************
    
     SQL Server Execution Times:
     CPU time = 124 ms, elapsed time = 1564 ms.
    
    TEST 99 *******************************************************
    
     SQL Server Execution Times:
     CPU time = 125 ms, elapsed time = 1543 ms.
    
    TEST 150 *******************************************************
    
     SQL Server Execution Times:
     CPU time = 203 ms, elapsed time = 2790 ms.
    
    TEST 7 *******************************************************
    
     SQL Server Execution Times:
     CPU time = 125 ms, elapsed time = 1582 ms.
    
    TEST 102 *******************************************************
    
     SQL Server Execution Times:
     CPU time = 312 ms, elapsed time = 2543 ms.
    
    /******************************************************
    ORIGINAL CODE by Adam Haines 
    FROM: http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will_17.html
    
    ALTER INDEX added by Kalman Toth
    DBCC added by Kalman Toth
    Different TOP values added by Kalman Toth
    ********************************************************/
    USE [tempdb]
    GO
    
    SET NOCOUNT ON;
    GO
    
    IF EXISTS(SELECT 1 FROM sys.tables WHERE NAME = 'TestData')
    BEGIN
     DROP TABLE dbo.[TestData];
    END
    GO
    
    CREATE TABLE dbo.TestData(
    RowNum INT PRIMARY KEY,
    SomeId INT,
    SomeCode CHAR(2000)
    );
    GO
    
    ;WITH 
     L0 AS (SELECT 1 AS C UNION ALL SELECT 1) --2 rows
     ,L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B) --4 rows (2x2)
     ,L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B) --16 rows (4x4)
     ,L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B) --256 rows (16x16)
     ,L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B) --65536 rows (256x256)
     ,L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B) --4,294,967,296 rows (65536x65536)
     ,Number AS (SELECT row_number() OVER (ORDER BY (SELECT 0)) AS N FROM L5) 
    INSERT INTO dbo.TestData
    SELECT
     N AS RowNumber,
     ABS(CHECKSUM(NEWID()))%1000000+1 AS SomeId ,
     REPLICATE('a',2000) AS SomeCode
    FROM Number
    WHERE [N] <= 50000
    GO
    ALTER INDEX ALL ON dbo.[TestData] REBUILD 
    GO
    UPDATE STATISTICS dbo.[TestData] WITH FULLSCAN;
    GO
    
    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS 
    DBCC FREEPROCCACHE WITH NO_INFOMSGS
    PRINT CHAR(10)+ 'TEST 17 *******************************************************'
    
    SET STATISTICS TIME ON
    SELECT TOP 17 [RowNum],[SomeId],[SomeCode]
    FROM dbo.[TestData]
    WHERE [RowNum] < 30000
    ORDER BY [SomeId]
    OPTION (MAXDOP 1)
    SET STATISTICS TIME OFF
    
    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS 
    DBCC FREEPROCCACHE WITH NO_INFOMSGS
    PRINT CHAR(10)+ 'TEST 100 *******************************************************'
    
    SET STATISTICS TIME ON
    SELECT TOP 100 [RowNum],[SomeId],[SomeCode]
    FROM dbo.[TestData]
    WHERE [RowNum] < 30000
    ORDER BY [SomeId]
    OPTION (MAXDOP 1)
    SET STATISTICS TIME OFF
    
    
    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS 
    DBCC FREEPROCCACHE WITH NO_INFOMSGS
    PRINT CHAR(10)+ 'TEST 101 *******************************************************'
    --Slow
    SET STATISTICS TIME ON
    SELECT TOP 101 [RowNum],[SomeId],[SomeCode]
    FROM dbo.[TestData]
    WHERE [RowNum] < 30000
    ORDER BY [SomeId]
    OPTION(MAXDOP 1)
    SET STATISTICS TIME OFF
    
    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS 
    DBCC FREEPROCCACHE WITH NO_INFOMSGS
    PRINT CHAR(10)+ 'TEST 100 *******************************************************'
    
    SET STATISTICS TIME ON
    SELECT TOP 100 [RowNum],[SomeId],[SomeCode]
    FROM dbo.[TestData]
    WHERE [RowNum] < 30000
    ORDER BY [SomeId]
    OPTION (MAXDOP 1)
    SET STATISTICS TIME OFF
    
    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS 
    DBCC FREEPROCCACHE WITH NO_INFOMSGS
    PRINT CHAR(10)+ 'TEST 99 *******************************************************'
    
    SET STATISTICS TIME ON
    SELECT TOP 99 [RowNum],[SomeId],[SomeCode]
    FROM dbo.[TestData]
    WHERE [RowNum] < 30000
    ORDER BY [SomeId]
    OPTION (MAXDOP 1)
    SET STATISTICS TIME OFF
    
    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS 
    DBCC FREEPROCCACHE WITH NO_INFOMSGS
    PRINT CHAR(10)+ 'TEST 150 *******************************************************'
    
    SET STATISTICS TIME ON
    SELECT TOP 150 [RowNum],[SomeId],[SomeCode]
    FROM dbo.[TestData]
    WHERE [RowNum] < 30000
    ORDER BY [SomeId]
    OPTION(MAXDOP 1)
    SET STATISTICS TIME OFF
    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS 
    DBCC FREEPROCCACHE WITH NO_INFOMSGS
    PRINT CHAR(10)+ 'TEST 7 *******************************************************'
    
    SET STATISTICS TIME ON
    SELECT TOP 7 [RowNum],[SomeId],[SomeCode]
    FROM dbo.[TestData]
    WHERE [RowNum] < 30000
    ORDER BY [SomeId]
    OPTION (MAXDOP 1)
    SET STATISTICS TIME OFF
    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS 
    DBCC FREEPROCCACHE WITH NO_INFOMSGS
    PRINT CHAR(10)+'TEST 102 *******************************************************'
    
    SET STATISTICS TIME ON
    SELECT TOP 102 [RowNum],[SomeId],[SomeCode]
    FROM dbo.[TestData]
    WHERE [RowNum] < 30000
    ORDER BY [SomeId]
    OPTION (MAXDOP 1)
    SET STATISTICS TIME OFF
    
    GO
    DROP TABLE dbo.TestData
    

    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Friday, May 14, 2010 11:09 PM