Asked by:
Lightning Fast Hybrid RUNNING TOTAL - Can you slow it down?

Question
-
You read it right! Usually we speed queries up, this time you are asked to slow it down!
For years the T-SQL forums/blogs were abuzz with the fast multiple assignment operator single-update RUNNING TOTAL. While it always calculates correctly, there is no guarantee that it follows the desired sort order since the database engine works on unordered sets. What is amazing about it, very hard to make it fail. Partitioned tables & parallel processing exluded.
The following hybrid RUNNING TOTAL does:
1. Executes the fast single-update RUNNING TOTAL (0.7 sec execution time on average 500K rows)
2. It tests the results ordering
3. If not desired order, the control flow executes the Paul Nielsen (www.sqlserverbible.com) cursor WHILE method (21 sec 500K records) which guaranteed to work.
I put the single-update method through thousands of tests, it never flipped over to the Paul Nielsen method.
It would be nice to know when it fails, even if it is a statistical probability. Can you make it flip over to the Paul Nielsen method (fail the ordering test)?
Can you help?
Thanks.
-- T-SQL HYBRID RUNNING TOTAL calculation -- Try Fast Method; if does not work, do slow method -- SQL Server 2008 T-SQL code USE tempdb; SELECT SalesOrderID = ROW_NUMBER() OVER ( ORDER BY (SELECT 1)), OrderDate=CONVERT(date, OrderDate), Sales=TotalDue, RunningTotal=CONVERT(money, 0) INTO Sales FROM AdventureWorks2008.Sales.SalesOrderHeader CROSS JOIN AdventureWorks2008.HumanResources.Department ORDER BY SalesOrderID DESC -- (503,440 row(s) affected) -- Build clustered index to help the database engine -- DESC sort is used for demo purposes only CREATE CLUSTERED INDEX idxSOID ON Sales (SalesOrderID DESC) DECLARE @RunningTotal money = 0.0 BEGIN -- SINGLE-UPDATE FAST RUNNING-TOTAL DBCC DROPCLEANBUFFERS SET @RunningTotal = 0.0 UPDATE Sales SET @RunningTotal = RunningTotal = @RunningTotal + Sales OPTION (MAXDOP 1) -- 1 sec END -- CHECK if RunningTotal is SalesOrderID order -- IF not in desired order, perform the Paul Nielsen cursor WHILE method IF 0 != (SELECT NoMatchInOrderCount=count(*) FROM (SELECT ID1 = ROW_NUMBER() OVER ( ORDER BY SalesOrderID DESC), ID2 = ROW_NUMBER() OVER ( ORDER BY RunningTotal ASC) FROM Sales) x WHERE ID1 != ID2) BEGIN -- PAUL NIELSEN method http://www.sqlusa.com/bestpractices2005/runningtotalusingcursor/ DBCC DROPCLEANBUFFERS SET NOCOUNT ON DECLARE @SalesOrderID INT, @Sales MONEY SET @RunningTotal = 0 DECLARE cRun CURSOR FAST_FORWARD FOR SELECT SalesOrderID, Sales FROM Sales ORDER BY SalesOrderID DESC OPEN cRun FETCH cRun INTO @SalesOrderID, @Sales -- prime the cursor WHILE @@FETCH_STATUS = 0 BEGIN SET @RunningTotal = @RunningTotal + @Sales UPDATE Sales SET RunningTotal = @RunningTotal WHERE SalesOrderID = @SalesOrderID FETCH cRun INTO @SalesOrderID, @Sales -- fetch next END CLOSE cRun DEALLOCATE cRun -- 22 seconds END GO SELECT TOP (10) * FROM Sales ORDER BY RunningTotal ASC GO /* SalesOrderID OrderDate Sales RunningTotal 503440 2004-07-31 209.9169 209.9169 503439 2004-07-31 34.2219 244.1388 503438 2004-07-31 82.8529 326.9917 503437 2004-07-31 93.8808 420.8725 503436 2004-07-31 46.7194 467.5919 503435 2004-07-31 149.4292 617.0211 503434 2004-07-31 32.5754 649.5965 503433 2004-07-31 5.514 655.1105 503432 2004-07-31 88.9194 744.0299 503431 2004-07-31 23.7465 767.7764 */ DROP TABLE tempdb.dbo.Sales
Kalman Toth
New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2016
- Edited by Kalman Toth Thursday, November 9, 2017 2:49 PM
Tuesday, December 1, 2009 11:40 PM
All replies
-
There is a note in one of Itzik's books that says (I omitted one sentence about SELECT * behavior)"Note Although SQL doesn't assume any given order to a table's rows, it does maintain ordinal positions for columns based on creation order. Specifying SELECT * guarantees the columns would be returned in creation order. In this respect SQL deviates from the relational model"
Abdallah, PMP, ITIL, MCTSWednesday, December 2, 2009 1:03 AM -
. Specifying SELECT * guarantees the columns would be returned in creation order. In this respect SQL deviates from the relational model"
Abdallah, PMP, ITIL, MCTS
Thanks Abdallah. You think that is the connection to the single-update RUNNING TOTAL?
Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
Abdallah, PMP, ITIL, MCTSWednesday, December 2, 2009 1:56 AM -
I didn't do thousands, but did a few, and same results. It's not getting to the CURSOR.
Abdallah, PMP, ITIL, MCTSWednesday, December 2, 2009 2:19 AM -
How about timing? I got 0.7 sec for 500K rows (UPDATE statement only, excluding prep & ordering check). Simply amazing speed!
Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.comWednesday, December 2, 2009 2:26 AM -
Well, it's taking 3 seconds on my computer, but I have SQL Express at the momenet running on Windows XP.It is the Clustered Index that is causing this fast update.
Abdallah, PMP, ITIL, MCTSWednesday, December 2, 2009 2:41 AM -
Well, it's taking 3 seconds on my computer, but I have SQL Express at the momenet running on Windows XP.
It is the Clustered Index that is causing this fast update.
Abdallah, PMP, ITIL, MCTS
For a free product that is not bad. Can you test the cursor WHILE for execution time also? My ratio is 1:30.
Thanks.
Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
Abdallah, PMP, ITIL, MCTSWednesday, December 2, 2009 3:17 AM -
I excluded DBCC DROPCLEANBUFFERS in my previous test. I included it now and it ran in 1:24 secs
Abdallah, PMP, ITIL, MCTSWednesday, December 2, 2009 3:20 AM -
No luck. I removed the ORDER BY and used a derived table hoping to get random order somehow but no luck
SELECT * INTO Sales FROM ( SELECT SalesOrderID = ROW_NUMBER() OVER (ORDER BY (SELECT 1)), OrderDate=CONVERT(date,OrderDate), Sales=TotalDue, RunningTotal=CONVERT(money,0) FROM AdventureWorks.Sales.SalesOrderHeader CROSS JOIN AdventureWorks.HumanResources.Department --ORDER BY SalesOrderID DESC ) AS X
Abdallah, PMP, ITIL, MCTSWednesday, December 2, 2009 3:59 AM -
There is a note in one of Itzik's books that says (I omitted one sentence about SELECT * behavior)
"Note Although SQL doesn't assume any given order to a table's rows, it does maintain ordinal positions for columns based on creation order. Specifying SELECT * guarantees the columns would be returned in creation order. In this respect SQL deviates from the relational model"
Abdallah, PMP, ITIL, MCTS
Isn't this talking about the columns always coming out in the order col1, col2, col3, etc., and doesn't this quote have nothing to do with the order of the rows?Wednesday, December 2, 2009 12:24 PM -
There is a note in one of Itzik's books that says (I omitted one sentence about SELECT * behavior)
"Note Although SQL doesn't assume any given order to a table's rows, it does maintain ordinal positions for columns based on creation order. Specifying SELECT * guarantees the columns would be returned in creation order. In this respect SQL deviates from the relational model"
Abdallah, PMP, ITIL, MCTS
Isn't this talking about the columns always coming out in the order col1, col2, col3, etc., and doesn't this quote have nothing to do with the order of the rows?
Abdallah, PMP, ITIL, MCTSWednesday, December 2, 2009 12:54 PM -
Kent,
What is your take on it? Could it be forced fail? Maybe if executed million times? Or different column structure?
Hunchbank thinks that the clustered index is essential for the single-update RUNNING TOTAL to work as expected.
To All: The multiple assignment operator single-update RUNNING TOTAL always calculates correctly. The issue is that UPDATE does not take ORDER BY, so there is no guarantee that the order of totaling is correct for example OrderDate ASC.
Kalman Toth
New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2016
- Edited by Kalman Toth Thursday, November 9, 2017 2:50 PM
Wednesday, December 2, 2009 12:54 PM -
Here are a couple of posts:
http://social.msdn.microsoft.com/forums/en-US/transactsql/thread/522af80f-4e77-4160-9835-ca317d69aaf6/
http://social.msdn.microsoft.com/forums/en-US/transactsql/thread/31afc25a-b5e4-4f92-b574-ebb524c60150/
Umachandar
Look in particular at the second one in which both Uma and Jens talk about making the update serializable. I have never witnessed a failure of this kind of code. Also, if you will look around at some of my older posts you will see that I have used the update extensions as a method of doing this kind of update; however, in those posts I didn't make the posts serializable so I am not sure of the real impact. What I can tell you is that this is not a supported feature and is subject to change without notice -- use at your own risk. Presently I am not using this in any production code. I would certainly be willing to do this for 1-offs if the dominating requirement were performance, but I can also say that I like the idea of the hybrid code that guarantees correctness; I will have to give this a try. Also, I have taken advantage of this kind of extensions since my Sybase days -- probably going all the way back to 1991. Therefore, I would say that although this feature is undocumented and supported it has one other adjective that I would choose for an attribute: venerable.
Also, if you look around you will find at least one MVP that is adamant about the positives of this method because the improvement is night and day.Also, look at this thread from 2006 by William Lowers:
http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/dd4252dc-8281-4594-aa32-d233619ef7b2Give a look at the heart of the method presented by Jared Ko
Also, note that the execution time went from about 40 hours to about 3-5 minutes.
Also look at Jared's comments: "... AFAIK, it's undocumented but it works in SQL 2005 as well so you're safe for a little while. ..."
Well, that was three years ago and we are now beyond SQL Server 2005.
What I worry about is whether this venerable, undocumented feature has much remaining as far as shelf life is concerned. For the most part I have "retired" the use of this feature. I worry that if you use this feature that your code will be forced into "early retirement" -- and not on your terms but under unfavorable terms.
BUT! If you use this with the watchdog you are describing there may be more life in this old code.
I cannot check this out today, but I will give it a lookWednesday, December 2, 2009 1:48 PM -
Check "The Rules" section directly in this article http://www.sqlservercentral.com/articles/T-SQL/68467/Wednesday, December 2, 2009 2:25 PM
-
Check "The Rules" section directly in this article http://www.sqlservercentral.com/articles/T-SQL/68467/
Jeff has beat this issue to death and has a very thorough understanding of the issues and everything in the rules is on target. I would suggest that if you follow Jeff's rules you should be in good shape.Wednesday, December 2, 2009 2:43 PM -
Thanks Kent. I reinforced it one of Jeff Moden's suggestion - avoid parallelism (edited the original post).
Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.comSunday, December 6, 2009 7:12 PM -
I broke it. I tested on a partitioned table...
Thanks for trying.
If you look at the last link Kent provided, Jeff Moden's 3. rule: no partitioned table.
BTW: I am getting more and more impressed with the single-update RUNNING TOTAL...it takes parallel processing or partitioned table to break it...must be real solid for sequential processing with regular table....
Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.comMonday, December 7, 2009 1:26 AM -
Thanks, using this technique I solved a similar problem.
DECLARE @T TABLE (Id INT, VALUE BIGINT)INSERT @TSELECT 1, 5UNION ALL SELECT 1, 7UNION ALL SELECT 2, 15UNION ALL SELECT 3, 9UNION ALL SELECT 3, 154UNION ALL SELECT 3, 1UNION ALL SELECT 3, 14UNION ALL SELECT 4, 195UNION ALL SELECT 4, 12
We'd want to see this result:
Id Product
1 35
2 15
3 19404
4 2340
Premature optimization is the root of all evil in programming. (c) by Donald Knuth
Naomi Nosonovsky, Sr. Programmer-Analyst
My blogSunday, February 14, 2010 5:41 AM -
You can also solve that problem this way:
select Id,Product=exp(sum(log(Value))) from @T group by Id
--Brad (My Blog)Sunday, February 14, 2010 7:16 AM -
Check "The Rules" section directly in this article http://www.sqlservercentral.com/articles/T-SQL/68467/
Jeff has beat this issue to death and has a very thorough understanding of the issues and everything in the rules is on target. I would suggest that if you follow Jeff's rules you should be in good shape.
Heh... and they tried to beat Jeff to death in the process. ;-) Thanks for the leg up, Kent.
--Jeff ModenSunday, April 25, 2010 1:06 AM -
Thanks Kent. I reinforced it one of Jeff Moden's suggestion - avoid parallelism (edited the original post).
Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
Kalman, thanks for starting such a friendly post on this method. It's a real pleasure to see other folks testing it for all it's worth. I also think that it's very cool that you built in validation of the results with a backup method in case of failure.As a side bar, you might want to add a few of the other rules I presented in my article to your good code just to further reduce the chances of anything going wrong in the future. Heh... of course, it HAS survived every hot fix, CU, SP, and major release since the original release of SQL Server and all the changes to Sybase before that but it's good to have a backup plan and an edge.
--Jeff ModenSunday, April 25, 2010 1:16 AM -
At least for now. It is superfast and works almost all the times. As of now.
Who knows when the algorithm will change in next service pack? Or next version?
But until then I use it too, when no other method can compete. But I document it very clearly where I use this technique, if it fails in the future.
Hi Peso...Man, you're everywhere. Hope you have the time to drink a good beer now and then. ;-)
Except for when the rules aren't followed (like allowing parallelism or running on a partition, etc), it's worked 100% for all the implementations of this type of code I've ever written never mind "almost all the times". I agree... since it's mostly undocumented, it could break in the future... just like a lot of documented stuff has. ;-) That's why I wrote tests into the original article on the subject and the rewrite.
--Jeff ModenSunday, April 25, 2010 1:21 AM -
Hi Jeff!
You know I am for methods and algorithms which are undocumented if they serve a purpose and are faster than any traditional method.
You just have to document them if they break sometime in the future, so you can replace them.
See this in-depth article of MVP Kathi Kallenberger where she dissects another fast method of doing running total
http://www.simple-talk.com/sql/performance/writing-efficient-sql-set-based-speed-phreakery/
I wrote a blog post about the very matter (about using unsupported/undocumented features) here
http://weblogs.sqlteam.com/peterl/archive/2010/02/10/Unsupported-andor-undocumented-features.aspx
And yes, we had a lot of beer and fun at european PASS in Neuss last week!
//Peter
Oh yes... I know that, Peter. And you're one of the good guys, as well. I went off on a bit of a tangent because I didn't want people to think they shouldn't use the "Quirky" update just because people speculate that it might-maybe-could-whatif break someday in the future. I also didn't want people to think that it worked only "almost all the times". Once properly setup, it works 100% of the time.Heh... I've see Kathi's fine article... Tony Davis asked me several questions about it in the publishing process. He wanted to use my name in some post article review comments and I respectfully said "no" because it's her article and your code. I will have to read your blog on the use of U2 features. Knowing you, I'm sure it'll be great.
Thanks for the feedback, Peter.
--Jeff ModenSunday, April 25, 2010 4:59 PM -
IF 0 != (SELECT NoMatchInOrderCount=count(*)
FROM (SELECT ID1 = ROW_NUMBER() OVER (ORDER BY SalesOrderID DESC),
ID2 = ROW_NUMBER() OVER (ORDER BY RunningTotal ASC)
FROM Sales ) x
WHERE ID1 != ID2)
This is very easy to break. The logic above above assumes every runningtotal value is at least positive to get an increasing [unique] running total value.
It just takes ONE negative sales value in the running total to break it. Or at least force it into CURSOR mode.
Or have two or more orders with $0 sales value. Then two or more records will "halt" on same running total value and these two or more records may not fit the same order every execution. The two or more records with same running total value will be randomly selected in the ID2 evaluation.
//Peter
--Jeff ModenSunday, April 25, 2010 5:03 PM -
Here are a couple of posts:
http://social.msdn.microsoft.com/forums/en-US/transactsql/thread/522af80f-4e77-4160-9835-ca317d69aaf6/
http://social.msdn.microsoft.com/forums/en-US/transactsql/thread/31afc25a-b5e4-4f92-b574-ebb524c60150/
Umachandar
Look in particular at the second one in which both Uma and Jens talk about making the update serializable.
Wow... I took a look at the second post. I've seen a lot of such posts and they always leave me wondering "why"? Why do they NOT plan on multiple rows using an increment and why do they always return just one value instead of a set of values (ID's)?
As a side bar... the 3 part update IS documented in Books Online and IS supported. It's the updating in clustered index order that's not documented. Lookup the syntax for UPDATE.
--Jeff ModenMonday, April 26, 2010 5:38 AM -
it HAS survived every hot fix, CU, SP, and major release since the original release of SQL Server and all the changes to Sybase before that but it's good to have a backup plan and an edge.
--Jeff Moden
Jeff,Sybase? Must have been pretty good internal algorithm not requiring change in two decades!
I never used the multiple-assignment operator anywhere except here. Does it have other uses?
Kalman Toth, SQL Server & Business Intelligence Training; SQLUSA.comMonday, April 26, 2010 12:02 PM -
Yep... SyBase. The 3 part update in the Rushmore engine has been around as long as UPDATE has... just like having two FROM clauses in UPDATE has. Along with the ability to overwrite the content of variables in the same query, both are, IMHO, huge advantages of SQL Server and SyBase over other DB engines.Monday, April 26, 2010 12:33 PM
-
You can also solve that problem this way:
select Id,Product=exp(sum(log(Value))) from @T group by Id
--Brad (My Blog)
Brad,could this lead to a wrong result?
Abdallah El-Chal, PMP, ITIL, MCTSMonday, April 26, 2010 12:59 PM -
Yes for negative and 0 values. Take a look at
http://forum.lessthandot.com/viewtopic.php?f=102&t=9930&start=0&st=0&sk=t&sd=a
where we discussed this exact topic as a programmer's puzzle.
Premature optimization is the root of all evil in programming. (c) by Donald Knuth
Naomi Nosonovsky, Sr. Programmer-Analyst
My blogMonday, April 26, 2010 1:17 PM -
Brad,
could this lead to a wrong result?
Abdallah El-Chal
See my follow up. if the data is zero or negative you will get error due to LOG has no value for zero or negative values.
Thanks Peso,Sorry I didn't follow the entire thread and yours was couple of posts down.
Sorry I didn't clarify. I didn't mean any numbers below 1 as you'll start getting negative logarithm beased on natural logarithm(below log(1)), but I meant on positive integers. Will this always work? Could it cast any number to the wrong integer?
Abdallah El-Chal, PMP, ITIL, MCTSMonday, April 26, 2010 1:39 PM -
. Will this always work? Could it cast any number to the wrong integer?
Always? Not likely. It has an impressive range though. Demo follows.-- Replace multiplication with sum of log DECLARE @T TABLE (Id INT, VALUE BIGINT) INSERT @T SELECT 1, 5 UNION ALL SELECT 1, 7 UNION ALL SELECT 2, 15 UNION ALL SELECT 3, 9 UNION ALL SELECT 3, 154 UNION ALL SELECT 3, 1 UNION ALL SELECT 3, 14 UNION ALL SELECT 4, 195 UNION ALL SELECT 4, 12 UNION ALL SELECT 5, 2000000 UNION ALL SELECT 5, 1000 UNION ALL SELECT 6, 2000000 UNION ALL SELECT 6, 20000 UNION ALL SELECT 7, 2000000 UNION ALL SELECT 7, 200000 /* Id Product 1 35 2 15 3 19404 4 2340 5 2000000000 6 40000000000 7 399999999999.999 */ select Id,Product=exp(sum(log(Value))) from @T group by Id
Kalman Toth, SQL Server & Business Intelligence Training; Beginner Database Design & SQL Programming Using Microsoft SQL Server 2016- Edited by Kalman Toth Thursday, November 9, 2017 2:51 PM
Tuesday, May 11, 2010 1:14 AM -
Nice test. My code works correctly (based on Jeff's idea):
DECLARE @Product BIGINT SET @Product = 1 --drop table #OrderedProduct CREATE TABLE #OrderedProduct (RowID INT IDENTITY(1,1) PRIMARY KEY, ID INT, VALUE BIGINT, Product BIGINT, ROW INT) ;with Product AS (SELECT ID,VALUE, CAST(0 AS BIGINT) AS Product, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY VALUE) AS ROW FROM @T) INSERT INTO #OrderedProduct SELECT * FROM Product ORDER BY ID, ROW UPDATE P SET @Product = P.Product = (CASE WHEN P.ROW = 1 THEN P.VALUE ELSE @Product END)* COALESCE(C.VALUE,1) FROM #OrderedProduct P LEFT JOIN #OrderedProduct C ON P.ID = C.ID and P.ROW < C.ROW --drop table #orderedproduct SELECT P.* FROM #OrderedProduct P WHERE P.RowID = (SELECT TOP 1 RowID FROM #OrderedProduct X WHERE P.ID =X.ID ORDER BY ROW DESC)
Premature optimization is the root of all evil in programming. (c) by Donald Knuth
Naomi Nosonovsky, Sr. Programmer-Analyst
My blogTuesday, May 11, 2010 1:44 AM -
I never used the multiple-assignment operator anywhere except here. Does it have other uses?
Kalman Toth, SQL Server & Business Intelligence Training; SQLUSA.comAh... sorry. I didn't answer that question. Yes... in SQL Server 2000 I would use it for what a lot of folks use ROW_NUMBER, RANK, DENSE_RANK, and NTILE for. In my current line of work, I use it for managing the "bin" size of grouped files by file size. For example, we have hundreds of thousands of relatively small files that need to be transmitted in groups that must not exceed a certain group size and there's currently no windowing function to make such assignments by group. A "grouped" running total does this job quite nicely.
--Jeff ModenTuesday, May 11, 2010 12:41 PM -
Yes, the sentence as it's written clearly refers to the column's order.
Premature optimization is the root of all evil in programming. (c) by Donald Knuth
Naomi Nosonovsky, Sr. Programmer-Analyst
My blogSunday, August 1, 2010 2:19 PM -
Interesting blog on this topic by Andy Leonard
http://sqlblog.com/blogs/andy_leonard/archive/2011/03/08/t-sql-tuesday-aggregations-in-ssis.aspx
For every expert, there is an equal and opposite expert. - Becker's Law
Naomi Nosonovsky, Sr. Programmer-Analyst
My blogWednesday, March 9, 2011 3:08 AM