OVER PARTITION BY vs GROUP BY
-
Friday, May 11, 2012 7:13 PMModerator
Is the performance of GROUP BY query generally better? Thanks.
The following simple example shows sharp difference in performance.
-- GROUP BY - Relative cost 23% SELECT VendorID, LastOrder=MAX(OrderDate), OrderCount=COUNT(*) FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader GROUP BY VendorID ORDER BY OrderCount DESC, VendorID; -- OVER PARTITION BY - Relative cost 77% ;WITH CTE AS (SELECT VendorID, OrderDate, OrderCount = COUNT(*) OVER ( PARTITION BY VendorID), RN = ROW_NUMBER() OVER ( PARTITION BY VendorID ORDER BY OrderDate DESC) FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader) SELECT VendorID, LastOrder=OrderDate, OrderCount FROM CTE WHERE RN = 1 ORDER BY OrderCount DESC, VendorID; ------------Related article: http://www.sqlusa.com/bestpractices2005/overpartitionby/
Kalman Toth SQL SERVER 2012 & BI TRAINING
New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012- Changed Type Kalman TothMicrosoft Community Contributor, Moderator Friday, May 11, 2012 11:33 PM
- Edited by Kalman TothMicrosoft Community Contributor, Moderator Tuesday, October 16, 2012 11:36 PM
All Replies
-
Friday, May 11, 2012 7:19 PMModerator
I am not so sure that this is a fair comparison, but I will give it a look and get back to you. My concern is that the two different OVER clauses will cause two different activities.
EDIT:
My answer is that the performance of GROUP BY is indeed better if you ask questions that are biased to favor a GROUP BY solution rather than an OVER solution -- such as this question. I think the whole point of the OVER clause isn't that it is inherently better than GROUP BY; however, there are many questions that are biased such that the questions favor an OVER solution rather than a GROUP BY solution.
Which tool you choose will depend on the particular question and the bias that the question has toward a given tool.
- Edited by Kent WaldropMicrosoft Community Contributor, Moderator Friday, May 11, 2012 7:34 PM
- Edited by Kent WaldropMicrosoft Community Contributor, Moderator Friday, May 11, 2012 7:36 PM
-
Friday, May 11, 2012 7:22 PM
Looking at the query plan the difference seems obvious
Chuck
-
Friday, May 11, 2012 7:26 PMModerator
Hi Kent,
The result sets are identical:
-- Each query returns 86 rows WITH CTE AS (SELECT VendorID, OrderDate, OrderCount = COUNT(*) OVER ( PARTITION BY VendorID), RN = ROW_NUMBER() OVER ( PARTITION BY VendorID ORDER BY OrderDate DESC) FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader) SELECT VendorID, LastOrder=OrderDate, OrderCount FROM CTE WHERE RN = 1 EXCEPT SELECT VendorID, LastOrder=MAX(OrderDate), OrderCount=COUNT(*) FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader GROUP BY VendorID -- (0 row(s) affected)
Kalman Toth SQL SERVER & BI TRAINING
- Edited by Kalman TothMicrosoft Community Contributor, Moderator Friday, May 11, 2012 7:27 PM
-
Friday, May 11, 2012 7:36 PMModerator
Yes, the results are the same; however, the activities of each query are different. This question is biased toward the GROUP BY solution. I can come up with a question that is biased toward an OVER solution if you want; however, I don't think that is necessary, is it?
- Edited by Kent WaldropMicrosoft Community Contributor, Moderator Friday, May 11, 2012 7:39 PM
-
Friday, May 11, 2012 7:47 PMModerator
Hi Kent,
I don't think there is bias. The GROUP BY query is pretty mondane, done everyday. Now (starting with SQL Server 2005) it can be done with windowing functions. It is a fair comparison.
The underlying question, is it better to stick with GROUP BY if it can do the job?
Kalman Toth SQL SERVER & BI TRAINING
-
Friday, May 11, 2012 8:10 PMModerator
The underlying question, is it better to stick with GROUP BY if it can do the job?
That is the EXACT verbiage that I was hoping you would use, because the resounding answer to that question is NO. Realize that since you ask if it C A N do the job then if there is any counterexample to the question then the premise is false.
My statement is the same old "it depends." That is, it depends on which is more efficient or to some extent easier to understand / maintain.
EDIT:
A counter example:
-- -------------------------------------------------- -- Suppose that we are processing purchase orders -- by vendor. Also, suppose that we allow for -- multiple purchases to a particular vendor on any -- given day -- it is not what we prefer, but in -- rate circumstances we do allow it. Further -- -- What we want to produce is a list of the last -- purchase that we have for each vendor. We -- define the last purchase as the record associated -- with a vendorId that has the highest orderDate. -- In case there are multiple records with the same -- highest orderDate, we define the last purchase as -- the record with the highest OrderDate and the -- highest purchaseOrderId within the highest order -- Date. -- -- ( I am using AdventureWorks2012 ) -- -------------------------------------------------- with cte as ( select vendorId, purchaseOrderId, orderDate, row_Number() over ( partition by vendorId order by orderDate desc, purchaseOrderId desc ) as rn from purchasing.purchaseOrderHeader ) select vendorId, purchaseOrderId, orderDate from cte where rn = 1; --except select a.vendorId, max(purchaseOrderId) as purchaseOrderId, orderDate from purchasing.purchaseOrderHeader a join ( select vendorId, max(orderDate) as maxOrderDate from purchasing.purchaseOrderHeader b group by vendorId ) b on b.vendorId = a.vendorId and b.maxOrderDate = a.orderDate group by a.vendorId, orderDate ;
Note that this question is biased toward the OVER clause rather than the GROUP BY clause. Even though we CAN use the GROUP BY query, I probably wouldn't because the GROUP BY query requires a second sort.
What I would say is that GROUP BY queries should not be overlooked as options in favor of queries using an OVER clause. Moreover, I would say that it is not by accident that GROUP BY queries became available well before OVER queries. I have not reason to believe that OVER queries are more useful or get more frequent use than GROUP BY queries.My "Yes" part toward your original question is that yes, I think that if you get used to writing queries using the OVER clause that you could misuse the OVER clause in situations where a GROUP BY clause is a better choice.
- Edited by Kent WaldropMicrosoft Community Contributor, Moderator Friday, May 11, 2012 8:34 PM
- Edited by Kent WaldropMicrosoft Community Contributor, Moderator Friday, May 11, 2012 8:38 PM
- Edited by Kent WaldropMicrosoft Community Contributor, Moderator Friday, May 11, 2012 8:38 PM
- Edited by Kent WaldropMicrosoft Community Contributor, Moderator Friday, May 11, 2012 8:57 PM
-
Friday, May 11, 2012 10:32 PM
-
Friday, May 11, 2012 11:38 PMModeratorSure it's better to use GROUP BY for that query you listed - the second query does twice work.
For every expert, there is an equal and opposite expert. - Becker's Law
My blog -
Friday, May 11, 2012 11:48 PMModerator
Hi Naomi,
You are right. The COUNT more than doubles the cost of the OVER PARTITION BY query, why it adds very little cost to the GROUP BY query:
-- GROUP BY - Relative cost 41% 0.093 SELECT VendorID, LastOrder=MAX(OrderDate) FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader GROUP BY VendorID ORDER BY VendorID; -- OVER PARTITION BY - Relative cost 59% 0.131 ;WITH CTE AS (SELECT VendorID, OrderDate, RN = ROW_NUMBER() OVER ( PARTITION BY VendorID ORDER BY OrderDate DESC) FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader) SELECT VendorID, LastOrder=OrderDate FROM CTE WHERE RN = 1 ORDER BY VendorID; ------------
Kalman Toth SQL SERVER & BI TRAINING
- Edited by Kalman TothMicrosoft Community Contributor, Moderator Friday, May 11, 2012 11:48 PM

