OVER PARTITION BY vs GROUP BY
-
venerdì 11 maggio 2012 19:13Moderatore
Is the performance of GROUP BY query generally better? Thanks.
The following simple example shows sharp difference in performance.
-- GROUP BY - Relative cost 23% SELECT VendorID, LastOrder=MAX(OrderDate), OrderCount=COUNT(*) FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader GROUP BY VendorID ORDER BY OrderCount DESC, VendorID; -- OVER PARTITION BY - Relative cost 77% ;WITH CTE AS (SELECT VendorID, OrderDate, OrderCount = COUNT(*) OVER ( PARTITION BY VendorID), RN = ROW_NUMBER() OVER ( PARTITION BY VendorID ORDER BY OrderDate DESC) FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader) SELECT VendorID, LastOrder=OrderDate, OrderCount FROM CTE WHERE RN = 1 ORDER BY OrderCount DESC, VendorID; ------------Related article: http://www.sqlusa.com/bestpractices2005/overpartitionby/
Kalman Toth SQL SERVER 2012 & BI TRAINING
New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012- Tipo modificato Kalman TothMicrosoft Community Contributor, Moderator venerdì 11 maggio 2012 23:33
- Modificato Kalman TothMicrosoft Community Contributor, Moderator martedì 16 ottobre 2012 23:36
Tutte le risposte
-
venerdì 11 maggio 2012 19:19Moderatore
I am not so sure that this is a fair comparison, but I will give it a look and get back to you. My concern is that the two different OVER clauses will cause two different activities.
EDIT:
My answer is that the performance of GROUP BY is indeed better if you ask questions that are biased to favor a GROUP BY solution rather than an OVER solution -- such as this question. I think the whole point of the OVER clause isn't that it is inherently better than GROUP BY; however, there are many questions that are biased such that the questions favor an OVER solution rather than a GROUP BY solution.
Which tool you choose will depend on the particular question and the bias that the question has toward a given tool.
- Modificato Kent WaldropMicrosoft Community Contributor, Moderator venerdì 11 maggio 2012 19:36
-
venerdì 11 maggio 2012 19:22
Looking at the query plan the difference seems obvious
Chuck
-
venerdì 11 maggio 2012 19:26Moderatore
Hi Kent,
The result sets are identical:
-- Each query returns 86 rows WITH CTE AS (SELECT VendorID, OrderDate, OrderCount = COUNT(*) OVER ( PARTITION BY VendorID), RN = ROW_NUMBER() OVER ( PARTITION BY VendorID ORDER BY OrderDate DESC) FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader) SELECT VendorID, LastOrder=OrderDate, OrderCount FROM CTE WHERE RN = 1 EXCEPT SELECT VendorID, LastOrder=MAX(OrderDate), OrderCount=COUNT(*) FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader GROUP BY VendorID -- (0 row(s) affected)
Kalman Toth SQL SERVER & BI TRAINING
- Modificato Kalman TothMicrosoft Community Contributor, Moderator venerdì 11 maggio 2012 19:27
-
venerdì 11 maggio 2012 19:36Moderatore
Yes, the results are the same; however, the activities of each query are different. This question is biased toward the GROUP BY solution. I can come up with a question that is biased toward an OVER solution if you want; however, I don't think that is necessary, is it?
- Modificato Kent WaldropMicrosoft Community Contributor, Moderator venerdì 11 maggio 2012 19:39
-
venerdì 11 maggio 2012 19:47Moderatore
Hi Kent,
I don't think there is bias. The GROUP BY query is pretty mondane, done everyday. Now (starting with SQL Server 2005) it can be done with windowing functions. It is a fair comparison.
The underlying question, is it better to stick with GROUP BY if it can do the job?
Kalman Toth SQL SERVER & BI TRAINING
-
venerdì 11 maggio 2012 20:10Moderatore
The underlying question, is it better to stick with GROUP BY if it can do the job?
That is the EXACT verbiage that I was hoping you would use, because the resounding answer to that question is NO. Realize that since you ask if it C A N do the job then if there is any counterexample to the question then the premise is false.
My statement is the same old "it depends." That is, it depends on which is more efficient or to some extent easier to understand / maintain.
EDIT:
A counter example:
-- -------------------------------------------------- -- Suppose that we are processing purchase orders -- by vendor. Also, suppose that we allow for -- multiple purchases to a particular vendor on any -- given day -- it is not what we prefer, but in -- rate circumstances we do allow it. Further -- -- What we want to produce is a list of the last -- purchase that we have for each vendor. We -- define the last purchase as the record associated -- with a vendorId that has the highest orderDate. -- In case there are multiple records with the same -- highest orderDate, we define the last purchase as -- the record with the highest OrderDate and the -- highest purchaseOrderId within the highest order -- Date. -- -- ( I am using AdventureWorks2012 ) -- -------------------------------------------------- with cte as ( select vendorId, purchaseOrderId, orderDate, row_Number() over ( partition by vendorId order by orderDate desc, purchaseOrderId desc ) as rn from purchasing.purchaseOrderHeader ) select vendorId, purchaseOrderId, orderDate from cte where rn = 1; --except select a.vendorId, max(purchaseOrderId) as purchaseOrderId, orderDate from purchasing.purchaseOrderHeader a join ( select vendorId, max(orderDate) as maxOrderDate from purchasing.purchaseOrderHeader b group by vendorId ) b on b.vendorId = a.vendorId and b.maxOrderDate = a.orderDate group by a.vendorId, orderDate ;
Note that this question is biased toward the OVER clause rather than the GROUP BY clause. Even though we CAN use the GROUP BY query, I probably wouldn't because the GROUP BY query requires a second sort.
What I would say is that GROUP BY queries should not be overlooked as options in favor of queries using an OVER clause. Moreover, I would say that it is not by accident that GROUP BY queries became available well before OVER queries. I have not reason to believe that OVER queries are more useful or get more frequent use than GROUP BY queries.My "Yes" part toward your original question is that yes, I think that if you get used to writing queries using the OVER clause that you could misuse the OVER clause in situations where a GROUP BY clause is a better choice.
- Modificato Kent WaldropMicrosoft Community Contributor, Moderator venerdì 11 maggio 2012 20:57
-
venerdì 11 maggio 2012 22:32
-
venerdì 11 maggio 2012 23:38ModeratoreSure it's better to use GROUP BY for that query you listed - the second query does twice work.
For every expert, there is an equal and opposite expert. - Becker's Law
My blog -
venerdì 11 maggio 2012 23:48Moderatore
Hi Naomi,
You are right. The COUNT more than doubles the cost of the OVER PARTITION BY query, why it adds very little cost to the GROUP BY query:
-- GROUP BY - Relative cost 41% 0.093 SELECT VendorID, LastOrder=MAX(OrderDate) FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader GROUP BY VendorID ORDER BY VendorID; -- OVER PARTITION BY - Relative cost 59% 0.131 ;WITH CTE AS (SELECT VendorID, OrderDate, RN = ROW_NUMBER() OVER ( PARTITION BY VendorID ORDER BY OrderDate DESC) FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader) SELECT VendorID, LastOrder=OrderDate FROM CTE WHERE RN = 1 ORDER BY VendorID; ------------
Kalman Toth SQL SERVER & BI TRAINING
- Modificato Kalman TothMicrosoft Community Contributor, Moderator venerdì 11 maggio 2012 23:48

