# OVER PARTITION BY vs GROUP BY

### การสนทนาทั่วไป

• Is the performance of GROUP BY query generally better? Thanks.

The following simple example shows sharp difference in performance.

```-- GROUP BY - Relative cost 23%
SELECT VendorID, LastOrder=MAX(OrderDate), OrderCount=COUNT(*)
GROUP BY VendorID
ORDER BY OrderCount DESC, VendorID;

-- OVER PARTITION BY - Relative cost 77%
;WITH CTE AS (SELECT VendorID, OrderDate,
OrderCount = COUNT(*)  OVER ( PARTITION BY VendorID),
RN = ROW_NUMBER() OVER ( PARTITION BY VendorID
ORDER BY OrderDate DESC)
SELECT VendorID, LastOrder=OrderDate, OrderCount
FROM CTE WHERE RN = 1
ORDER BY OrderCount DESC, VendorID;
------------```
Related article: http://www.sqlusa.com/bestpractices2005/overpartitionby/

11 พฤษภาคม 2555 19:13

### ตอบทั้งหมด

• I am not so sure that this is a fair comparison, but I will give it a look and get back to you.  My concern is that the two different OVER clauses will cause two different activities.

EDIT:

My answer is that the performance of GROUP BY is indeed better if you ask questions that are biased to favor a GROUP BY solution rather than an OVER solution -- such as this question.  I think the whole point of the OVER clause isn't that it is inherently better than GROUP BY; however, there are many questions that are biased such that the questions favor an OVER solution rather than a GROUP BY solution.

Which tool you choose will depend on the particular question and the bias that the question has toward a given tool.

11 พฤษภาคม 2555 19:19
• Looking at the query plan the difference seems obvious

Chuck

11 พฤษภาคม 2555 19:22
• Hi Kent,

The result sets are identical:

```-- Each query returns 86 rows
WITH CTE AS (SELECT VendorID, OrderDate,
OrderCount = COUNT(*)  OVER ( PARTITION BY VendorID),
RN = ROW_NUMBER() OVER ( PARTITION BY VendorID
ORDER BY OrderDate DESC)
SELECT VendorID, LastOrder=OrderDate, OrderCount
FROM CTE WHERE RN = 1
EXCEPT
SELECT VendorID, LastOrder=MAX(OrderDate), OrderCount=COUNT(*)
GROUP BY VendorID
-- (0 row(s) affected)```

Kalman Toth SQL SERVER & BI TRAINING

11 พฤษภาคม 2555 19:26
• Yes, the results are the same; however, the activities of each query are different.  This question is biased toward the GROUP BY solution.  I can come up with a question that is biased toward an OVER solution if you want; however, I don't think that is necessary, is it?

11 พฤษภาคม 2555 19:36
• Hi Kent,

I don't think there is bias. The GROUP BY query is pretty mondane, done everyday. Now (starting with SQL Server 2005) it can be done with windowing functions. It is a fair comparison.

The underlying question, is it better to stick with GROUP BY if it can do the job?

Kalman Toth SQL SERVER & BI TRAINING

11 พฤษภาคม 2555 19:47
• The underlying question, is it better to stick with GROUP BY if it can do the job?

That is the EXACT verbiage that I was hoping you would use, because the resounding answer to that question is NO.  Realize that since you ask if it   C A N   do the job then if there is any counterexample to the question then the premise is false.

My statement is the same old "it depends."  That is, it depends on which is more efficient or to some extent easier to understand / maintain.

EDIT:

A counter example:

```-- --------------------------------------------------
--  Suppose that we are processing purchase orders
--  by vendor. Also, suppose that we allow for
--  multiple purchases to a particular vendor on any
--  given day -- it is not what we prefer, but in
--  rate circumstances we do allow it.  Further
--
--  What we want to produce is a list of the last
--  purchase that we have for each vendor.  We
--  define the last purchase as the record associated
--  with a vendorId that has the highest orderDate.
--  In case there are multiple records with the same
--  highest orderDate, we define the last purchase as
--  the record with the highest OrderDate and the
--  highest purchaseOrderId within the highest order
--  Date.
--
--  ( I am using AdventureWorks2012 )
-- --------------------------------------------------
with cte as
( select
vendorId,
purchaseOrderId,
orderDate,
row_Number() over
( partition by vendorId
order by orderDate desc, purchaseOrderId desc
) as rn
)
select
vendorId,
purchaseOrderId,
orderDate
from cte
where rn = 1;
--except
select
a.vendorId,
max(purchaseOrderId) as purchaseOrderId,
orderDate
join
( select
vendorId,
max(orderDate) as maxOrderDate
group by vendorId
) b
on b.vendorId = a.vendorId
and b.maxOrderDate = a.orderDate
group by
a.vendorId,
orderDate
;```

Note that this question is biased toward the OVER clause rather than the GROUP BY clause.  Even though we CAN use the GROUP BY query, I probably wouldn't because the GROUP BY query requires a second sort.

What I would say is that GROUP BY queries should not be overlooked as options in favor of queries using an OVER clause.  Moreover, I would say that it is not by accident that GROUP BY queries became available well before OVER queries.  I have not reason to believe that OVER queries are more useful or get more frequent use than GROUP BY queries.

My "Yes" part toward your original question is that yes, I think that if you get used to writing queries using the OVER clause that you could misuse the OVER clause in situations where a GROUP BY clause is a better choice.

11 พฤษภาคม 2555 20:10
• Which medicine is better, Loseec or Aspirine?

That is, the question is not meaningful.

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
11 พฤษภาคม 2555 22:32
• Sure it's better to use GROUP BY for that query you listed - the second query does twice work.

For every expert, there is an equal and opposite expert. - Becker's Law

My blog

11 พฤษภาคม 2555 23:38
• Hi Naomi,

You are right. The COUNT more than doubles the cost of the OVER PARTITION BY query, why it adds very little cost to the GROUP BY query:

```-- GROUP BY - Relative cost 41% 0.093
SELECT VendorID, LastOrder=MAX(OrderDate)
GROUP BY VendorID
ORDER BY VendorID;

-- OVER PARTITION BY - Relative cost 59% 0.131
;WITH CTE AS (SELECT VendorID, OrderDate,
RN = ROW_NUMBER() OVER ( PARTITION BY VendorID
ORDER BY OrderDate DESC)