none
OVER PARTITION BY vs GROUP BY

Toutes les réponses

  • I am not so sure that this is a fair comparison, but I will give it a look and get back to you.  My concern is that the two different OVER clauses will cause two different activities.

    EDIT:

    My answer is that the performance of GROUP BY is indeed better if you ask questions that are biased to favor a GROUP BY solution rather than an OVER solution -- such as this question.  I think the whole point of the OVER clause isn't that it is inherently better than GROUP BY; however, there are many questions that are biased such that the questions favor an OVER solution rather than a GROUP BY solution.

    Which tool you choose will depend on the particular question and the bias that the question has toward a given tool.


    vendredi 11 mai 2012 19:19
  • Looking at the query plan the difference seems obvious


    Chuck

    vendredi 11 mai 2012 19:22
  • Hi Kent,

    The result sets are identical:

    -- Each query returns 86 rows
    WITH CTE AS (SELECT VendorID, OrderDate,
           OrderCount = COUNT(*)  OVER ( PARTITION BY VendorID),
           RN = ROW_NUMBER() OVER ( PARTITION BY VendorID 
                             ORDER BY OrderDate DESC)
    FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader)
    SELECT VendorID, LastOrder=OrderDate, OrderCount 
    FROM CTE WHERE RN = 1
    EXCEPT
    SELECT VendorID, LastOrder=MAX(OrderDate), OrderCount=COUNT(*)
    FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader
    GROUP BY VendorID
    -- (0 row(s) affected)


    Kalman Toth SQL SERVER & BI TRAINING


    vendredi 11 mai 2012 19:26
  • Yes, the results are the same; however, the activities of each query are different.  This question is biased toward the GROUP BY solution.  I can come up with a question that is biased toward an OVER solution if you want; however, I don't think that is necessary, is it?

    vendredi 11 mai 2012 19:36
  • Hi Kent,

    I don't think there is bias. The GROUP BY query is pretty mondane, done everyday. Now (starting with SQL Server 2005) it can be done with windowing functions. It is a fair comparison.

    The underlying question, is it better to stick with GROUP BY if it can do the job?


    Kalman Toth SQL SERVER & BI TRAINING

    vendredi 11 mai 2012 19:47
  • The underlying question, is it better to stick with GROUP BY if it can do the job?

    That is the EXACT verbiage that I was hoping you would use, because the resounding answer to that question is NO.  Realize that since you ask if it   C A N   do the job then if there is any counterexample to the question then the premise is false.

    My statement is the same old "it depends."  That is, it depends on which is more efficient or to some extent easier to understand / maintain.

    EDIT:

    A counter example:

    -- --------------------------------------------------
    --  Suppose that we are processing purchase orders
    --  by vendor. Also, suppose that we allow for 
    --  multiple purchases to a particular vendor on any
    --  given day -- it is not what we prefer, but in
    --  rate circumstances we do allow it.  Further
    --
    --  What we want to produce is a list of the last
    --  purchase that we have for each vendor.  We
    --  define the last purchase as the record associated
    --  with a vendorId that has the highest orderDate.
    --  In case there are multiple records with the same
    --  highest orderDate, we define the last purchase as
    --  the record with the highest OrderDate and the
    --  highest purchaseOrderId within the highest order
    --  Date.
    --
    --  ( I am using AdventureWorks2012 )
    -- --------------------------------------------------
    with cte as
    ( select
        vendorId,
    	purchaseOrderId,
        orderDate,
    	row_Number() over
    	( partition by vendorId
    	  order by orderDate desc, purchaseOrderId desc
    	) as rn
      from purchasing.purchaseOrderHeader
    )
    select
      vendorId,
      purchaseOrderId,
      orderDate
    from cte
    where rn = 1;
    --except
    select
      a.vendorId,
      max(purchaseOrderId) as purchaseOrderId,
      orderDate
    from purchasing.purchaseOrderHeader a
    join
    ( select 
        vendorId,
        max(orderDate) as maxOrderDate
      from purchasing.purchaseOrderHeader b
      group by vendorId
    ) b
      on b.vendorId = a.vendorId
     and b.maxOrderDate = a.orderDate
     group by
       a.vendorId,
       orderDate
     ;

    Note that this question is biased toward the OVER clause rather than the GROUP BY clause.  Even though we CAN use the GROUP BY query, I probably wouldn't because the GROUP BY query requires a second sort.

    What I would say is that GROUP BY queries should not be overlooked as options in favor of queries using an OVER clause.  Moreover, I would say that it is not by accident that GROUP BY queries became available well before OVER queries.  I have not reason to believe that OVER queries are more useful or get more frequent use than GROUP BY queries.

    My "Yes" part toward your original question is that yes, I think that if you get used to writing queries using the OVER clause that you could misuse the OVER clause in situations where a GROUP BY clause is a better choice.

    vendredi 11 mai 2012 20:10
  • Which medicine is better, Loseec or Aspirine?

    That is, the question is not meaningful.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    vendredi 11 mai 2012 22:32
  • Sure it's better to use GROUP BY for that query you listed - the second query does twice work.

    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    vendredi 11 mai 2012 23:38
  • Hi Naomi,

    You are right. The COUNT more than doubles the cost of the OVER PARTITION BY query, why it adds very little cost to the GROUP BY query:

    -- GROUP BY - Relative cost 41% 0.093
    SELECT VendorID, LastOrder=MAX(OrderDate)
    FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader
    GROUP BY VendorID
    ORDER BY VendorID;
    
    -- OVER PARTITION BY - Relative cost 59% 0.131
    ;WITH CTE AS (SELECT VendorID, OrderDate,
          RN = ROW_NUMBER() OVER ( PARTITION BY VendorID 
                             ORDER BY OrderDate DESC)
    FROM AdventureWorks2008.Purchasing.PurchaseOrderHeader)
    SELECT VendorID, LastOrder=OrderDate
    FROM CTE WHERE RN = 1
    ORDER BY VendorID;
    ------------


    Kalman Toth SQL SERVER & BI TRAINING


    vendredi 11 mai 2012 23:48