locked
Multiple Medians in SQL RRS feed

  • Question

  • Getting a median in SQL requires a bit of work. With a simple list, I know how to find the median using a top 50% and bottom 50%. However, I am stumped on multiple medians. I have a list of 25,000 records that I had to organize into groups and find their medians. When I put this list through my process, I have 582 sub groups. Some groups only have 1 or 2 records, some have 20 or more. I used a row_number() to partition and order over the groups, so I have them properly ranked in each group. I also have a new table ready to accept the medians, where 1 column is my 582 groups and the other is nulls for the medians, and I even autonumbered in case I need that

    Example of the type of information i currently have established:

    Group Value Rank

     1         10     1

     1         11      2

     1         25     3

     2          2     1

     3           7    1

     3           9    2

     3          13   3

     3          15   4

     4           5   1

     ... ... ...

    I want to get the median for each group so that my table would read

    Group Median

     1         11

     2          2

     3         11

     ... ...

    for all 582 groups. Thanks for the help!!!

    Monday, May 17, 2010 2:21 PM

Answers

  •  

    se-- Idea from Itzik Ben-Gan's book T-SQL Querying

    select
      CustomerID,
      avg(Freight) as medianFreight
    from (
      select
        CustomerID,
        Freight,
        row_number() over (partition by CustomerID order by Freight asc,
    OrderID asc) as r1,
        row_number() over (partition by CustomerID order by Freight desc,
    OrderID desc) as r2
      from Northwind..Orders
    ) T
    where r1 - r2 between -1 and 1
    group by CustomerID

    Tuesday, May 18, 2010 5:02 AM