none
Median of a measure

    Question

  • Hi All,

    I am trying to calculate Median using the method described by Marco Russo here.  However, the array of items in which to compute it is not a calculated column but a measure (hence i cannot use VALUES or EARLIER).  How can this be done?   Also, I tried to use the modification to the calculation proposed by Jonathan Roussel for Quartile calculations here.   Unfortunately, for both solutions (Median and Quartile), the results do not reflect Excel's results as no interpolation is used. 

    Do you guys know of any 'elegant' approach to computing Median and Quartile using DAX measures, and have interpolated results?

    Thanks!




    Javier Guillen
    http://www.msbicentral.com/Blogs/JavierGuillen.aspx
    Sunday, May 29, 2011 4:17 AM
    Answerer

Answers

  • Javier,

    I was not very clear. By "Iterator" i meant another column on which to iterate to get the values of the measure.

    A measure has only one value. It has more than one value when computed against different filter contexts. For example, if the measure is SUM([SalesAmount]), to get more than one value you need to iterate over customers so that you have a different value for each customer. Then, you can compute the median of this array.

    It is not very easy (and I don't like the solution), but I am writing these formulas to give you some ideas that you can further develop to make them fit your needs (it they can fit your needs, of course).

    I have started with a simple table with Customer and Value. Then, Measure1 is the measure of which you want to compute the median and Customer is the "iterator". I defined a "SortIndex" measure which gives, for each customer, the sort order of the measure:

    IF (
        COUNTROWS (VALUES (Tabella2[Customer])) = 1,
        COUNTROWS (
            FILTER (
                ALL (Tabella2[Customer]),
                [Measure 1] <= CALCULATE (
                    [Measure 1], Tabella2[Customer] = VALUES (Tabella2[Customer])
                )
            )
        )
    )

    With this value, it is easy to get the median and/or perform any other computation since you can use ALL to extend the filter context and then leverage this index:

    =CALCULATE (
        [Measure 1],
        FILTER (
            ALL (Tabella2[Customer]),
            [SortIndex] = ROUND (COUNTROWS (ALL (Tabella2[Customer])) / 2, 0)
        )
    )

    Performance should not be very good... Moreover, note that the need to use ALL in more than one place makes this measure not very usable with a PivotTable that uses different filters. Finally, to make things worse, this formula works with customers but if you put any other column on the PivotTable, it will not compute correct values.

    Nevertheless, I thought that sharing the idea might help you in finding a better solution. :)

     


    Alberto Ferrari
    http://www.powerpivotworkshop.com
    Tuesday, May 31, 2011 9:06 AM

All replies

  • Javier,

    There's something I am missing in your description. In order to compute the median you should have an "iterator" column, are you going to use whatever the user puts on the rows or columns to compute the median? If the answer is "yes", then I am afraid it cannot be computed, because you don't have access to what the user put on the row/column inside the filter context. Otherwise, if your iterator is a defined one, then the formula is straightforward.

    Let me know...

     


    Alberto Ferrari
    http://www.powerpivotworkshop.com
    Sunday, May 29, 2011 7:19 PM
  • Hi Alberto,

    The iterator column would be the distinct values which are the output of a DAX measure.  In Marco's example, the iterator is a table column which works well as he uses FILTER(VALUES([column])....   but in my case, the calculation should iterate through the output values of a measure, and unfortunately VALUES and EARLIER only take colums as parameters.. not measures. So yes, it is defined (as an expression).  I need something that would act like:

    MINX(FILTER(VALUES([Measure]),
    CALCULATE(COUNTROWS(MyDataSerie1),
    [Measure] <= EARLIER([Measure]) )
    > COUNTROWS(MyDataSerie1) / 2),
    [Measure])

    To complicate things a bit, the measure is defined in another table, so I would have to use RELATED or RELATEDTABLE.  This is not a requirement, and I can change the data model if needed.

    And last, there is the issue of interpolation.   Using Marco's formula over two rows only,  one with [age] 1 and the other one with [age] 2, the output of the DAX calculation is 2.    This is incorrect, as the result should be an interpolation that yields 1.5  (the proposed DAX calculation is incorrect in this case as it relies only on VALUES, and it gives the MIN result of that distinct list.  In many cases, though, the resulting output won't be on the list)

    I have not been able to come up with an expression that outputs interpolated values that mimic Excel's MEDIAN or QUARTILE

    Hope that makes more sense :)

     




    Javier Guillen
    http://www.msbicentral.com/Blogs/JavierGuillen.aspx





    Monday, May 30, 2011 12:09 AM
    Answerer
  • Javier,

    I was not very clear. By "Iterator" i meant another column on which to iterate to get the values of the measure.

    A measure has only one value. It has more than one value when computed against different filter contexts. For example, if the measure is SUM([SalesAmount]), to get more than one value you need to iterate over customers so that you have a different value for each customer. Then, you can compute the median of this array.

    It is not very easy (and I don't like the solution), but I am writing these formulas to give you some ideas that you can further develop to make them fit your needs (it they can fit your needs, of course).

    I have started with a simple table with Customer and Value. Then, Measure1 is the measure of which you want to compute the median and Customer is the "iterator". I defined a "SortIndex" measure which gives, for each customer, the sort order of the measure:

    IF (
        COUNTROWS (VALUES (Tabella2[Customer])) = 1,
        COUNTROWS (
            FILTER (
                ALL (Tabella2[Customer]),
                [Measure 1] <= CALCULATE (
                    [Measure 1], Tabella2[Customer] = VALUES (Tabella2[Customer])
                )
            )
        )
    )

    With this value, it is easy to get the median and/or perform any other computation since you can use ALL to extend the filter context and then leverage this index:

    =CALCULATE (
        [Measure 1],
        FILTER (
            ALL (Tabella2[Customer]),
            [SortIndex] = ROUND (COUNTROWS (ALL (Tabella2[Customer])) / 2, 0)
        )
    )

    Performance should not be very good... Moreover, note that the need to use ALL in more than one place makes this measure not very usable with a PivotTable that uses different filters. Finally, to make things worse, this formula works with customers but if you put any other column on the PivotTable, it will not compute correct values.

    Nevertheless, I thought that sharing the idea might help you in finding a better solution. :)

     


    Alberto Ferrari
    http://www.powerpivotworkshop.com
    Tuesday, May 31, 2011 9:06 AM
  • Hi Alberto,

    Thanks so much for getting me started with the formula.  I added two enhancements to it, and now I'm matching Excel regular 'Median' function:

    1) It is necessary to first determine if the COUNTROWS of the iterator is even or odd.   In the case of odd sets, the median should be obvious (where the index is equal to half of COUNTROWS).  In the case of even sets, though, we must get the two numbers in the middle and average them.

    2) I noticed that for equal measure outputs (in the context of the iterator), the index can be repeated.  For example, in the index below based on TestMeasure (with repeated values in different Entity iterators):

    Image and video hosting by TinyPic

    So if we get the two numbers in the middle for even sets based on a greater than or equal to half of COUNTROWS, the output would be invalid.  In other words, in the example above COUNTROWS / 2 is equal to 5.   If for the even set, I get the first number to average based on this condition:

    CALCULATE (
        [TestMeasure],
        FILTER (
            ALL (Tabella2[Customer]),
            [SortIndex] = ROUND (COUNTROWS (ALL (Tabella2[Customer])) / 2, 0)
        )
    )

    then we would get the value of 5 which is present in the index column.  However, this would add up all the values of index 5, and the first number to average in even sets would be incorrectly 9, which is 3 * 3 (the values with index 5). 

    Instead, we want to get the MAX value of the first half and the MIN value of the second half (for even sets).  I implemented this in the formula as:  

     

    MAXX(Facts, [TestMeasure])

    The resulting formula is:

     

    IF(COUNTROWS(VALUES(Facts[NAME])) > 1,
     IF(MOD(COUNTROWS(ALL(Facts[NAME])),2) = 0,
      (CALCULATE(
       MAXX(Facts, [TestMeasure]),
        FILTER(
         ALL(Facts[Name]),
         [Index] <= ROUND( COUNTROWS(ALL(Facts[NAME])) / 2, 0)
        )
      ) +
        CALCULATE(
       MINX(Facts, [TestMeasure]),
        FILTER(
         ALL(Facts[Name]),
         [Index] > ROUND( COUNTROWS(ALL(Facts[NAME])) / 2, 0)
        )
      )) / 2
      ,
      CALCULATE(
       [TestMeasure],
        FILTER(
         ALL(Facts[Name]),
         [Index] = ROUND( COUNTROWS(ALL(Facts[NAME])) / 2, 0)
        )
      )
     )
    )

     It computes properly even if other columns are added to the pivot table.   Even though ALL is used to remove filter context on one column (Facts[NAME]) , other filters are still respected and that works for me.  In the case above, the Median breaks properly by date.  When I remove Facts[Name] altogether, the total is properly reflected for the date even though the iterator is not shown.

     If I could only create a QUARTILE or PERCENTILE calculation now... (one that properly implements interpolation, like the one built-in to Excel...)





    Javier Guillen
    http://www.msbicentral.com/Blogs/JavierGuillen.aspx

     



    Thursday, June 02, 2011 2:38 AM
    Answerer
  • Wonderful, this looks like a very elegant solution.

    Glad to have helped with my thoughts, this is a great example of team working. :)

     


    Alberto Ferrari
    http://www.powerpivotworkshop.com
    Thursday, June 02, 2011 7:42 PM
  • Yes, good team work!   Thank you so much Alberto :)


    Javier Guillen
    http://www.msbicentral.com/Blogs/JavierGuillen.aspx
    Friday, June 03, 2011 11:31 AM
    Answerer
  • Hello Javier,

    Thanks so much for the work you and Alberto did on creating a solution for creating a Median value in DAX.  It works for me, but only up until I try to compute the median value across a set of 2001 values.  Up to 2000, I receive results.  From 2001 and greater number of values, the function returns nothing.  Not an error, just no value.

    The number 2000 seems like a rather purposefully selective number.  Do you or anyone else know of any limitations set by design or in settings somewhere that would cause this behavior?  I was wondering if you had experienced this same limitation?

    Thanks,

    Jared Watson

    Friday, October 21, 2011 7:33 PM