locked
Why do these two measures give different results? RRS feed

  • Question

  • I can't undertand why these two measures give different results:

    count1
    = countrows (filter (allexcept (test, test [b]), [a] = 1))
    
    count2
    = calculate (
    	countrows (test),
    	filter (allexcept (test, test [b]), [a] = 1))
    

    Consider this test file:

    a	b
    1	x
    1	y
    2	x
    2	y
    

    Making a pivot table on that data, with "b" in the columns and "count1" and "count2" in the values, yields:

    I've poured over "PowerPivot for Excel 2010" for a couple of hours and can't understand why they're different.

     

    Saturday, September 3, 2011 11:48 PM

Answers

  • John,

    you're right, I misread your initial message and only now realized the strange behavior of the ALLEXCEPT function. And you're right, this particular explanation is not present in the book!

    I will ask for a clarification, it seems to me that the reason is that ALLEXCEPT has been designed to modify the filter context in a CALCULATE and its use as an argument for an iterative function (like FILTER, SUMX, ...) is not a good idea.

    The documentation related to ALLEXCEPT says that the return value is "A table with all filters removed except for the filters on the specified columns." - You should imagine that ALLEXCEPT returns a table made by all the columns except those you listed in the other ALLEXCEPT parameters. This table might contain duplicates. Thus, when you write:

    allexcept (test, test [b]),

    What you get as a result is a table that contains:

    a
    1
    1
    2
    2

    You have 4 rows. You should eliminate duplicate values in order to get the number of distinct values.

    I hope this helps. I had to make some test and checking with Denali you can test the result of ALLEXCEPT by testing it with EVALUATE (in SQL Server Management Studio) - sorry for wrong assumptions I made in my first answer, I should have looked at your message with more attention.

    Let me know if it is clear now.

    Marco


    Marco Russo http://www.powerpivotworkshop.com http://sqlblog.com/blogs/marco_russo
    • Marked as answer by Challen Fu Monday, September 12, 2011 8:29 AM
    Tuesday, September 6, 2011 9:37 AM

All replies

  • Your COUNT2 measure uses CALCULATE, which applies the current filter context (your selection on columns) to the COUNTROWS(test) expression, intersecting that filter with the FILTER you explicitly passed as a parameter. Thus, the resulting filter is the intersection of these two filters, one explicit and the other implicit.

    Your COUNT1 measure simply ignores the existing filter context. For this reason you see two different results.

    Further readings:
    http://sqlblog.com/blogs/marco_russo/archive/2010/01/03/how-calculate-works-in-dax.aspx
    Chapter 6 of http://www.amazon.com/dp/0735640580/?tag=se04-20

    Marco Russo
    http://sqlblog.com/blogs/marco_russo
    http://www.powerpivotworkshop.com

     


    Marco Russo http://www.powerpivotworkshop.com http://sqlblog.com/blogs/marco_russo
    Monday, September 5, 2011 7:24 AM
  • You wrote, “Your COUNT1 measure simply ignores the existing filter context.”. 

     

    That’s precisely where I’m getting stuck.  Here's how I understand the step-by-step evaluation of count1 for the leftmost cell of the pivot table, using chapter 6 and the reference in the appendix. At which step am I going wrong?

     

    1. The leftmost cell of the pivot table has an implicit context filter test [b] = "x".

     

    2. allexcept (test, test [b])

     

    This removes all context filters for the columns of test except column b.  So the context filter test [b] = "x" remains in the expression's result. There are two such rows matching that filter.

     

    3. filter (allexcept (test, test [b]), [a] = 1)

     

    This takes the result from step 2, all the rows of test where test [b] = "x", and further filters with the constraint test [a] = 1.  Thus the result is all rows of test where test [b] = "x" and test [a] = 1.  This is exactly one such row.

     

    4. countrows (filter (allexcept (test, test [b]), [a] = 1)

     

    This counts the rows from step 3, returning 1.   (Except that the answer is really 2!)

     

    Thanks very much for your help.

     

    [By the way, the URL you posted gives 403 - Forbidden: http://sqlblog.com/blogs/marco_russo/archive/2010/01/03/how-calculate-works-in-dax.aspx.]

    Monday, September 5, 2011 6:21 PM
  • Hi Marco and John! :)

    This is an interesting one.  If you simplify your example even further, you will discover that it has to do with the use of ALLEXCEPT() without a CALCULATE() function. 

    Try adding columns [a] and [b] to the as a slicer (in addition to what you described above).  Select the value "x" for column [b], and "1" for column [a].  Now try this:

    COUNTROWS(ALLEXCEPT(Test,Test[a]))

    You get a value of 4 - which is interesting, as you are clearing any filters in column [b] BUT you kept them on [a], for which you have actively selected one value.

    However, if your try this:

    CALCULATE(COUNTROWS(Test), ALLEXCEPT(Test,Test[b]))

    The result is 2 (correct output)

    Using CALCULATE ensures the computation is done in the current context but modified through the ALLEXCEPT function.

    A similar interesting behavior can be observed with another calculation.  Remove [b] from Column Labels and leaving the existing slicer selections, try

    COUNTROWS(ALL(Test[a]))


    This should yield the same result as ALLEXCEPT(Test,Test[b]) - But this time the value is 2, which appears to be correct.   However, if you select "x" and "y" on column [b] in the slicer, the output is still 2. Shouldn't it be 4, as the intersection of the explicitly selected values on [b] with ALL() values on [a] is 4?

    using CALCULATE, you do get the correct value:

    CALCULATE(COUNTROWS(Test), ALL(Test[a]))

    At this moment I actually dont have a good explanation for this behavior, other than the fact that ALL() functions are normally meant to be used as 'setFilter' parameters when CALCULATE applies blocking semantics in order to derive calculation context. Using them without CALCULATE forces context to ignore query context (slicer selection) which is not always intuitive. 




    Javier Guillen
    http://www.msbicentral.com/Blogs/JavierGuillen.aspx
    Monday, September 5, 2011 7:26 PM
    Answerer
  • Using [ALL() functions] without CALCULATE forces context to ignore query context (slicer selection) which is not always intuitive.

    That is a concise summary of the observed behavior, thanks. But can you point me to some documentation explaining this?  I just re-read chapter 6 of "Microsoft PowerPivot for Excel 2010" for the nth time and don't see any clue about it.  Also, the MSDN Data Analysis Expressions (DAX) Reference doesn't appear to explain this. 

    And if this is indeed the intended semantics, I agree it is not intuitive: Why does FILTER, when used in an aggregate function, retain the query context while ALLEXCEPT does not? 

     

     


    Tuesday, September 6, 2011 12:11 AM
  • John,

    you're right, I misread your initial message and only now realized the strange behavior of the ALLEXCEPT function. And you're right, this particular explanation is not present in the book!

    I will ask for a clarification, it seems to me that the reason is that ALLEXCEPT has been designed to modify the filter context in a CALCULATE and its use as an argument for an iterative function (like FILTER, SUMX, ...) is not a good idea.

    The documentation related to ALLEXCEPT says that the return value is "A table with all filters removed except for the filters on the specified columns." - You should imagine that ALLEXCEPT returns a table made by all the columns except those you listed in the other ALLEXCEPT parameters. This table might contain duplicates. Thus, when you write:

    allexcept (test, test [b]),

    What you get as a result is a table that contains:

    a
    1
    1
    2
    2

    You have 4 rows. You should eliminate duplicate values in order to get the number of distinct values.

    I hope this helps. I had to make some test and checking with Denali you can test the result of ALLEXCEPT by testing it with EVALUATE (in SQL Server Management Studio) - sorry for wrong assumptions I made in my first answer, I should have looked at your message with more attention.

    Let me know if it is clear now.

    Marco


    Marco Russo http://www.powerpivotworkshop.com http://sqlblog.com/blogs/marco_russo
    • Marked as answer by Challen Fu Monday, September 12, 2011 8:29 AM
    Tuesday, September 6, 2011 9:37 AM
  • Marco and Javier,

    Thanks very much in helping me to understand the actual behavior of ALLEXCEPT.  I look forward to any additional information about the design motivations.  It does seem counter-intuitive that FILTER inside of an aggregate like SUMX uses the query context, while ALLEXCEPT ignores the query context.   But at least this issue forced me to really understand Chapter 6!

    I attempted to use ALLEXCEPT inside an aggregate to compute a complicated comparison measure that required nesting of SUMX and MINX and the use of EARLIER (the measure was cosine similarity of Internet search results from one day to the next).  While debugging the problem, I came up with an inelegant but effective workaround of using calculated columns to store intermediate results.  It was fast enough for tens of millions of rows.  (PowerPivot in Excel 64 is great!)

     

    • Marked as answer by Challen Fu Monday, September 12, 2011 8:27 AM
    • Unmarked as answer by Challen Fu Monday, September 12, 2011 8:28 AM
    Tuesday, September 6, 2011 6:21 PM
  • John,

    using calculated columns to store intermediate results is definitely a best practice especially when you are creating and debugging a calculation. Well done!

    Marco


    Marco Russo http://www.powerpivotworkshop.com http://sqlblog.com/blogs/marco_russo
    Tuesday, September 6, 2011 9:35 PM