locked
Math functions and relationships RRS feed

  • Question

  • Hi all,

    I'm having trouble understading the behavior of math functions and their interaction with filters applied on dimension tables.

    I have built a small example to illustrate what IMHO seems like a bug, but there's probably some occult reason behind it.

    I have three tables on Excel and linked them to my model: FACT, DIMENSION1 and DIMENSION2. FACT has keys poiting to the dimension tables, and one value column. DIMENSION1 and DIMENSION2 have one string column each. Here is the data:

    I created the relationships linking the FACT table to the dimension tables, and I've created two measures:

    Sum of Value:=SUM(FACT[Value])

    Exp of Sum of Value:=EXP([Sum of Value])

    I then created a pivot table where I put DIMENSION1[Dim1] and DIMENSION2[Dim2] on the rows section of the pivot table. Here is the resulting table:

    Can someone explain to me why there is any data shown on the highlighted cells? I suppose powerpivot is assuming that EXP(BLANK()) = EXP(0) = 1. Is this the expected behavior?

    The strange thing about it is that if I use FACT[Dim1] and FACT[Dim2] in the rows section, results are as expected.

    I could work around this by ading conditional logic to set the measure to BLANK(), but in my real world scenario there are thousands of entries in Dim2, and I deppend heavily on filters defined on other dimensions to narrow down the amount of data in each table. With the exponential set on the dimension tables, performance becomes a real drag because apparently the engine needs to iterate through millions of rows for each Dim2 entry before it can decide that it should be blank.

    Is it just me or is this a bug?

    Regards,

    Carlos Jourdan

    Tuesday, September 16, 2014 9:06 PM

Answers

  • I've run into similar issues with this, and my working theory that I've not had time to explore thoroughly is that the problem lies not with DAX or the Tabular engine, but with the sub-optimal MDX generated by Excel.

    If you run a profiler trace, you will likely see a CROSS JOIN on your dimensions as one of the first steps in the query. This provides the cartesian product (all combinations) of the dimensions, even for those with no valid relationship through the fact table. The measures are then calculated for each member of this (potentially huge) table, and those with NULL/BLANK() values are dropped from the result returned to Excel.

    Implicit conversion is usually a pretty valuable thing, but in this case, the NULL/BLANK() that would keep your A-K and C-J values from appearing when you just perform the SUM() is used to evaluate the EXP(). The NULL/BLANK() is passed as an argument to EXP(), where it is implicitly converted to a 0 and evaluated to 1.

    I suspect that if you run a DAX query against your Tabular model directly you will see the expected behavior:

    EVALUATE
    ADDCOLUMNS(
        SUMMARIZE( FactTable
            , Dim1[Dim1]
            , Dim2[Dim2]
        )
        , "Sum", <your sum measure>
        , "Exp", <your exp measure>
    )
    

    In this example, you must ensure that your measures are wrapped in CALCULATE().

    If that example works, you might consider doing your reporting in SSRS where you can write DAX queries against the database (there are a few workarounds necessary for this, but it's worth it; DAX queries are blazingly fast compared to Excel pivot tables in many cases - all due to the poor MDX that Excel generates).

    Tuesday, September 23, 2014 9:01 PM
  • Hi Carlos,

    Although this may not immediately benefit you, may I suggest that you create a Microsoft Connect account and then submit this issue as a bug/suggestion.

    You can register here: https://connect.microsoft.com/.

    Once you have registered, you can report this issue under the Analysis Services category here: https://connect.microsoft.com/SQLServer/feedback/CreateFeedback.aspx.

    As Greg has explained, the performance issue is due to the sub-optimal MDX queries that Excel generates, but at the root of all of this is how the EXP function is handling blank values. I've spent some time looking into ways to suppress the blanks without using the IF function, but am yet to arrive at a solution where the Pivot Table behaves as you require.

    As it currently stands, your workaround seems to be the only way around the behaviour and performance issues in your scenario.


    Regards,

    Michael Amadi

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful :)

    Website: http://www.nimblelearn.com
    Blog: http://www.nimblelearn.com/blog
    Twitter: @nimblelearn

    Wednesday, September 24, 2014 1:03 PM
  • Hi CJourdan,

    I think that's the logic of PowerPivot. It also heppened on other calculation. e.g.

    =COUNTROWS(FILTER(FACT,FACT[Dim1]="A"&&FACT[Dim2]="K"))=0

    =ISBLANK(COUNTROWS(FILTER(FACT,FACT[Dim1]="A"&&FACT[Dim2]="K")))

    They are all True. For the calculation whose result is number, the 0 equals to the blank.

    So, please try this:

    Exp of Sum of Value:=IF([Sum of Value]>0, EXP([Sum of Value]), BLANK())

    As Kim has mentioned this behaviour is by design and even using the exponentiation operation will give the same behaviour because the blank value is implicitly converted to a number first:

    Exp of Sum of Value Alternative := 2.71828182845904523536028747135266249775724709369995 ^ [Sum of Value]

    Regards,

    Michael Amadi

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful :)

    Website: http://www.nimblelearn.com
    Blog: http://www.nimblelearn.com/blog
    Twitter: @nimblelearn

    Wednesday, September 17, 2014 7:07 AM

All replies

  • Hi CJourdan,

    I think that's the logic of PowerPivot. It also heppened on other calculation. e.g.

    =COUNTROWS(FILTER(FACT,FACT[Dim1]="A"&&FACT[Dim2]="K"))=0

    =ISBLANK(COUNTROWS(FILTER(FACT,FACT[Dim1]="A"&&FACT[Dim2]="K")))

    They are all True. For the calculation whose result is number, the 0 equals to the blank.

    So, please try this:

    Exp of Sum of Value:=IF([Sum of Value]>0, EXP([Sum of Value]), BLANK())

    • Proposed as answer by Michael Amadi Wednesday, September 17, 2014 5:34 AM
    • Unproposed as answer by CJourdan Wednesday, September 17, 2014 11:44 AM
    Wednesday, September 17, 2014 1:45 AM
  • Hi CJourdan,

    I think that's the logic of PowerPivot. It also heppened on other calculation. e.g.

    =COUNTROWS(FILTER(FACT,FACT[Dim1]="A"&&FACT[Dim2]="K"))=0

    =ISBLANK(COUNTROWS(FILTER(FACT,FACT[Dim1]="A"&&FACT[Dim2]="K")))

    They are all True. For the calculation whose result is number, the 0 equals to the blank.

    So, please try this:

    Exp of Sum of Value:=IF([Sum of Value]>0, EXP([Sum of Value]), BLANK())

    As Kim has mentioned this behaviour is by design and even using the exponentiation operation will give the same behaviour because the blank value is implicitly converted to a number first:

    Exp of Sum of Value Alternative := 2.71828182845904523536028747135266249775724709369995 ^ [Sum of Value]

    Regards,

    Michael Amadi

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful :)

    Website: http://www.nimblelearn.com
    Blog: http://www.nimblelearn.com/blog
    Twitter: @nimblelearn

    Wednesday, September 17, 2014 7:07 AM
  • Hi Michael and Kim,

    Thank you for your answers.

    However, I think I have failed to put emphasis at the real problem at hand. As I mentioned on my original post, it is simple to get the desired results by adding some coditional logic, such as:

    Exp of Sum of Value:=IF(ISBLANK([Sum of Value]), BLANK(), EXP([Sum of Value]))

    The problem is that this measure, when used on my real dataset, where FACT has 6 million rows and DIM2 has 40k rows, is intractable, regardless of the filtering that I might be doing on the outset to slice the data.

    This is, in my opion, very suboptimal behavior, since calculating the [Sum of Value] measure on my tables is pretty fast, and just applying the exponential should be trivial.

    Moreover, I don't understand why the behavior is different when I filter through the dimension tables (which I always thought was best practice) than when I use the "foreign keys" FACT[Dim1] and FACT[Dim2] to do the filtering. This has led me to a very poor workarround: I've put all the columns from the dimension tables in my fact table through the use of the RELATED function, and now the calculation speed is as expected.

    So, the center question is: why does applying the EXP() function change the way the engine iterates through my data, adding unecessary complexity, and is there a better way to circumvent this issue?

    Wednesday, September 17, 2014 11:44 AM
  • I've run into similar issues with this, and my working theory that I've not had time to explore thoroughly is that the problem lies not with DAX or the Tabular engine, but with the sub-optimal MDX generated by Excel.

    If you run a profiler trace, you will likely see a CROSS JOIN on your dimensions as one of the first steps in the query. This provides the cartesian product (all combinations) of the dimensions, even for those with no valid relationship through the fact table. The measures are then calculated for each member of this (potentially huge) table, and those with NULL/BLANK() values are dropped from the result returned to Excel.

    Implicit conversion is usually a pretty valuable thing, but in this case, the NULL/BLANK() that would keep your A-K and C-J values from appearing when you just perform the SUM() is used to evaluate the EXP(). The NULL/BLANK() is passed as an argument to EXP(), where it is implicitly converted to a 0 and evaluated to 1.

    I suspect that if you run a DAX query against your Tabular model directly you will see the expected behavior:

    EVALUATE
    ADDCOLUMNS(
        SUMMARIZE( FactTable
            , Dim1[Dim1]
            , Dim2[Dim2]
        )
        , "Sum", <your sum measure>
        , "Exp", <your exp measure>
    )
    

    In this example, you must ensure that your measures are wrapped in CALCULATE().

    If that example works, you might consider doing your reporting in SSRS where you can write DAX queries against the database (there are a few workarounds necessary for this, but it's worth it; DAX queries are blazingly fast compared to Excel pivot tables in many cases - all due to the poor MDX that Excel generates).

    Tuesday, September 23, 2014 9:01 PM
  • Hi Carlos,

    Although this may not immediately benefit you, may I suggest that you create a Microsoft Connect account and then submit this issue as a bug/suggestion.

    You can register here: https://connect.microsoft.com/.

    Once you have registered, you can report this issue under the Analysis Services category here: https://connect.microsoft.com/SQLServer/feedback/CreateFeedback.aspx.

    As Greg has explained, the performance issue is due to the sub-optimal MDX queries that Excel generates, but at the root of all of this is how the EXP function is handling blank values. I've spent some time looking into ways to suppress the blanks without using the IF function, but am yet to arrive at a solution where the Pivot Table behaves as you require.

    As it currently stands, your workaround seems to be the only way around the behaviour and performance issues in your scenario.


    Regards,

    Michael Amadi

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful :)

    Website: http://www.nimblelearn.com
    Blog: http://www.nimblelearn.com/blog
    Twitter: @nimblelearn

    Wednesday, September 24, 2014 1:03 PM