none
How to avoid Leaf-Level calculations RRS feed

  • Question

  • From Spofford's book: "...the developer can create measures instead of using an MDX script to perform leaf-level calculations. Then the leaf-level calculation is precalculated during cube processing time via an SQL Query, and the expensive on-the-fly calculation is avoided. For the Amount = Price * Volume type of calculation, a simple calculated column in DSV ... "

    Now how do you achieve similar performance gain when you have a division instead of a multiplication ? For instance average price, calculated as TotalPrice / Quantity.

    According to this book, using a MDX Script (under the Calculations tab of Cube Design in BIDS) will imply a heavy performance penalty.

    Are there any alternatives ?

     

    Wednesday, November 1, 2006 4:16 PM

Answers

  • The operand has nothing to do with it. My example could just as easily have included a multiplication instead of a division. The difference is where the calculation takes place in the cube space. When you scope on the leaves of a cube/measure group for a regular measure, the calculation is carried out at precisely the leaf-level and then aggregated.

    Example:

    SCOPE(LEAVES(MyMeasureGroup), [Measures].[MyRegularMeasure]);

    This = [Measures].[MyRegularMeasure]/2;

    END SCOPE;

    In this case the measure MyRegularMeasure would be divided by 2 for all the leaf-level cells in the measure group and then aggregated - independent of which query is executed. THAT's what's expensive! When you create a calculated member the calculation only takes place for the cells relevant for the submitted query. In my previous example it would take place for 3 cells.

    Thursday, November 2, 2006 6:48 AM

All replies

  • Well, if you want the result of your division summed up (assuming you've created a new measure which aggregates using Sum) then you can do exactly the same thing. All the technique you mention is doing is pushing the calculation of Price * Volume or TotalPrice/Quantity back to processing time, as opposed to calculating it and then aggregating it at runtime.

    However, if for some reason you can't perform the calculation in the dsv, in a view or during your ETL, then using an MDX Script to perform the calculation might not necessarily impose a massive performance penalty. There will be some impact on query performance but in my experience doing this type of calculation in the script performs fairly well, although of course performance also depends on your cube design, the query you're running, the hardware that AS is running on and a hundred other factors. You might also want to check out measure expressions, which also allow you to do simple multiplications and divisions before measures are aggregated in some circumstances.

    HTH,

    Chris

    Wednesday, November 1, 2006 5:12 PM
    Moderator
  • Obviously, I don't want to have the result of the division summed up. Assuming the AveragePrice is 10 for each leaf, and we have 20 leaves. Aggregating the result of the division to the parent would result in a wrong average price of 200 instead of the correct average of 10.

    That's the reason why I can't perform the calculation in the DSV.

    The book claims that leaf-level aggregations have a heavy negative performance impact:

    Quote from this book: "If there is any calculation on any region of your cube, the sparsity heuristics from the fact table are no longer valid for the server to optimize the the aggregations. Calculations can set the value to any cell anywhere in the cube space. A simple assignment like "this=1;" will set the whole cube space to 1. etc..."

    Also Mosha goes to great lengths on his blog to explain how to avoid leaf-level aggregations, it's got to be worth it.

    Unfortunately this seems to be workable for multiplications only, not for divisions like in my example. So I was looking for directions on how to optimize aggregations on divisions. There are plenty of cases where this is needed: Average price, Margin percent, sales per square meter, sales per hour, etc, etc ...

     

    Wednesday, November 1, 2006 5:41 PM
  • Well... To me it does not sound like you need to perform leaf-level calculations. The average price could be obtained anywhere in the cube simply by dividing the two measures. Creating a calculated member like this should be adequate:

    CREATE MEMBER CURRENTCUBE.[Measures].[Average price] AS

    [Measures].[TotalPrice] / [Measures].[Quantity];

    This can be optimized and error handling could be added to make it:

    CREATE MEMBER CURRENTCUBE.[Measures].[Average price] AS

    IIF([Measures].[Quantity] <> 0, [Measures].[TotalPrice] / [Measures].[Quantity], NULL),

    NON_EMPTY_BEHAVIOR = [Measures].[Quantity];

    Wednesday, November 1, 2006 6:49 PM
  • I agree that the average price can be obtained anywhere in the cube. I fail to see how the suggestion is different from using an MDX script (the calculations tab in cube design) though.

    Why is it that George Spofford strongly recommends creating a calculated column in the DSV when the calculation is a multiplication ? His example is Amount, being calculated as Price * Volume.

    Amount can also be obtained anywhere in the cube, just like AveragePrice.

    I don't see why calculated members are supposed to be good enough for divisions (AveragePrice), and supposed to be bad for multiplications (Spofford's Amount)

    Am I missing something very basic here ?

    Wednesday, November 1, 2006 7:42 PM
  • Yes, I do think you are missing the point.

    The solution I suggest does not make computations at the leaf-level (well, it does if the query requests cells at the leaf-level), and the suggested calculated member will indeed be part of the MDX script. The discussion in "MDX Solutions" you are referring to IS around computations made at the leaf-level and then aggregated to higher levels in the cube. This can be an expensive process and should be avoided whenever possible - simply because the number of cells to compute can be very large. The number of cells to calculate at "higher" levels in the cube are usually much smaller. For instance, the following query will only "calculate" 3 cells, one for each of the three products on axis 1:

    SELECT {[Measures].[Average Price]} ON 0,

    {[Product].[Product].[Product One], [Product].[Product].[Product Two], [Product].[Product].[Product Three]} ON 1

    FROM MyCube

    Wednesday, November 1, 2006 8:04 PM
  • Why is it that the solution you suggest doesn't make computations at the leaf-level, and the same computation (the only difference being the operand) does make computations at the leaf level, according to G. Spofford ?

    I still fail to grasp why people think that a multiplication would aggregate from the leaf level (and thus needs to be pre-calculated in the DSV) and at the same time a division wouldn't aggregate from the leaf level and can be kept as a calculated measure.

     

    Wednesday, November 1, 2006 10:27 PM
  • The operand has nothing to do with it. My example could just as easily have included a multiplication instead of a division. The difference is where the calculation takes place in the cube space. When you scope on the leaves of a cube/measure group for a regular measure, the calculation is carried out at precisely the leaf-level and then aggregated.

    Example:

    SCOPE(LEAVES(MyMeasureGroup), [Measures].[MyRegularMeasure]);

    This = [Measures].[MyRegularMeasure]/2;

    END SCOPE;

    In this case the measure MyRegularMeasure would be divided by 2 for all the leaf-level cells in the measure group and then aggregated - independent of which query is executed. THAT's what's expensive! When you create a calculated member the calculation only takes place for the cells relevant for the submitted query. In my previous example it would take place for 3 cells.

    Thursday, November 2, 2006 6:48 AM
  • I see. You are relying on the fact that the operands are regular measures.

    When the operands are calculated measures, the calculation engine would have to aggregate them from the leaf level, right ? As Spofford says, an MDX script is evaluated each time a user query touches a calculation slice. In our example Amount or AveragePrice scope the entire pyramid from leaf to top.

    So for instance, a calculated measure GrowthInMarginPercent would aggregate from leaf level since MarginPercent is a calculated measure. Correct ?

    Otherwise, what is the point of replacing a calculated measure by a DSV computed column as Spofford advocates ?

     

    Thursday, November 2, 2006 7:34 AM