locked
Aggregation functions and Semi additivity RRS feed

  • Question

  • Hi

    I have some troubles these days with the concept of semi additivity related to the aggregation fuctions. i found in many web sites that it means that the mesure can be aggregated with all dimensions expect the time ones (http://www.packtpub.com/article/measures-and-measure-groups-microsoft-analysis-services-part1) when the official docs don't mention this.

    Is it true that a semi additive function can't only be associated with a time dimension?

    if it can be some thing else than a time dimension, is there a way to specify the dimensions that can not be used for the aggregation?

    and what will be the result if we aggregat with a no supported dimension? NULL or a result that is logicaly wrong ?

     

    Thanks

    Wednesday, July 21, 2010 9:11 AM

Answers

  • Ah, because in my example there could be any number of other dimensions. Let's say there was also a Product dimension whose default member is [All]. So, the values in my example are the values summed across all products but for the given time period. If I was to place the Product dimension on columns say, I would get the Sum value, LastNonEmpty and LastChild values for each product.

    So in essence, semi-additive measures are summed up all dimension except time.

    Yes?

     

    • Marked as answer by BadBoy06000 Friday, July 23, 2010 1:44 PM
    Thursday, July 22, 2010 5:39 PM
  • No, MIN and MAX are semi-additive. MIN and MAX will show the minimum (or maximum) value for a given period across all other dimensions. It cannot SUM and then take the MIN or MAX 'cos that doesn't make sense.

    So, using the following base for AS:

     

    -- DROP TABLE fact_simple_test;
    
    
    
    CREATE
    
     TABLE
    
     fact_simple_test
    (
     product_code varchar
    
    (10),
     product_category_code varchar
    
    (10),
     sale_date date
    
    ,
     amount  numeric
    
    (24,12)
    );
    
    INSERT
    
     INTO
    
     fact_simple_test (product_code, product_category_code, sale_date, amount)
    VALUES
    
    ('P1'
    
    , 'Cat1'
    
    , '20100601'
    
    , 10);
    INSERT
    
     INTO
    
     fact_simple_test (product_code, product_category_code, sale_date, amount)
    VALUES
    
    ('P1'
    
    , 'Cat1'
    
    , '20100602'
    
    , 30);
    INSERT
    
     INTO
    
     fact_simple_test (product_code, product_category_code, sale_date, amount)
    VALUES
    
    ('P3'
    
    , 'Cat2'
    
    , '20100601'
    
    , 25);
    INSERT
    
     INTO
    
     fact_simple_test (product_code, product_category_code, sale_date, amount)
    VALUES
    
    ('P2'
    
    , 'Cat1'
    
    , '20100601'
    
    , 13);
    INSERT
    
     INTO
    
     fact_simple_test (product_code, product_category_code, sale_date, amount)
    VALUES
    
    ('P3'
    
    , 'Cat2'
    
    , '20100601'
    
    , 5);
    
    -- DROP TABLE sale_calendar
    
    
    
    CREATE
    
     TABLE
    
     sale_calendar
    (
     sale_date date
    
    ,
     sale_month varchar
    
    (20),
     sale_year varchar
    
    (20),
    );
    
    INSERT
    
     INTO
    
     sale_calendar (sale_date, sale_month, sale_year)
    VALUES
    
    ('20100601'
    
    , 'June 2010'
    
    , '2010'
    
    );
    INSERT
    
     INTO
    
     sale_calendar (sale_date, sale_month, sale_year)
    VALUES
    
    ('20100602'
    
    , 'June 2010'
    
    , '2010'
    
    );
    INSERT
    
     INTO
    
     sale_calendar (sale_date, sale_month, sale_year)
    VALUES
    
    ('20100603'
    
    , 'June 2010'
    
    , '2010'
    
    );
    
    

     

     

    And using MIN as the aggregation function and configuring Sales Calendar as time, we can see that the results for June across all products will show 5 but for June for category 1 will show 10.

                All    All    All    All    Cat1    Cat1    Cat1    Cat2    Cat2
                All    P1    P2    P3    All    P1    P2    All    P3
    All    All    All    5    10    13    5    10    10    13    5    5
    All    All    2010-06-01    5    10    13    5    10    10    13    5    5
    All    All    2010-06-02    30    30    (null)    (null)    30    30    (null)    (null)    (null)
    All    June 2010    All    5     10    13    5    10     10    13    5    5
    All    June 2010    2010-06-01    5    10    13    5    10    10    13    5    5
    All    June 2010    2010-06-02    30    30    (null)    (null)    30    30    (null)    (null)    (null)
    2010    All    All    5    10    13    5    10    10    13    5    5
    2010    All    2010-06-01    5    10    13    5    10    10    13    5    5
    2010    All    2010-06-02    30    30    (null)    (null)    30    30    (null)    (null)    (null)
    2010    June 2010    All    5    10    13    5    10    10    13    5    5
    2010    June 2010    2010-06-01    5    10    13    5    10    10    13    5    5
    2010    June 2010    2010-06-02    30    30    (null)    (null)    30    30    (null)    (null)    (null)

    Apologies for the formatting! Haven't got time to fix it...

    Please mark answers as correct if you feel I have answered your question.

    • Marked as answer by BadBoy06000 Friday, July 23, 2010 1:44 PM
    Friday, July 23, 2010 1:28 PM

All replies

  • Yes, all semi-additive aggregation functions relate to time.

    From BOL (2008) :

    Semiadditive

    A semiadditive measure can be aggregated along some, but not all, dimensions that are included in the measure group that contains the measure. For example, a measure that represents the quantity available for inventory can be aggregated along a geography dimension to produce a total quantity available for all warehouses, but the measure cannot be aggregated along a time dimension because the measure represents a periodic snapshot of quantities available. Aggregating such a measure along a time dimension would produce incorrect results.

    There is a slight extension to this related to the "ByAccount" aggregation function which alters the behaviour of the aggregation function based on an account. The way the behaviour is changed still relates to time, just that the "ByAccount" function derives its behaviour from a dimension defined as an Account dimension. For more information on this see the section in BOL (2008) entitled: Account (Analysis Services - Multidimensional Data)

    The following are the Semi-Additive Aggregation Functions :

    Min
     Retrieves the lowest value for all child members.
     
    Max
     Retrieves the highest value for all child members.
     
    ByAccount
     Calculates the aggregation according to the aggregation function assigned to the account type for a member in an account dimension. If no account type dimension exists in the measure group, treated as the None aggregation function.
     For more information about account dimensions, see Account (Analysis Services - Multidimensional Data).
     
    AverageOfChildren
     Calculates the average of values for all non-empty child members.
     
    FirstChild
     Retrieves the value of the first child member.
     
    LastChild
     Retrieves the value of the last child member.
     
    FirstNonEmpty
     Retrieves the value of the first non-empty child member.
     
    LastNonEmpty
     Retrieves the value of the last non-empty child member.

     

    Essentially, measures defined with a semi-additive aggregation function are aggregated based on a dimension set as being of type "time", be it a dimension of "ship dates" or "ordered dates" or "report dates" etc. Clearly you can created calculations using base measures to do anything you want. For example, you could create a calculated member to aggregate over time but use the LastChild function on some other dimension to get the value for all time and some last member in another dimension.

    Wednesday, July 21, 2010 1:31 PM
  • thanks for your answer

    actualy, i ask this question because when i read BOL, it seems like the time dimension was juste an exemple and not the UNIQUE case.

    Can u explain me what is the specificity of the semi additvity behavior? when we take for exemple the quantity available mesure, i can get the minimum or maximum for all periods, by years or months, where is the difference with additive ones?

    Thursday, July 22, 2010 10:12 AM
  • Maybe the little example below demonstrate the behaivior of semi-additive aggregates; here I used distinct count.

    Depending on the level of aggregation; here the first on product and the second without a level, you get different results in the sum of distinct count. The sum of the amount instead stays always the same.

    Facttable
    Product Customer Amount
    Bike Peter 12
    Bike Steve 14
    Car Peter 223
    Car Martin 112
    Group by Product Distinct # Customer Sum Amount
    Bike 2 26
    Car 2 335
    Total 4 361
    Group by none Distinct # Customer Sum Amount
    3 361


    Olaf Helper ----------- * cogito ergo sum * errare humanum est * quote erat demonstrandum * Wenn ich denke, ist das ein Fehler und das beweise ich täglich http://olafhelper.over-blog.de
    Thursday, July 22, 2010 10:48 AM
  • thanks olaf, your exemple is realy helpful to see the behavior of an additive and a non additive

    But what i can't understand, is the semi additive, like i said before, is the time dimension the UNIQUE dimension that i can't use with semi additive functions and, if it's the case, why for exemple when we take the quantity available mesure(the exemple of philip), i can get the minimum or maximum for all periods, by years or months, isn't this an agregation over time?

    Thursday, July 22, 2010 11:40 AM
  • Okay, the differences are as follows...

    Let's assume a time dimension with 3 days for Jan and 1 day for Feb only and 3 measures whose aggregation functions are set such that the first is additive (sum), the second is semi-additive (LastNonEmpty) and the third is also semi-additive (LastChild).  So...

                                  Sum      LastNonEmpty    LastChild
    Year 2010                40              10                10
        - Jan                   30              10                Null
             - 01 Jan          20              20                20
             - 02 Jan          10              10                10
             - 03 Jan          -                 -                   -
        - Feb                    10              10               10
             - 01 Feb          10              10               10

    We can see that for Jan, the Sum is 30 (20 + 10 + Null), LastNonEmpty is 10 (Jan 02 is the last non-empty) and LastChild is Null (Jan 03 is the last child of Jan and its value is Null).

    At the year level, the Sum is 40 (Jan + Feb = 30 + 10), LastNonEmpty is 10 (Feb is the last non-empty child of 2010) and LastChild is 10 (Feb is the last child of 2010).
    Thursday, July 22, 2010 1:16 PM
  • It's exactly the point that i want to reach

    when we take the two additive functions (sum, count) the agregation result of a level is the sum of the level below (and it works for sum and count). for the semi additive ones the agregation must be re-done from the details and, i don't now now if you agree, it is the same thing when we use a time dimension or any other type of dimension.

    So why we relate always semi additivity to time dimension when we got exactly the same behavior for the time dimension and for any type of dimensions ?

    Thursday, July 22, 2010 1:53 PM
  • Ah, because in my example there could be any number of other dimensions. Let's say there was also a Product dimension whose default member is [All]. So, the values in my example are the values summed across all products but for the given time period. If I was to place the Product dimension on columns say, I would get the Sum value, LastNonEmpty and LastChild values for each product.

    So in essence, semi-additive measures are summed up all dimension except time.

    Yes?

     

    • Marked as answer by BadBoy06000 Friday, July 23, 2010 1:44 PM
    Thursday, July 22, 2010 5:39 PM
  • YES !! THANK U VERY MUCH

    i tested what u said and it s 100% true for average, last, first, first no empty, last no empty: the selected function is used only for the time dimension and for all others the sum function is used

    Just a last thing, in my test the min and max are used even for a time dimension (behave as an additive function) just like said in many docs but unlike what said in BOL where they classify min and max with semi additive functions

    Is it just an error in BOL, Can you test it please?

    Thursday, July 22, 2010 8:29 PM
  • No, MIN and MAX are semi-additive. MIN and MAX will show the minimum (or maximum) value for a given period across all other dimensions. It cannot SUM and then take the MIN or MAX 'cos that doesn't make sense.

    So, using the following base for AS:

     

    -- DROP TABLE fact_simple_test;
    
    
    
    CREATE
    
     TABLE
    
     fact_simple_test
    (
     product_code varchar
    
    (10),
     product_category_code varchar
    
    (10),
     sale_date date
    
    ,
     amount  numeric
    
    (24,12)
    );
    
    INSERT
    
     INTO
    
     fact_simple_test (product_code, product_category_code, sale_date, amount)
    VALUES
    
    ('P1'
    
    , 'Cat1'
    
    , '20100601'
    
    , 10);
    INSERT
    
     INTO
    
     fact_simple_test (product_code, product_category_code, sale_date, amount)
    VALUES
    
    ('P1'
    
    , 'Cat1'
    
    , '20100602'
    
    , 30);
    INSERT
    
     INTO
    
     fact_simple_test (product_code, product_category_code, sale_date, amount)
    VALUES
    
    ('P3'
    
    , 'Cat2'
    
    , '20100601'
    
    , 25);
    INSERT
    
     INTO
    
     fact_simple_test (product_code, product_category_code, sale_date, amount)
    VALUES
    
    ('P2'
    
    , 'Cat1'
    
    , '20100601'
    
    , 13);
    INSERT
    
     INTO
    
     fact_simple_test (product_code, product_category_code, sale_date, amount)
    VALUES
    
    ('P3'
    
    , 'Cat2'
    
    , '20100601'
    
    , 5);
    
    -- DROP TABLE sale_calendar
    
    
    
    CREATE
    
     TABLE
    
     sale_calendar
    (
     sale_date date
    
    ,
     sale_month varchar
    
    (20),
     sale_year varchar
    
    (20),
    );
    
    INSERT
    
     INTO
    
     sale_calendar (sale_date, sale_month, sale_year)
    VALUES
    
    ('20100601'
    
    , 'June 2010'
    
    , '2010'
    
    );
    INSERT
    
     INTO
    
     sale_calendar (sale_date, sale_month, sale_year)
    VALUES
    
    ('20100602'
    
    , 'June 2010'
    
    , '2010'
    
    );
    INSERT
    
     INTO
    
     sale_calendar (sale_date, sale_month, sale_year)
    VALUES
    
    ('20100603'
    
    , 'June 2010'
    
    , '2010'
    
    );
    
    

     

     

    And using MIN as the aggregation function and configuring Sales Calendar as time, we can see that the results for June across all products will show 5 but for June for category 1 will show 10.

                All    All    All    All    Cat1    Cat1    Cat1    Cat2    Cat2
                All    P1    P2    P3    All    P1    P2    All    P3
    All    All    All    5    10    13    5    10    10    13    5    5
    All    All    2010-06-01    5    10    13    5    10    10    13    5    5
    All    All    2010-06-02    30    30    (null)    (null)    30    30    (null)    (null)    (null)
    All    June 2010    All    5     10    13    5    10     10    13    5    5
    All    June 2010    2010-06-01    5    10    13    5    10    10    13    5    5
    All    June 2010    2010-06-02    30    30    (null)    (null)    30    30    (null)    (null)    (null)
    2010    All    All    5    10    13    5    10    10    13    5    5
    2010    All    2010-06-01    5    10    13    5    10    10    13    5    5
    2010    All    2010-06-02    30    30    (null)    (null)    30    30    (null)    (null)    (null)
    2010    June 2010    All    5    10    13    5    10    10    13    5    5
    2010    June 2010    2010-06-01    5    10    13    5    10    10    13    5    5
    2010    June 2010    2010-06-02    30    30    (null)    (null)    30    30    (null)    (null)    (null)

    Apologies for the formatting! Haven't got time to fix it...

    Please mark answers as correct if you feel I have answered your question.

    • Marked as answer by BadBoy06000 Friday, July 23, 2010 1:44 PM
    Friday, July 23, 2010 1:28 PM
  •  

    Thank you very much for your explanations, they were realy clear

    A last(I hope) thing, if i summarise:

    * all semi additive use the selected function for the aggregation with time dimensions and sum with all others.

    * the Min and Max are particular in the sense that they are used both for the time dimensions and others dimension.

    It's that?

    Friday, July 23, 2010 1:53 PM
  • Yup, you've got it!

    The Min and Max is possibly made more clear by looking at the second to last line in the "unformatted" output in my previous post. You can see there that for a single day (i.e. 20100601) the Min value for all products was 5, thus it is the minumum value across all products on that day.

    • Proposed as answer by MASNSN Tuesday, September 14, 2010 11:27 PM
    Friday, July 23, 2010 3:24 PM
  •  

    At last, it took me a while to get a correct and a clear response

    thank u for the time that you spent to explain me

    Friday, July 23, 2010 4:26 PM