locked
Help to speed up CALCULATE formula RRS feed

  • Question

  • I am building a reporting system of production profiles for a set of assets. The key data structure is as follows:

    Version  Field               Action    Profile          Date          Production     RiskCategory

    v1          Field1         Action00    00_Low      31/1/2014        5000         1

    v1           Field1         Action00    00_Low     28/2/2014         4980        1

    v1           Field1         Action00    00_Low     31/3/2014          4950       1

    v1           Field1        Action00      00_Base     31/1/2014        5300       2

    ..

    v1           Field1        Action00      00_High     31/1/2014         5600       3

    ..

    v1           Field1        Action01      01_Low     31/1/2014        8000         1

    ...

     v2           Field1         Action00    00_Low    31/1/2014          2000         1

     ...

    Ie. the data consists of production profiles. Each [Version] has many [Field]s who in turn have many [Action]s who in turn have a High, Base, Low [Profile]s who in turn have a [Production] for each [Date].

    Data size:

    Each profile has a monthly entry from 2014 to 2034. There are typically 10 Actions per field, and maybe 4 fields per version. So number of rows per version is: 12months*30years*3profilesperaction*10actions*4fields = 43200. Number of versions per data set is not decided yet, however as time goes by there will be a steady increase in nr of versions... it all depends on the speed and practicality.

    Related tables:A related table (on Version and Profile) adds [Risk Category] to the [Profile] (and other attributes not relevant here. There is also another related table on [Version] [Field] [Date] not relevant here.

    Problem Description: I use PowerPivot to tie 3 tables togetether via composite keys (calculated column = [Version]&[Profile] etc). So I dont have to repeat [RiskCategory] for each date in the profile table and makes QCing inputs far easier. The purpose of the database is to take the production profiles and slice them in different ways to satisfy different input systems.

    The most challenging is that I need to calculate the incremental contribution from the Base and High  [Profiles] in each action.

    Generate a calculated column that for each Date in each Action calculates:

    if [RiskCategory] (thisrow) = 1, then [Production])

    If [RiskCategory] (thisrow) = 2, then [Production] (where [RiskCategory]=2)  - 

                                               [Production] (where[RiskCategory]=1)

    If [RiskCategory] (thisrow) = 3, then [Production] (where [RiskCategory]=3)  - 

                                               [Production] (where[RiskCategory]=2)

    My solution below is a Calculated Column that uses Calculate Allexcept and Earlier to remember the row context. However, the calculation is a little slow: 5 sec maybe - which will get worse as I add more versions and possibly more fields with more actions. and I would like to see if I can speed it up:

    =if( Profile[RiskCategory]>1, 
    
    Calculate(SUM(Profile[Production]), Allexcept(Profile,Profile[Date],Profile[Version],Profile[Action]), Profile[RiskCategory] = Earlier(Profile[RiskCategory]))-
    Calculate(SUM(Profile[Production]), Allexcept(Profile,Profile[Date],Profile[Version],Profile[Action]), Profile[RiskCategory] = Earlier(Profile[RiskCategory])-1)
    
    ,Profile[Production])

    Any suggestions?

    Hope this is clear.

    Thanks

    Lars



    • Edited by larsito22 Friday, November 21, 2014 6:59 PM
    Thursday, November 20, 2014 10:06 AM

Answers

  • Calculated columns are only calculated at workbook refresh time. If your pivot tables are slow, then it is a measure slowing you down, not any of your calculated columns.

    What do your pivot tables look like and what measures are you using?

    Friday, November 21, 2014 2:31 PM
  • Lars,

    I will go out on a limb here, and say look at the following

    • You may be using Calculated Column, when you should be creating Measures.
      See #1 in 5 common mistakes made by self taught DAX students
    • Another may be flattening your tables, instead of keeping Data and Lookup tables separate (Star Schema). Not sure if you're doing this. If not, great! If yes, then see Flat to Star Transformation using DAX Query
    • Instead of joining your tables using Calculated Columns (Composite Keys: Version&Profile) - you are better off bringing in the Composite key in the query itself. That can have a large bearing on performance (depending on the shape of your model). Short Version: That lets Power Pivot optimize the compression of that column far better than that of a calculated column. So, let's say you're getting your data from SQL, then just use SQL query and add that composite column. Else, you can also look to using Power Query to add that column and send it to Power Pivot.
    • Mmm...we are all confused where the performance issues are. Calculated Column are computer at refresh, and after that they are lightening fast when used in Pivots.
      So if you're DAX expression in a calculated column is a beats, you would feel that when you refresh your model - NOT when you are using it in a pivot table.

      So if your performance slowdown is when using the Pivot Table, the issue is somewhere else.
      This has a few pointers Power Pivot Performance Gotchas

    If you're still stuck, perhaps provide us a little more glimpse in to your model.
    This has some pointers (for one, I like seeing Model Diagrams, they help me quickly orient myself)
    How to ask a Power Pivot Question to get a prompt, accurate and helpful response


    Regards, Avi www.powerpivotpro.com PowerPivotPro

    Wiki:How to ask a Power Pivot Question to get a prompt, accurate and helpful response

    Friday, November 21, 2014 3:35 PM

All replies

  • 5 seconds of process time is not that big of a deal. If your concern is refreshing the data every day, I would recommend setting it up on SharePoint and enabling automatic refresh, so the processing is completed overnight and you need not be concerned with it.
    Thursday, November 20, 2014 4:43 PM
  • Hi Greg,

    Thanks, problem was that when working with the Pivot tables its slow and sluggish. I wondered if what I had done (ie the Calculate) statement was not "best practice" and if there are ways of improving the speed. I have only been fiddling with PowerPivot for a week or so.

    Thanks

    Lars

    Friday, November 21, 2014 11:12 AM
  • Calculated columns are only calculated at workbook refresh time. If your pivot tables are slow, then it is a measure slowing you down, not any of your calculated columns.

    What do your pivot tables look like and what measures are you using?

    Friday, November 21, 2014 2:31 PM
  • Lars,

    I will go out on a limb here, and say look at the following

    • You may be using Calculated Column, when you should be creating Measures.
      See #1 in 5 common mistakes made by self taught DAX students
    • Another may be flattening your tables, instead of keeping Data and Lookup tables separate (Star Schema). Not sure if you're doing this. If not, great! If yes, then see Flat to Star Transformation using DAX Query
    • Instead of joining your tables using Calculated Columns (Composite Keys: Version&Profile) - you are better off bringing in the Composite key in the query itself. That can have a large bearing on performance (depending on the shape of your model). Short Version: That lets Power Pivot optimize the compression of that column far better than that of a calculated column. So, let's say you're getting your data from SQL, then just use SQL query and add that composite column. Else, you can also look to using Power Query to add that column and send it to Power Pivot.
    • Mmm...we are all confused where the performance issues are. Calculated Column are computer at refresh, and after that they are lightening fast when used in Pivots.
      So if you're DAX expression in a calculated column is a beats, you would feel that when you refresh your model - NOT when you are using it in a pivot table.

      So if your performance slowdown is when using the Pivot Table, the issue is somewhere else.
      This has a few pointers Power Pivot Performance Gotchas

    If you're still stuck, perhaps provide us a little more glimpse in to your model.
    This has some pointers (for one, I like seeing Model Diagrams, they help me quickly orient myself)
    How to ask a Power Pivot Question to get a prompt, accurate and helpful response


    Regards, Avi www.powerpivotpro.com PowerPivotPro

    Wiki:How to ask a Power Pivot Question to get a prompt, accurate and helpful response

    Friday, November 21, 2014 3:35 PM
  • Thanks Avi and Greg,

    I am now guessing the sluggishness is coming from the size of the pivot table that I am using:

    Columns: Year, Month (ie 12X30 columns), Rows: Version, Field, Action (1x4x10), Values: 4 different measures based on production and delta prodcution. In all it makes for almost 60000 cells.

    Data source queries and relationships

    The data sits in tables in excel and is organized in a "star" schema to allow the users to input and maintain the data easier. ie without having to copy all the duplicate columns in a flat file. 

    I use composite keys and avoid autokeys etc to avoid data entry being tedious and unintelligible. Remember these tables are used by the users to input and store production profiles.

    Dax Queries I know nothing about, would they work for my set up? ie without an external sql database etc to query.

    Use measure in stead of calculated column

    I would like to see if I could make this a measure instead of a column. However, I am struggling to see how as I need multiple references to the current row context (hence why the ability of using Earlier() in the Calculated column was handy. Maybe a recursive SUMX would do it?

    To summarize: The calculation requires the computation between the current [production] and the [production] in the other [RiskCategoriy]s in the same context ie: same [DATE], [ACTION], [FIELD], [VERSION] 

    ie: if RiskCateory is 1, just return current [production], if it is 2 or 3 then return Production (2) - Production (1) or Production(3)-Production(2) respectively (belonging to same version/field/action 

    Example under of the calculated column*

    Version  Field    Action    Profile          Date       Production RiskCategory   Calculation*

    v1         Field1  Action00    00_Low   31/1/2014      5000         1                   5000

    ..

    v1         Field1   Action00    00_Base  31/1/2014      5400        2                    400

    ..

    v1        Field1    Action00    00_High   31/1/2014      6800       3                    1400

    Hope this clarifies a little and I will endeavour to paste my workbook here so you can take a look for yourself.  

    Thanks again!

    Lars






    • Edited by larsito22 Friday, November 21, 2014 7:11 PM
    Friday, November 21, 2014 6:59 PM
  • Larsito, it's hard to tell what your question is here now. Can you restate it?

    Thanks!


    Ed Price, Azure & Power BI Customer Program Manager (Blog, Small Basic, Wiki Ninjas, Wiki)

    Answer an interesting question? Create a wiki article about it!

    Saturday, December 13, 2014 12:01 AM