Unanswered Intolerably Slow Pivot Table

  • Wednesday, July 25, 2012 10:55 AM
     
      Has Code

    Hi

    I am using SQL Server 2008 R2 PowerPivot for Excel 2010 and am new to PowerPivot.

    I have a PowerPivot workbook with two facts tables

    AssignedWork: ProjectNo, EmpNo, Year, Month, FTEs, MonthKey - 64 rows
    ActualWork: ProjectNo, EmpNo, Year, Month, FTEs, MonthKey - 6884 rows

    and three dimension tables:

    Projects: ProjectNo, ProjectName - 815 rows
    Employees: EmpNo, Name - 766 rows
    Months: Year, Quarter, Month, MonthKey - 480 rows

    The facts tables each have relationsships with the three dimension tables via the respective keys. 

    I have created a pivot report with row labels (Projects[ProjectNo], Projects[ProjectName], Employees[Empno], Employees[Name]), column labels (Month[Year], Month[Quarter], Month[Month]), values (AssignedWork[FTEs], ActualWork[FTEs])

    The report works as expected, except for the time it takes to refresh: 3 to 5 minutes on my workbook computer. I assume this is mostly due to the cross join between Project and Employees that is done.

    Is there a way (using DAX?) to improve the performance? - Any help would be greatly appreciated!

    Georg

All Replies

  • Wednesday, July 25, 2012 11:58 PM
     
     
    Performance boost comes with 64-bit computer and software. BTW, I once remember a single Excel formula I made that took over 3 minutes to calculate on a 486 computer. Everything is relative to computing power and memory.
    • Edited by David Hager Wednesday, July 25, 2012 11:59 PM
    •  
  • Thursday, July 26, 2012 11:36 AM
     
     

    David,

    Thanks for your observation - it does not help me very much as the company will not go out and provide all of my users with brand new 64-bit machines...

    BTW: in the meantime I have noticed that the report (the pivot table) refreshes *much* faster when I drop the EmpNo from the row labels - 10 seconds against the 3+ minutes with the EmpNo.

    Now I wonder if someone can explain that difference?

    Georg

  • Tuesday, July 31, 2012 9:21 PM
     
     
    Can you share a sample workbook that shows the problem?

    -- This posting is provided "AS IS" with no warranties, and confers no rights

  • Wednesday, August 01, 2012 2:53 PM
     
     

    Hi Chu Xu

    I can - how do I go about sending you the workbook?

    Georg

  • Wednesday, August 01, 2012 3:22 PM
     
     

    Hi Georg

    Just by the numbers on your tables, the actual PowerPivot data model is small and should perform really fast.  From what you said it appears you dont have any custom DAX expressions on it.   So this is just a display/pivot table issue, apparently.

    Pivot tables where designed to work with aggregate data.  As such, the higher number of employees (or any other item) on rows, and the slower it will become.  However, 700 or 800 should still be relatively fast with one measure.   How many items are you placing on your columns and rows and what is the row count?




    Javier Guillen
    http://javierguillen.wordpress.com/

  • Thursday, August 02, 2012 11:03 AM
     
     

    Hi Javier

    With the Pivot Table Option 'Display rows containing no data' checked, the number of rows in the Pivot table is ca. 625'000 (this is equal to number of Projects X number of Employees). When the option is unchecked, the Pivot table contains roughly 6'000 rows.

    The label and value columns are as described in my original post.

    Again: why would adding the employee number (from the same PowerPivot table as the employee name) to the label data make such a difference for the evaluation time?

    Georg

  • Thursday, August 02, 2012 2:03 PM
     
     

    I'm trying to follow what you say.    You mean, without the employee number, the pivot table performs well?

    From just the numbers you mention, *any* pivot table with 625'000 rows will definitely be slow.    6'000 rows would not be super speedy either.




    Javier Guillen
    http://javierguillen.wordpress.com/

  • Thursday, August 02, 2012 3:20 PM
     
     

    yes - *with* employee number the time is ~5 minutes, *without* employee number the time is ~10 seconds

    Georg

  • Thursday, August 02, 2012 4:48 PM
     
     

    I would argue this is not an issue related to PowerPivot at all.

    Any pivot table displaying 625'000 rows will be slow.   In theory, pivot tables where designed to display aggregate data, not detailed that (their nature is to sum/min/max/avg, etc).   A pivot table is perhaps not the best vehicle for dumping your data into Excel. 

    What is the business case?  If you need that much level of detail, could you query your db directly and skip PowerPivot and Pivot tables?




    Javier Guillen
    http://javierguillen.wordpress.com/

  • Friday, August 03, 2012 12:59 PM
     
     

    Hi Javier

    1. I never said it was an issue with PowerPivot - I was just asking for advice

    2. the business case is that we have the information in the tables summarized in my original post and want to enable users to analyze and present (charts, pivot tables) that information in whatever way they choose - without someone in each case having to find and program the optimal way to do that.

    3. concerning the 625'000 rows - that is exactly part of my original question: how can I tell PowerPivot *not* to do a cross join of projects over employees - the number of rows (combinations of projects and employees) that actually have associated value data is only one percent of that number. If there is no way to do that, telling me so would be an answer.

    Regards - Georg

  • Friday, August 03, 2012 1:13 PM
     
     

    Georg

    ok so here are a couple of pointers:

    When a pivot table queries PowerPivot (they are two separate entities), it does so via MDX.   When placing multiple items on row labels, Excel generally decides to automatically issue an MDX crossjoin statement.  But keep in mind this happens *after the fact* (after the PowerPivot model has been already defined).  PowerPivot has no control over that MDX statement.

     So I keep pointing out the fact that the pivot table is only a 'client' of the PowerPivot data model.  Other clients exist (cube functions, for example).   So if what you want to do is affect the actual display of the model, then your actions may be targeting the pivot table itself and not PowerPivot. 

    You have said that you use the option 'display rows containing no data'.  With this you in effect get a 'cartesian product' of all possible combinations.  I am still confused as to why you need that, given you only want the 1 percent of data that has valid combinations (which is the standard behavior of the pivot table, meaning, the 6 thousand rows)




    Javier Guillen
    http://javierguillen.wordpress.com/

  • Monday, August 06, 2012 10:44 AM
     
     

    Hi Javier

    just for clarification:

    I do not check the 'display rows containing no data' option - I only mentioned this in order to show that the pivot table actually processes the cartesian product of the projects and employees.

    the open questions are still

    - what can I do to avoid the cartesian product being made

    - why would adding a second column from the employee table make such a difference in processing time.

    Thanks for your effort in trying to understand my problem

    Regards - Georg

  • Monday, August 06, 2012 11:01 AM
     
     

    Georg

    Can you email me the workbook so I can take a look at it?  javiguillen@hotmail.com




    Javier Guillen
    http://javierguillen.wordpress.com/