Query PP-Data with PivotTables vs CubeFunctions - Why are CubeFunctions sooo slow?
-
10 octombrie 2010 20:37
Hi Forum,
As you may know there are two ways in excel to query OLAP/PowerPivot Data: PivotTables and Cube Functions!
I don't want to repeat the advantages of Cube Functions for specific business scenarios because there's a lot of information already available on internet here or here...What's not available public is a recommendation/advisory/KB-article.. which talks about the performance limits of cube functions!
Performance limits? Yes! In my experience cube functions don't scale up if you use many of them, neither for accessing external OLAP/SSAS data nor for OLAP data in form of powerpivot embedded in a excel sheet. You can use a few of them for a small one-sheet-demo-dasboard... but you can't use them for real life scenarios where users want to use (cube) formulas in masses...Reproduction (preparation):
- Download this XL-Testfile from my skydrive (5MB) or create your own testfile with demodata from Microsoft Contoso BI Demo Dataset for Retail Industry
- Your testfile should include powerpivot data and one PivotTable (at least 200Rows*200Columns), my sample pivottable has 2519 rows (Products) * 66 columns (Regions) and is based on the Contoso Demo Dataset. I think that is good reference data as i have seen in real life many excel files with more data spread on multiple sheets in one file! ;)
If you have your sample pivottable based on powerpivot data prepared:
-click right on the pivottable>Refresh.. you will see the data refreshed in < 1 Sec = GREAT
-upload this XL-File to SharePoint,display with excel services -> this open in < 1 Sec GREAT, click "Update" Button in toolbar= also fast refreshNow to the problem:
1. Open your testfile again with excel client
2. Click/select the pivot table
3. Choose “OLAP-Tools>Convert to formulas” from PivotTable Tools Ribbon to convert the pivot to cube functions
4. Your excel sheet will now “hang” for > 5Minutes all Cell show “#Getting Data”
5. Safe this file and upload again to sharepoint --> File renderes until timeout reached "The request exceeded the amount of time allowed.It has been canceled"
6. Another small problem is that if you safe this excel-file with the size of the excel-file is nearly doubled (compared to pivot) and it takes very long time to reopen the file!Questions:
- WHY is this performance SO different then showing the same data in a pivottable? As all data is with PP "in-memory" so it should be superfast to query!?
- Where can i get a official statement like "dear users please don't use more then X Cube Functions in a excel workbook"?
- Or even better a sizing guide from SQLCAT.. which shows performance comparison (Pivot<>Formula) which i can handout to customers?Thanks,
JJ
Toate mesajele
-
11 octombrie 2010 22:58
You can take a look at this whitepaper from SQLCAT:However, I looked at your file and it seems that you have a large PivotTable with 167475 cells, composed by almost 2000 rows. You might expect 167475 queries to PowerPivot engine, but Excel is smart enough to reduce that number, by building more complex MDX queries that reduce the number of roundtrips, requiring more cells for each query. This should scale enough when you have some thousands of cells to calculate, but probably with your numbers it is not fast enough.
A PivotTable is so faster because in a single query it gets all the displayed data. This is not possible by using the "CubeFunctions".
Marco
Marco Russo- Propus ca răspuns de Mariano Teixeira Neto [MSFT] 12 octombrie 2010 06:36
-
12 octombrie 2010 07:43
>You might expect 167475 queries to PowerPivot engine, but Excel is smart enough to reduce that number
That's the point. Excel is currently NOT smart enough to analyse this cube functions effective and to understand that the result would be one single set of data and no distinct values to query separat! Would it be really so hard to optimize this batch/caching logic in next version? I think this would be no rocket sience, we worked before with a 10 Year old version of MIS Alea, where all is based on excel functions (dbget/dbset), this ALEA system handles much faster the requests for the given scenario! I even could "help" excel by first query all the data with pivot and than use this nasty getpivotdata and other lookup functions, but
To summarize: I understand that excel is not natively made for this scenario and has its strength when using pivottables instead! But hey, you know endusers? If there's a feature there they want to use it! And it's very frustating if the start to play with a feature just to find out that it doesn't scale up for their target scenario! So my intention of this post was to get a official information to this topic, maybe find outer users which have also problems with intensive usage of cube functions an and maybe to find a possible workaround
-
12 octombrie 2010 15:18
Also instead of using CubeFunction to calculate values you should use DAX function for meausres. If your intention is to do some computations and get results then DAX is the way to go.
Thanks, Deva [MSFT] My blog: http://powerpivot.spaces.live.com/ -
12 octombrie 2010 18:55
thanks devarajan!
one of the main issues customer has is to include blank rows and blank columns for formatting purposes.
For example ProductGroup>Product on rows and after each product group a blank row...
The only way i see to resolve this is to extend/manipulate the sourcedata with the needed blanks? (dimension members and data columns) -
14 octombrie 2010 01:03
Hi JJ78
I have used both PowerPivot/DAX and Cube formulas and I think there is place for each one. Just like in the traditional Analysis Services, which is not optimized for queries with millions of rows at the leaf level, Cube formulas in my opinion should be used only when building reports or Excel dashboards on aggregate data that is not meant to have more than a couple hundreds cells with cube formulas. In most executive level reports that still holds true.
We built a financial reporting Excel application using this methodology (cube formulas connected to PowerPivot) and it performs really well, its fast and highly customized (blanks, alternate calculations for each category, etc)
Javier Guillen
http://www.msbicentral.com/Blogs/JavierGuillen.aspx