MDX Query Performance (Subselect, Subcube)

    Question

  • Below is the MDX:

     

    SELECT [Measures].[Count] ON COLUMNS,
           { [Bucket].[BucketID].CHILDREN } ON ROWS
    FROM
    (
        SELECT NONEMPTY([App Number].[App Number].CHILDREN, ([Measures].[Count])) ON 0
        FROM [Historical]
        WHERE ( {[Date].[DateID].[20080116] : [Date].[DateID].[20080416]},
                [Bucket].[BucketID].&[1], [Company].[Company].[Test] )
    )

     

    The App Number dimension has 14,785,739 records and will grow.

     

    This query runs very slowly and creates high IO on a 16-processor box with 32 GB of RAM. Any help in tuning the query is appreciated.

     

     

     

    EventClass     EventSubClass        Duration  CPUTime
    Query Begin    0 - MDXQuery
    Query Subcube  2 - Non-cache data   469       1953
    Query Subcube  2 - Non-cache data   27922     429813
    Query End      0 - MDXQuery         33375     429813

     

    Wednesday, June 18, 2008 3:08 PM

All replies

  • Hi,

     

    It looks like your subcube query takes a very long time to run, even though you don't need to return all the App Number members. It would help if you could avoid the [App Number].[App Number].CHILDREN expression, since it is a large set to evaluate. What's the relationship between [Bucket] and [App Number]?
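
    For example, the inner subselect could use the three-argument form of EXISTS, which checks existence directly against the measure group's fact data rather than evaluating NONEMPTY over the full CHILDREN set (a sketch only, not from the thread; the measure group name "Historical" is an assumption):

        SELECT EXISTS(
                 [App Number].[App Number].CHILDREN,
                 {[Date].[DateID].[20080116] : [Date].[DateID].[20080416]} *
                 {[Bucket].[BucketID].&[1]} *
                 {[Company].[Company].[Test]},
                 "Historical"   -- hypothetical measure group name
               ) ON 0
        FROM [Historical]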

     

     

    Wednesday, June 18, 2008 3:35 PM
  • So, in this scenario the buckets represent a status that applications move through before being completed. So here's the business question:

     

    Of the 20-odd statuses an application moves through, which applications moved into Status D between January 1 and March 1, 2008; and of those, which buckets did they move from and which buckets have they moved to, regardless of date?

     

    So, the fact table contains duplicate application numbers that all have a date that correlates to when that application reached a given status. One application might reach all 20 statuses, so it would have 20 rows in the fact table. The attempt here was to not have 20 date dimensions, where each fact record would need to contain all 20 possible dates for the statuses we want to track.

     

    The solution we're trying here is: give me the application numbers of the applications that reached Status A within the date range, then use only those keys as the basis for the subcube, so I can then see which other statuses those applications have existed in.

     

    Make sense? It works, it's just extremely intensive when we are using a large chunk of the fact dimension to filter on.
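
    To make the data shape concrete, a single application's bucket history can be read directly (the key &[12345] is a made-up placeholder): this returns a row for every bucket that application has a fact record in, across all dates:

        SELECT [Measures].[Count] ON COLUMNS,
               { [Bucket].[BucketID].CHILDREN } ON ROWS
        FROM [Historical]
        WHERE ( [App Number].[App Number].&[12345] )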

    Wednesday, June 18, 2008 5:18 PM
  • Sounds like a tough one. I can see the performance hit there. I need to think about this. In case you come up with a solution, let us know.

     

     

    Wednesday, June 18, 2008 6:29 PM
  • I would try designing the following agg using Agg Manager. Actually, just copy the subcube vector (the 1's and 0's) from Profiler into Agg Manager.

     

    [App Number].[App Number]

    [Date].[DateID]

    [Bucket].[BucketID]

    [Company].[Company]

     

    I prefer the one in BIDS Helper (http://www.codeplex.com/bidshelper/Wiki/View.aspx?title=Aggregation%20Manager&referringTitle=Home). That should certainly help reduce the IO. Even if that agg contains nearly as many rows as the fact table itself, I assume there are many more than 4 dimensions for that fact table, so the reduced number of dimension keys in that agg (versus the fact table) should cause the agg to be significantly smaller than the leaf data.

     

    If that won't work, then try partitioning so that the IO scans of multiple partitions can be done in parallel.

    Wednesday, June 18, 2008 6:39 PM
  • It appears adding the aggregation here does help, especially in terms of IO. When bringing in other dimensions against the subcube, though, the performance gain is somewhat lost unless I create aggregations against every dimension the application uses that include the App Number dimension. Any other ideas?

     

    Thanks for the assistance.

    Monday, June 23, 2008 6:50 PM
  • I would try partitioning by some date attribute (like month or year). To prototype this partitioning, you might consider using this sproc:

    http://www.codeplex.com/ASStoredProcedures/Wiki/View.aspx?title=Partition&referringTitle=Home

     

    Monday, June 23, 2008 7:07 PM
  • Yep; I forgot to mention that I also included monthly partitions.
    Monday, June 23, 2008 8:16 PM
  • Any more thoughts on this? Creating a subcube based on the filtered results of our fact dimension (say, basing a subcube on half a million application numbers) is crippling our 16-CPU/64 GB RAM server for about 5-10 seconds or so.

    Monday, July 07, 2008 6:59 PM
  • Does bucket ID have anything to do with app number?
    Friday, May 22, 2009 10:29 AM
  • Can you try the following things to see whether there is any performance improvement?

    1. Create an application category attribute in the [App Number] dimension. Each category will hold around 3M [App Number] members, so there will be between 6 and 8 category members in total.

    *Do not create too many category members; try to keep the number of members under 8.

    2. Build an attribute relationship between application number and application category, and then build a hierarchy [App Hier] in the [App Number] dimension:
         App Number -> App Category

    3. Build an aggregation design on [App Number], [Bucket ID], [DateID] via BIDS Helper.

    4. Use the GENERATE function in your MDX:

    SELECT [Measures].[Count] ON COLUMNS,
           { [Bucket].[BucketID].CHILDREN } ON ROWS
    FROM
    (
        SELECT GENERATE(
                 [App Number].[App Hier].[App Category],
                 EXISTS(
                   EXISTS(
                     NONEMPTY(
                       DESCENDANTS([App Number].[App Hier].CurrentMember,
                                   [App Number].[App Hier].[App Number]),
                       [Measures].[Count]),
                     [Bucket].[BucketID].&[1]),
                   [Company].[Company].[Test])
               ) ON 0
        FROM [Historical]
        WHERE ( {[Date].[DateID].[20080116] : [Date].[DateID].[20080416]} )
    )

     

    • Edited by Scott.X.Lu Tuesday, May 26, 2009 5:04 AM Change MDX Code
    Tuesday, May 26, 2009 4:56 AM
  • I have the same issue with subselects. I believe there is a cache-warming issue where subselects are concerned; from a Profiler standpoint, it looks like the cache is not warmed even after multiple executions.

    There is a bug reported to Microsoft's Analysis Services team regarding this type of issue (https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=464256&wa=wsignin1.0).

    I really hope they find a way to resolve this ASAP. Subselects are the closest thing we have to custom filters; without them, we'll have a hard time working things out the way we want.
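
    One possible mitigation (a sketch only, not a confirmed fix; the set name is made up) is to materialize the filtered application set once as a session set, so repeated queries reuse the evaluated set instead of re-running the subselect each time. Note that a set in the WHERE clause is not semantically identical to a subselect:

        CREATE SESSION SET [Historical].[Apps In Range] AS
            NONEMPTY([App Number].[App Number].CHILDREN,
                     {[Date].[DateID].[20080116] : [Date].[DateID].[20080416]} *
                     {[Bucket].[BucketID].&[1]} *
                     {[Company].[Company].[Test]} *
                     {[Measures].[Count]});

        SELECT [Measures].[Count] ON COLUMNS,
               { [Bucket].[BucketID].CHILDREN } ON ROWS
        FROM [Historical]
        WHERE ( [Apps In Range] )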
    Wednesday, November 11, 2009 7:41 AM
  • Remark!
    Thursday, September 15, 2011 1:38 PM