none
Why MDX Queries use multiple partitions to return the result?

    Question

  • Hi,

    I have seen little weird behaviour in Adventure Works DW cube while running certain MDX queries.

    I ran below queries and trace the query execution in SQL Server Profiler:

    select
    [Measures].[Reseller Sales Amount] on 0,
    [Reseller].[Product Line].[Product Line] on 1
    from [Adventure Works]
    where [Date].[Calendar].[Calendar Year].&[2003]

    Trace Detail:

          Started reading data from the 'Reseller_Sales_2003' partition.
          Finished reading data from the 'Reseller_Sales_2003' partition.

    This query only uses 'Reseller_Sales_2003' partition to return the result, which is expected. Now issue came when I ran query for 'Internet Sales Amount' measure i.e.

    select
    [Measures].[Internet Sales Amount] on 0,
    [Customer].[Customer Geography].[Country] on 1
    from [Adventure Works] 
    where [Date].[Calendar].[Calendar Year].&[2003]

    Trace Detail:

         Started reading data from the 'Internet_Sales_2001' partition.
         Started reading data from the 'Internet_Sales_2002' partition.
         Started reading data from the 'Internet_Sales_2003' partition.
         Finished reading data from the 'Internet_Sales_2001' partition.
         Finished reading data from the 'Internet_Sales_2002' partition.
         Finished reading data from the 'Internet_Sales_2003' partition.

    Here, query scans 2001, 2002, 2003 partitions to return the result.

    My question is: why this query is not using only Internet_Sales_2003 to return the result set instead of scanning 2001,2001,2003 partitions until satisfying query requirement.

    Thanks, Surya

    http://ssasbysss.blogspot.com

    Friday, February 04, 2011 6:55 AM

Answers

  • If you run that Discover_Partition_Dimension_Stat command Gerhard posted above for the 2001 partition, you'll notice that the auto-slice info hasn't be populated. Note the Min and Max are 0 for all attributes for that partition. 2003 and 2004, on the other hand, have more than 4096 rows and so get indexed.

    So I guess it's safe to say that auto-slice and indexing are both limited by the IndexBuildThreshold setting.

    Another test to run... run a ProcessData on the 2003 partition, then run the Discover_Partition_Dimension_Stat command. You'll notice that the auto-slice isn't set until the ProcessIndexes has been run.

    A final test to try... put a bogus Slice property on the 2003 partition... for example, put [Date].[Calendar].[Calendar Year].&[2004] as the Slice for the 2003 partition. ProcessData succeeds. Run the query above that asks for 2003 Internet Sales. Note it returns all nulls (because SSAS used the bogus Slice property). But ProcessIndexes fails if you run it.

    So to answer Phillip's question, doing ProcessData alone and forgetting ProcessIndexes could leave yourself in a bad state with a number of different things.


    http://artisconsulting.com/Blogs/GregGalloway
    Monday, February 07, 2011 4:00 PM
    Moderator
  • Auto-slice on the 2003 partition won't prevent it from hitting other partitions, it just helps a query know it doesn't need to hit the 2003 partition. Auto-slice on partition A only helps a query know that it doesn't even need to hit partition A during that query.

    Maybe you didn't run a ClearCache before your MDX query? Maybe that explains why it didn't hit the other partitions? Or did you set the Slice property on the other partitions and forget to remove it for this test?


    http://artisconsulting.com/Blogs/GregGalloway
    Thursday, February 10, 2011 4:35 PM
    Moderator

All replies

  • Which version of Adventure Works DW are you using? (2005/2008). I don't have AW in front of me at the moment but I'm guessing that the partitions for the Reseller Sales measure group are sliced but the partitions for the Internet Sales measure group are not.

    Whilst busy with slicing, if you haven't already, please read this: http://sqlblog.com/blogs/mosha/archive/2008/10/14/get-most-out-of-partition-slices.aspx


    http://bi-logger.blogspot.com/
    Friday, February 04, 2011 8:51 AM
  • Thank You Philip for your reply.

    Adventure Works DW version: 2008

    After reading the blog, it seems that issue is with partition slicing. I am still going through blog and will update incase have any further query.

    Please let me know if you are able to generate the scenario and get solution/reason for that.

    Thanks, Surya 

    http://ssasbysss.blogspot.com
    Friday, February 04, 2011 1:34 PM
  • Set Slice property of "Internet_Sales_2003" partition to [Date].[Calendar].[Calendar Year].&[2003] and process "Internet Sales" measure group. after processing if you execute your mdx query and see profiler,it will show you data read only from 2003 partition.
    Aniruddha
    Friday, February 04, 2011 5:31 PM
  • Yes, clearly setting the slice may alter the behaviour but the question surely is why, by default ,one query hits only one partition when the other hits more than one? Assuming auto-slicing one would surely expect similar behaviour between the two queries when both sets of partitions are not specifically sliced. I briefly tried using other dimensions thinking maybe AS had auto-sliced on something else but I got the same result. The profiler trace doesn't show anything markedly different between the two queries yet AS appears to know more about the data distribution in the Reseller measure group than it does in the Internet measure group.

    http://bi-logger.blogspot.com/
    Friday, February 04, 2011 7:42 PM
  • Set Slice property of "Internet_Sales_2003" partition to [Date].[Calendar].[Calendar Year].&[2003] and process "Internet Sales" measure group. after processing if you execute your mdx query and see profiler,it will show you data read only from 2003 partition.
    Aniruddha


    I applied Slice property of Internet_Sales_2003 partition as [Date].[Calendar].[Calendar Year].&[2003] and re-process "Internet Sales" measure group, but still query read all (i.e. 2001, 2002, 2003) partitions.

    Then, I set Slice property of each partitions (Internet_Sales_2001, Internet_Sales_2002, Internet_Sales_2003, Internet_Sales_2004) under Internet Sales measure group and re-process "Internet Sales" Measure Group, and found that query now only uses "Internet Sales_2003" partition.

    Which means, In order to make MDX query uses only relavant partition, we need to set Slice property of each partition of a Measure Group.

    Thanks Aniruddha for your reply.

    http://ssasbysss.blogspot.com

    Saturday, February 05, 2011 7:02 PM
  • Yes Philip, Similar problem I am facing where I have automated process for cube management which create partitions on daily basis. We mantain 2 weeks partitions on daily basis and previous partitions on monthly basis (until now there are 26 total partitions per measure group). So queries scan more than one partitions and causing poor query performance.

    As per your suggestion I should explicitly set Slice property for each partition. Is there any other way to handle this issue, beacuse setting Slice property for each existing and new partitions is bit more expensive task in my current environment.

    Please suggest your ideas...

    Thanks, Surya

    Saturday, February 05, 2011 7:33 PM
  • Adventure Works is tricky. If you notice, the Internet Sales partitions for the older years have fewer than 4096 rows. There's an SSAS server setting called IndexBuildThreshold. It's defaulted to 4096 (and really shouldn't be changed). Partitions that have fewer rows than that won't be indexed. So, unless you set the Slice property (which is a best practice, like is suggested above), SSAS will scan small partitions on every query hitting that measure group.

    So long story short, just set the Slice property every time.


    http://artisconsulting.com/Blogs/GregGalloway
    Sunday, February 06, 2011 5:33 PM
    Moderator
  • Hmmm, that's interesting. My understanding was that AS indexes were essentially a map of attribute (members) to measures in order to identify exists and nonempty references efficiently . However, what you appear to be saying here is that they are also (exclusively?) used for auto-slicing.

    So does that mean that AS determines the min and max data ID for dimension members in a partition from the indexes, where if an index does not exist the slicing property will not be set? (I know that slicing should always be manually set, and I do, but can we discuss if only for documentation sake? ) By extension then, if you use ProcessData and do not follow this with ProcessIndexes then slicing will not be set either, yes?

    Anybody else see this as a horrible situation to be in? I appreciate that scanning a single partition with less than 4096 rows may be quicker than using indexes but scale that out across a number of partitions and suddenly you start consuming processing threadpool threads, CPU time and IO bandwidth for something that didn't need to happen. I realise that small rowcounts should be placed into a single partition etc. but sometimes enterprise partitioning strategies don't always immediately align with this.

    Either way, is this relationship between indexes and slicing documented anywhere?


    http://bi-logger.blogspot.com/
    Monday, February 07, 2011 8:56 AM
  • the min and max dataids are generated automatically
    check the following DMV-Query to see them per partition

    SELECT *
    FROM SystemRestrictSchema($system.Discover_Partition_Dimension_Stat
    ,DATABASE_NAME = 'Adventure Works DW 2008'
    ,CUBE_NAME = 'Adventure Works'
    ,MEASURE_GROUP_NAME = 'Internet Sales'
    ,PARTITION_NAME = 'Internet_Sales_2003')

    originally from http://www.packtpub.com/article/query-performance-tuning-microsoft-analysis-services-part1
    where you may find additional information

    hth,
    gerhard


    - www.pmOne.com -
    Monday, February 07, 2011 11:46 AM
    Answerer
  • Yes sure, I realise that the MIN and MAX data IDs are captured automatically otherwise it wouldn't be auto -slicing. ;-)

    However, what isn't made clear in any documentation I've seen is that indexes are required before auto-slicing happens. Even the link in your post refers to indexes with regard to aggregations and not slicing. The best reference I can find (which will admit I have read before but the nuance of this past me by) is this: http://cwebbbi.wordpress.com/2007/11/08/viewing-partition-slice-information/

    In that article it is mentioned that the IndexBuildThreshold property was adjusted to get the slicing to appear. As such, that's a crucial bit of information that seems to be largely undocumented. Information on process indexes and aggregations will be overwhelming I'm sure but not so much on its importance with regards auto-slicing. Sure, anyone who has been using AS for a reasonable period knows to manually set the slice but irrespective, the functionality exists and for the most part may even be useful but knowing that indexes must exist is a crucial bit of information that appears to be lacking.


    http://bi-logger.blogspot.com/
    Monday, February 07, 2011 1:33 PM
  • If you run that Discover_Partition_Dimension_Stat command Gerhard posted above for the 2001 partition, you'll notice that the auto-slice info hasn't be populated. Note the Min and Max are 0 for all attributes for that partition. 2003 and 2004, on the other hand, have more than 4096 rows and so get indexed.

    So I guess it's safe to say that auto-slice and indexing are both limited by the IndexBuildThreshold setting.

    Another test to run... run a ProcessData on the 2003 partition, then run the Discover_Partition_Dimension_Stat command. You'll notice that the auto-slice isn't set until the ProcessIndexes has been run.

    A final test to try... put a bogus Slice property on the 2003 partition... for example, put [Date].[Calendar].[Calendar Year].&[2004] as the Slice for the 2003 partition. ProcessData succeeds. Run the query above that asks for 2003 Internet Sales. Note it returns all nulls (because SSAS used the bogus Slice property). But ProcessIndexes fails if you run it.

    So to answer Phillip's question, doing ProcessData alone and forgetting ProcessIndexes could leave yourself in a bad state with a number of different things.


    http://artisconsulting.com/Blogs/GregGalloway
    Monday, February 07, 2011 4:00 PM
    Moderator
  • I did ProcessData on Internet_Sales_2003 partition then ran Discover_Partition_Dimension_Stat command and found that auto-slice is not set and min & max data ids became 0. Then I ran below MDX

    select [Measures].[Internet Sales Amount] on 0
    from [Adventure Works]
    where [Date].[Calendar].[Calendar Year].&[2003]

    and noticed in profiler that query is using only 2003 partition.

    After doing ProcessData on partition Internet_Sales_2003, why above query is still using only partition 2003, wherein auto-slice is not set for this partition. If auto-slice property is not set for 2003 partition, therefore query should scan other partitions too in order to return the result set.

    I did same for 2002 partition as well and got same behaviour.

    http://ssasbysss.blogspot.com

    Thursday, February 10, 2011 4:01 PM
  • Auto-slice on the 2003 partition won't prevent it from hitting other partitions, it just helps a query know it doesn't need to hit the 2003 partition. Auto-slice on partition A only helps a query know that it doesn't even need to hit partition A during that query.

    Maybe you didn't run a ClearCache before your MDX query? Maybe that explains why it didn't hit the other partitions? Or did you set the Slice property on the other partitions and forget to remove it for this test?


    http://artisconsulting.com/Blogs/GregGalloway
    Thursday, February 10, 2011 4:35 PM
    Moderator
  • the min and max dataids are generated automatically
    check the following DMV-Query to see them per partition

    SELECT *
    FROM SystemRestrictSchema($system.Discover_Partition_Dimension_Stat
    ,DATABASE_NAME = 'Adventure Works DW 2008'
    ,CUBE_NAME = 'Adventure Works'
    ,MEASURE_GROUP_NAME = 'Internet Sales'
    ,PARTITION_NAME = 'Internet_Sales_2003')

    originally from http://www.packtpub.com/article/query-performance-tuning-microsoft-analysis-services-part1
    where you may find additional information

    hth,
    gerhard


    - www.pmOne.com -

    Great, by reviewing the book/link mentioned above ( http://www.packtpub.com/article/query-performance-tuning-microsoft-analysis-services-part1), it solved my problem of certain query running slow caused by scanning more partitions, from my profiler trace for a similar query: 

    it read un-requested partitions when the month range spans 3 years. (e.g. [201006]:[201208] or [201001]:[201208])
    It is ok when the query month range span only 2 years or less. (e.g.[201106] or [201106]:[201107] or [201006]:[201101] or [201106]:[201201])

    once I used "Disable Prefetch Facts=True" in the SSMS connection string (option button), it reads proper partitions now.

    cheers

    Min


    -- Let us help each other to continuously improve.

    Tuesday, January 15, 2013 6:08 AM