none
4294967296 Tuple limit and 32 bit Architecture

    Question

  • We have repeatedly encountered the 4,294,967,296 tuple limit in our MDX queries, when using the Non Empty Operator on a Cross Join of fields:

     

    Example

     

    WITH

    MEMBER [Date].[DateID].[M1] AS

    AGGREGATE({[Date].[DateID].&[20070101]:[Date].[DateID].&[20070103]})

    SET [M2] AS

    ({[dimension7].[attribute7].[value7]})

    SELECT {[Measures].[count]} ON COLUMNS,

    NON EMPTY {([dimension1].[attribute1].Children,[dimension2].[attribute2].Children,[M2],[dimesnsion3].[attribute3].Children,[dimension4].[attribute4].Children,[dimension5].[attribute5].Children,[dimension6].[attribute6].Children)} ON ROWS FROM MyCube WHERE {([Date].[DateID].[M1])}

     

    Error

     

    The expression contains a function that cannot operate on a set with more than 4,294,967,296 tuples.

     

     

    We have followed Microsoft's best practices for Cube Design, while balancing the needs of our application.

     

    Is it a coincidence that the 4,294,967,296 = 2^32 on the order of 4 GigaBytes and our server is of 32-bit architecture?. Will  scaling up to a 64-bit architecture machine help? Then will our tuple limit be 2^64 which is on the order of  18 ExaBytes or 18 qunintllion bytes?

     

    Thursday, September 6, 2007 1:54 PM

All replies

  • Hi prakal,

     

    the 64 bit would be helpfull for you, but I think, that brut force isn't goot approach.

    Do you really need to join all 6  dimensions? Why you didn't try to use nonemptycross join.

    You would have your query done on the 32bit box as well.

     

    Thursday, September 6, 2007 8:42 PM
  • I don't think 64bit will fix this, I believe the 2^32 is more of an SSAS engine limit, not a platform one. The nonemptycrossjoin is depreciated, but you could nonempty() your attribute sets before crossjoining them.

     

    WITH

    MEMBER [Date].[DateID].[M1] AS

    AGGREGATE({[Date].[DateID].&[20070101]:[Date].[DateID].&[20070103]})

    SET [M2] AS

    ({[dimension7].[attribute7].[value7]})

    SELECT {[Measures].[count]} ON COLUMNS,

    NON EMPTY {NONEMPTY([dimension1].[attribute1].Children,[Measures].[count]),NONEMPTY([dimension2].[attribute2].Children,[Measures].[count]),NONEMPTY([M2], [Measures].[count]),NONEMPTY([dimesnsion3].[attribute3].Children, [Measures].[count]),NONEMTPY([dimension4].[attribute4].Children, [Measures].[count]),NONEMPTY([dimension5].[attribute5].Children, [Measures].[count]),NONEMPTY([dimension6].[attribute6].Children,[Measures].[count] )} ON ROWS

    FROM MyCube

    WHERE {([Date].[DateID].[M1])}

     

    Or, if you had all of these attributes as part of a hierarchy you could just do

     

    WITH

    MEMBER [Date].[DateID].[M1] AS

    AGGREGATE({[Date].[DateID].&[20070101]:[Date].[DateID].&[20070103]})

    SET [M2] AS

    ({[dimension7].[attribute7].[value7]})

    SELECT {[Measures].[count]} ON COLUMNS,

    NON EMPTY {[dimension1].[hierarchy1].Members} ON ROWS

    FROM MyCube

    WHERE {([Date].[DateID].[M1])}

    Friday, September 7, 2007 11:27 AM
    Moderator
  • A recent thread on this forum linked to a Microsoft Support article that might be relevant:

    Error message when you run a complex MDX query that contains the NON EMPTY clause in SQL Server 2005 Analysis Services: "The expression contains a function that cannot operate on a set with more than 4,294,967,296 tuples"

    SYMPTOMS

    When you use a user account to connect to an instance of Microsoft SQL Server 2005 Analysis Services, you receive the following error message when you run a complex MDX query:
    The expression contains a function that cannot operate on a set with more than 4,294,967,296 tuples.
    This problem may occur if the following conditions are true:
    The user account is a member of a role that has dimension security set on a measure.
    The complex MDX query contains the NON EMPTY clause.

    Back to the top

    CAUSE

    This issue occurs because the optimized algorithm for the NON EMPTY clause cannot handle dimension security on measures. Therefore, SQL Server 2005 Analysis Services must use the default algorithm for the NON EMPTY clause. If the query is complex, you may receive the error message that the "Symptoms" section mentions.
    • Proposed as answer by Boyan Penev Friday, November 27, 2009 11:35 AM
    Friday, September 7, 2007 2:26 PM
    Moderator
  • Darren,

    Thanks for your reply. I tried the query in your first solution,  but I get the error "Two sets specified in the function have different dimensionality".

    Which I guess answers your second solution, all the attributes in the query are a part of different dimensions and cannot be set as a hierarchy of one dimension.

     

    Friday, September 7, 2007 6:39 PM
  • Sorry, I overlooked the detail that there were different dimensions involved . You're right, this would negate using a user hierarchy.

     

    Using NonEmpty on each attribute should work, maybe there was a bracket out of place.  Normally I would use an explicit nested crossjoin() functions or the * operator for crossjoining multiple sets. Maybe start with 2 sets and build up to all 6.

     

    WITH

    MEMBER [Date].[DateID].[M1] AS

    AGGREGATE({[Date].[DateID].&[20070101]:[Date].[DateID].&[20070103]})

    SET [M2] AS

    ({[dimension7].[attribute7].[value7]})

    SELECT {[Measures].[count]} ON COLUMNS,

    NON EMPTY {

      NONEMPTY([dimension1].[attribute1].Children,[Measures].[count])

    * NONEMPTY([dimension2].[attribute2].Children,[Measures].[count])

    } ON ROWS

    FROM MyCube

    WHERE {([Date].[DateID].[M1])}

    Friday, September 7, 2007 10:44 PM
    Moderator
  •  

    This is all well and good but my users are using ProClarity, where I have no control of their reports

     

    The following query errors with 4,294,967,296 tuples

     

    SELECT { [Measures].[Total Sales Credits] } ON COLUMNS ,

    { { { [Team].[Team Level 4].[All].CHILDREN } * { [Product].[Product Level4].[All].CHILDREN } * { [Product].[Product Level5].[All].CHILDREN } * { [Team].[Employee].[All].CHILDREN } } } ON ROWS

    FROM [Ad-Hoc Sales]

    WHERE ( [Date].[Calendar Date].[Month Name].&[200809], [Team].[Team Level 1].[Distribution] )

     

    Re-ordering the dims works though...

     

    SELECT { [Measures].[Total Sales Credits]} ON COLUMNS ,

    NON EMPTY { { { [Team].[Team Level 4].[All].CHILDREN } * { [Team].[Employee].[All].CHILDREN } * { [Product].[Product Level4].[All].CHILDREN } * { [Product].[Product Level5].[All].CHILDREN } } } ON ROWS

    FROM [Ad-Hoc Sales]

    WHERE ( [Date].[Calendar Date].[Month Name].&[200809], [Team].[Team Level 1].[Distribution] )

     

    This returns in 3 seconds and returns 6521 rows.

     

    I guess that in the first query is performing Count of dim1 * Count of dim 2 * Count of dim 3 * Count of dim 4 whilst the second can understand that the teams and products are linked.

     

    This is going to be unpleasant when I tell the users....

     

    Mack

    Wednesday, October 29, 2008 11:33 AM
  • ".. I guess that in the first query is performing Count of dim1 * Count of dim 2 * Count of dim 3 * Count of dim 4 whilst the second can understand that the teams and products are linked. .." - if you study this recent blog entry, the reason why re-ordering works could be that all hierarchies of a given dimension get grouped together. In the original crossjoin, the 2 hierarchies of the [Team] dimension are interleaved with 2 hierarchies of the [Product] dimension (note that there are 4 hierarchies, but only 2 dimensions involved):

     

    Optimizing order of sets in MDX crossjoins

    ...

    What happens here is that crossjoin detects that the set that it needs to apply autoexists to, can be nicely split into two parts – first part related to Customer dimension, and second part to Product dimension. Then it can apply the following formula

    JOIN( c1, c2, c3, p1, p2, p3 ) = CROSSJOIN( INNERJOIN(c1,c2,c3), INNERJOIN(p1,p2,p3) )

    I.e. it can run inner join against each dimension table separately, and then do full cross join between results. Inner join against single dimension table is very efficient since even in worse case, we don’t need to go deeper than the dimension key to which all other attributes relate – so this isn’t going to take additional memory or much time.

    However, in second case, the equation doesn’t hold anymore, since the order of dimensions inside crossjoin is mixed. So some other, much less efficient algorithm will be needed, probably loop join, with crossjoin now being pushed to the earlier stages. This is very inefficient and also going to take lots of additional memory.

     

    Conclusion: Inside crossjoin, cluster all the sets from the same dimension together – this will result in great benefit to performance.

    Wednesday, October 29, 2008 2:41 PM
    Moderator
  • ".. I guess that in the first query is performing Count of dim1 * Count of dim 2 * Count of dim 3 * Count of dim 4 whilst the second can understand that the teams and products are linked. .." - if you study this recent blog entry, the reason why re-ordering works could be that all hierarchies of a given dimension get grouped together. In the original crossjoin, the 2 hierarchies of the [Team] dimension are interleaved with 2 hierarchies of the [Product] dimension (note that there are 4 hierarchies, but only 2 dimensions involved):

     

    Optimizing order of sets in MDX crossjoins

     

     

    ...

    What happens here is that crossjoin detects that the set that it needs to apply autoexists to, can be nicely split into two parts – first part related to Customer dimension, and second part to Product dimension. Then it can apply the following formula

    JOIN( c1, c2, c3, p1, p2, p3 ) = CROSSJOIN( INNERJOIN(c1,c2,c3), INNERJOIN(p1,p2,p3) )

    I.e. it can run inner join against each dimension table separately, and then do full cross join between results. Inner join against single dimension table is very efficient since even in worse case, we don’t need to go deeper than the dimension key to which all other attributes relate – so this isn’t going to take additional memory or much time.

    However, in second case, the equation doesn’t hold anymore, since the order of dimensions inside crossjoin is mixed. So some other, much less efficient algorithm will be needed, probably loop join, with crossjoin now being pushed to the earlier stages. This is very inefficient and also going to take lots of additional memory.

     

    Conclusion: Inside crossjoin, cluster all the sets from the same dimension together – this will result in great benefit to performance.


    Thanks for this tip. It fixed the error and cut query execution time in half.

    For those who are missing the syntax...Hope I understand this correctly :)

    * = Cross Join
    , = Inner join
    E.g.  (( [Calendar].[Fin Year].[Fin Year].ALLMEMBERS , [Calendar].[Month Name].[Month Name].ALLMEMBERS) *
    ([Stores].[Store Reporting Region].[Store Reporting Region].ALLMEMBERS , [Stores].[Store Name].[Store Name].ALLMEMBERS) )


    Grrrr
    Thursday, November 11, 2010 11:22 AM
  • An other way to fix the issue is to change the order of cross join(use the perfect hierarchy) Higher to lower grain. If you already have a junk dimension, the number of result tuples will be cut to a reasonable count.
    Tuesday, February 21, 2012 8:19 PM
  • Just a quick note for anyone ending up here recently.

    In January 2011 Microsoft issued a fix for an issue that generates this error as part of a Cumulative update for various versions of SQL Server 2008 (SP1, SP2, and R2).

    Link Here - http://support.microsoft.com/kb/2430162

    Tuesday, November 27, 2012 10:07 AM