locked
Cascading many to many and Performance issues RRS feed

  • Question

  • Hi,

    I have a model with a cascading many to many setup as such:

    Fact --> Date <-- Fact_DateRange --> Dim_DateRange <-- Fact_AnotherBridge --> Dim

    Fact has roughly 5m rows in it.

    Date is a date dimension with more than 80 years of data in it... (30k rows).

    Fact_DateRange has 120m rows in it (together with DimDateRange it dissolves date ranges from Fact_AnotherBridge into discrete dates).

    Dim_DateRange (50k rows)

    Fact_AnotherBridge (200k rows)

    Dim (20k rows)

    in Fact I've defined a measure like this:

         CALCULATE(
            CALCULATE (
                SUM(Fact[Measure]),
                SUMMARIZE(Fact_DateRange, Date[Dim_Date_Code])
            ),
            SUMMARIZE(Fact_AnotherBridge, Dim_DateRange[Dim_DateRange_Code])
        )

    It works but performance is poor, which I kinda expected. Does anyone have experience with or suggestions as to how to make this perform better?

    The only solution I can come up with is to reduce the volume which I'd rather not do :)


    Thanks

    Monday, June 17, 2013 2:42 PM

Answers

  • Hi. Mortenbpost.

    I don't have opportunity to test calculate M2M for big data. Beside SUMMARIZE formulas, there are many ways to calculate M2M. But I don't know whether is better or not. You can try to test formulas as below:

    1. M2M

    CALCULATE(
       CALCULATE(SUM(Fact[Measure]),
           Fact_DateRange),
       Fact_AnotherBridge
    )

    2. M2M:

    CALCULATE(
      SUM(Fact[Measure]),
      CALCULATETABLE(
        SUMMARIZE(Fact_DateRange,Date[Dim_Date_Code]),
        SUMMARIZE(Fact_AnotherBridge,Dim_DateRange[Dim_DateRange_Code])
      )
    )

    3. M2M:

    CALCULATE(
        CALCULATE(
           SUM(Fact[Measure]),
           FILTER(
              VALUES(Date[Dim_Date_Code]),
              CALCULATE(COUNTROWS(Fact_DateRange))
           )
        ),
        FILTER(
           VALUES(Dim_DateRange[Dim_DateRange_Code]),
           CALCULATE(COUNTROWS(Fact_AnotherBridge))
        )
     )

    4. M2M

    CALCULATE(
       CALCULATE(
          SUM(Fact[Measure]),
          FILTER(
             VALUES(Date[Dim_Date_Code]),
             COUNTROWS(RELATEDTABLE(Fact_DateRange))
          )
       ),
       FILTER(
          VALUES(Dim_DateRange[Dim_DateRange_Code]),
          COUNTROWS(RELATEDTABLE(Fact_AnotherBridge))
       )
    )

    5. M2M:

    CALCULATE(
        CALCULATE(
           SUM(Fact[Measure]),
           FILTER(
              VALUES(Date[Dim_Date_Code]),
              COUNTROWS(CALCULATETABLE(Fact_DateRange))
           )
        ),
        FILTER(
           VALUES(Dim_DateRange[Dim_DateRange_Code]),
           COUNTROWS(CALCULATETABLE(Fact_AnotherBridge))
        )
    )

    I hope this help with you. Let' us know which is the best performance.

    Regards,

    • Marked as answer by mortenbpost Sunday, June 23, 2013 11:38 AM
    Tuesday, June 18, 2013 6:36 AM
  • Unfortunately, there is not much that you can do. Cascading m2m has a geometric complexity and as numbers grow, the speed decreases dramatically. Don't expect to see stellar performance on this model. Well, you are already experiencing poor ones, so no surprise here. :)

    The only option is, IMHO, to try to modify the model. Either you flatten the cascading m2m into a single bridge, reducing the number of joins at query time, or you work on the data model trying to kill the cascading m2m. Looks like an interesting model, can you share more info about its functional scope?

    Alberto


    Alberto Ferrari
    http://www.sqlbi.com

    • Marked as answer by mortenbpost Sunday, June 23, 2013 11:38 AM
    Wednesday, June 19, 2013 9:59 AM

All replies

  • Hi. Mortenbpost.

    I don't have opportunity to test calculate M2M for big data. Beside SUMMARIZE formulas, there are many ways to calculate M2M. But I don't know whether is better or not. You can try to test formulas as below:

    1. M2M

    CALCULATE(
       CALCULATE(SUM(Fact[Measure]),
           Fact_DateRange),
       Fact_AnotherBridge
    )

    2. M2M:

    CALCULATE(
      SUM(Fact[Measure]),
      CALCULATETABLE(
        SUMMARIZE(Fact_DateRange,Date[Dim_Date_Code]),
        SUMMARIZE(Fact_AnotherBridge,Dim_DateRange[Dim_DateRange_Code])
      )
    )

    3. M2M:

    CALCULATE(
        CALCULATE(
           SUM(Fact[Measure]),
           FILTER(
              VALUES(Date[Dim_Date_Code]),
              CALCULATE(COUNTROWS(Fact_DateRange))
           )
        ),
        FILTER(
           VALUES(Dim_DateRange[Dim_DateRange_Code]),
           CALCULATE(COUNTROWS(Fact_AnotherBridge))
        )
     )

    4. M2M

    CALCULATE(
       CALCULATE(
          SUM(Fact[Measure]),
          FILTER(
             VALUES(Date[Dim_Date_Code]),
             COUNTROWS(RELATEDTABLE(Fact_DateRange))
          )
       ),
       FILTER(
          VALUES(Dim_DateRange[Dim_DateRange_Code]),
          COUNTROWS(RELATEDTABLE(Fact_AnotherBridge))
       )
    )

    5. M2M:

    CALCULATE(
        CALCULATE(
           SUM(Fact[Measure]),
           FILTER(
              VALUES(Date[Dim_Date_Code]),
              COUNTROWS(CALCULATETABLE(Fact_DateRange))
           )
        ),
        FILTER(
           VALUES(Dim_DateRange[Dim_DateRange_Code]),
           COUNTROWS(CALCULATETABLE(Fact_AnotherBridge))
        )
    )

    I hope this help with you. Let' us know which is the best performance.

    Regards,

    • Marked as answer by mortenbpost Sunday, June 23, 2013 11:38 AM
    Tuesday, June 18, 2013 6:36 AM
  • Hi, 

    Thanks a lot for some very good answers. I've tried all of them. What I find odd is that the two first queries, the one I posted and the first of yours both execute in 35s.

    While the rest of the queries, 2-5 all execute in 25s (so there is a slight speed improvement here.). I wonder if there is a way to get to the bottom of what it's doing the 25s it's running using the query plan or profiler.

    It would also be nice to be able to check if the table Fact_AnotherBridge is actually crossfiltered - because there is no need to include it if it isn't. But there isn't a way to check this for a table??

    ---

    Now, it's clearly the Fact_DateRange that's causing the issues. If I take 2. M2M formula and switch the order of the two summarize it executes in less than 3 seconds. I guess I would actually need two nested calculates in order for my cascading m2m behavior to work..

    • Edited by mortenbpost Tuesday, June 18, 2013 10:07 AM
    Tuesday, June 18, 2013 8:54 AM
  • Hi. Mortenbpost,

    Thanks for sharing such a useful information.

    Tuesday, June 18, 2013 1:37 PM
  • After further testing I can tell you that the order does matter, and when you have a cascading m2m relationship you need to use nested calculate(table) where you put the bridge away from the fact in the outer calculation.

    So unfortunately the fast calculation didn't work. In my case 3,4,5 worked the best.

    Tuesday, June 18, 2013 1:47 PM
  • Thanks
    Tuesday, June 18, 2013 3:09 PM
  • Unfortunately, there is not much that you can do. Cascading m2m has a geometric complexity and as numbers grow, the speed decreases dramatically. Don't expect to see stellar performance on this model. Well, you are already experiencing poor ones, so no surprise here. :)

    The only option is, IMHO, to try to modify the model. Either you flatten the cascading m2m into a single bridge, reducing the number of joins at query time, or you work on the data model trying to kill the cascading m2m. Looks like an interesting model, can you share more info about its functional scope?

    Alberto


    Alberto Ferrari
    http://www.sqlbi.com

    • Marked as answer by mortenbpost Sunday, June 23, 2013 11:38 AM
    Wednesday, June 19, 2013 9:59 AM
  • Thanks again for the replies. 

    I'm looking into reducing the number of rows in the bridge tables. But I've also come up with another solution that actually works.

    Instead of going through the date range tables, I filter based on the dates in the "fact" that contains the date ranges that are exploded. It looks like this:

    A:=
    CALCULATE(
    	SUM(Fact[Measure]);
    	FILTER(BridgeFact; 
    			(BridgeFact[DW_StartDate] < MIN(Date[Date]) && MAX(Date[Date]) + 1 <= BridgeFact[DW_EndDate]) ||
    			(MIN(Date[Date]) < BridgeFact[DW_EndDate] && BridgeFact[DW_EndDate] <= MAX(Date[Date]) + 1) ||
    			(MIN(Date[Date]) <= BridgeFact[DW_StartDate] && BridgeFact[DW_StartDate] < MAX(Date[Date]) + 1)			
    	)
    )

    This actually works pretty well and performs acceptable. However, it doesn't sum up correctly on month, quarter, years, etc. In order to fix this I've used SUMX (Ugh):

    B:=
    IF(ISFILTERED(Date[Date]);
    	[A]
    	;
    	SUMX(VALUES(Date[Date]); [A])
    )

    Now, this doesn't perform at ALL. Remember I have 80 years in my Date dimension (30k rows) and 200k rows in the BridgeFact.

    Basically the FactBridge describes periods of time where persons can assume a "Role". This role changes over time. Now if I could avoid the SUMX i think I'd almost be happy.


    Friday, June 21, 2013 2:31 PM
  • Morten, if you want (and can), send me a copy of the workbook, if possible with some data inside. I'd like to take a look at it and experiment a bit.


    Alberto Ferrari
    http://www.sqlbi.com

    Friday, June 21, 2013 6:51 PM
  • I will see what I can do.

    Thanks again both of you.

    Sunday, June 23, 2013 11:38 AM