none
Reducing number of rows in fact tables

    Question

  • Hello

    We have a Kimball style DW with a FactDailyBalancesSnapshot table. The fact table is growing at quite a rate as it is a daily snapshot of every (bank) account. A lot of the accounts, however, have a static balance during the life of their term. These so called Term Accounts are similar to savings accounts, but with with a maturity date. Once started, not much happens to these accounts if held till maturity.

    My question is this: What are the consequnces of us minimising the number of rows in the fact table by only ETL'ing accounts where the daily balance has changed?

    One issue that I can think of is the Analysis Services Cube not being to handle the dates for which there are no balances. The grain of the fact table is daily and there will be missing days - will the Cube magic work/cope with this?

    Thanks in advance.


    Amir
    • Edited by amir tohidi Wednesday, November 09, 2011 3:04 PM Typo
    Wednesday, November 09, 2011 3:01 PM

Answers

  • To conclude this post:

    Based on my findings, although clever techniques can be used at the relational data warehouse level to minimise the number of fact rows, doing so would create problems with the Analysis Services portion of our solution. To that end, we opted to keep things simple and store daily balances for each account. That means 365 rows per account per year etc.

    For us this is not too bad because our fact table has two measures only and 6 foreign keys, so each row doesn't consume a huge amount of space. Besides, I think this is one of those scenarios where "throwing hardware at the problem" as opposed to complex SQL/MDX will be a cheaper option in the future, if performance became an issue. Our DW/BI is running in a VM and we can always allocate more disk, memory or CPU.


    Amir
    • Edited by amir tohidi Tuesday, November 22, 2011 8:45 AM
    • Marked as answer by amir tohidi Tuesday, November 22, 2011 8:45 AM
    Tuesday, November 22, 2011 8:44 AM

All replies

  • One of the options is to create monthly snapshot instead of daily.

    Having missing dates is not the best, because SSAS will not handle it well and in SQL it will not be so simple.

    You may also consider Balance Periodic Snapshot, where you will have two date dimension DateFrom and DateTo.

    Using such in sql is simple.

    In SSAS you can create Fact: Date Range To Date Bridge and create Many to Many relation between Fact and Date.

    But Bridge table will have millions of rows, so it is only beneficial if you have millions of accounts.

    Thursday, November 10, 2011 7:33 AM
  • Thansk Piotr.

    We set the grain of the fact table at daily because the Kimball books recommend atomic level grains. For a bank account, daily balance is the most logical lowest grain. Also, our users want at least weekly data, so we thought for the sake of four extra rows (Monday - Thursday) per account, we might as well go for the more accommodating solution.

    I have checked with the Business again and they want at least one year's worth of data, so we won't be able to get rid of the daily records after a month - we need to wait at least one year.

    The above suggestion doesn't seem straight forward to implement (?) and requires more clever SQL etc. Is this one of those cases where hardware (extra disk, memory, etc) will provide a better solution than the software equivalent?


    Amir
    Thursday, November 10, 2011 4:12 PM
  • >> Kimball books recommend atomic level grains

    Typical bold statement

     

    You can create an accumulating snapshot where the data is aggregating for the current period and then becomes read only at the end of the period. Depends on whether you need the historic data.

    Can also hold just the current period data with more detail and use the previous period snapshot as the starting point.

    Can hold only changes then create the detail in a query (this can be a lot faster in spite of what people say about join performance).

    You can also forget about theory and hold the data in the most efficient way possible and then either create an object when needed or use a view to transform it into the format that the client needs.

    • Proposed as answer by Jerry NeeModerator Saturday, November 12, 2011 6:43 AM
    • Unproposed as answer by amir tohidi Monday, November 14, 2011 4:29 PM
    Friday, November 11, 2011 2:03 PM
  • Not being DW/BI experts, we can only follow what the industry experts recommend! Besides, expert or not, having data at the lowest level of granularity makes a lot of sense as it gives optimal flexibility and avoids the nightmare scenario where the business suddenly ask for daily reports, as opposed to weekly or monthly. So, the statement is not bold, it is common sense.

    As for the rest of the reply, I am still too new to this stuff so it will take me a while to "decode" it.

    At the end of the day, our business users want, at a minimum, weekly balances (though we are ETL'ing daily balances) and 5 years of history. They also absolutely love PivotTables so almost all of our reports will be sourced from SSAS (another Kimball recommendation by the way).

     


    Amir
    Monday, November 14, 2011 4:38 PM
  • Amir -

    You’re on the right track.  Capturing your DW data at the lowest level is the way to go, as both Kimball and Inmon recommend.  This is the best way to prevent the nightmare scenario you mention and enables you to keep your options open for future design/redesign decisions when your organization’s analytical needs change (and of course they will).  However, instead of daily snapshots, I would suggest a slightly different approach.  Use a transaction fact table as your foundation.  Capture all of the underlying historical transactions for the accounts in the transaction fact with all of the desired dimensionality for your snapshots.  Then, you should be able to build the desired weekly snapshot fact table based on the transaction facts.  For whatever historical periods your users need.  If you need daily snapshots down the road, you can rebuild them based on the transaction facts.  Plus a transaction fact table will give your users the added benefit of being able to precisely analyze specific transaction types and cash flows (opens, closures, deposits, withdrawals, etc.), something they most likely can’t do from the balance snapshot. 

    Hope that helps.

    Brent



    Brent Greenwood, MS, MCITP, CBIP      |+++++|      Please mark correct answers
    http://brentgreenwood.blogspot.com
    Sunday, November 20, 2011 3:12 AM
  • Thanks Brent.

    We did consider using transactional fact tables, but dropped the idea because of the huge number of rows involved. The daily balance was the optimal compromise, even though the users have accepted that won't be able to perform the detailed analysis you mentioned.

    As it happens, since writing my original post, one of our executives requested an ad-hoc report asking for the bank's book value on 10 and 16 November (and further detailed stuff). If we didn't have the daily balances (i.e. the grain of our fact table was weekly) then we would not have been able to produce the report (in minutes).


    Amir
    Tuesday, November 22, 2011 8:33 AM
  • To conclude this post:

    Based on my findings, although clever techniques can be used at the relational data warehouse level to minimise the number of fact rows, doing so would create problems with the Analysis Services portion of our solution. To that end, we opted to keep things simple and store daily balances for each account. That means 365 rows per account per year etc.

    For us this is not too bad because our fact table has two measures only and 6 foreign keys, so each row doesn't consume a huge amount of space. Besides, I think this is one of those scenarios where "throwing hardware at the problem" as opposed to complex SQL/MDX will be a cheaper option in the future, if performance became an issue. Our DW/BI is running in a VM and we can always allocate more disk, memory or CPU.


    Amir
    • Edited by amir tohidi Tuesday, November 22, 2011 8:45 AM
    • Marked as answer by amir tohidi Tuesday, November 22, 2011 8:45 AM
    Tuesday, November 22, 2011 8:44 AM