Data warehouse schema design
-
03 April 2012 7:14
Hi experts,
I'm currently designing a data warehouse that contains a freight fact table - the freight fact table contains shipment information such as total kilograms, total cubic, basic charge, fuel charge, weight charge etc.
There will be approximately 50,000 records loaded per day intitially, with potential to grow to 200,000 per day over the next 24 months.
There are related measures in my source transactional database (e.g. transporter invoice basic charge, transporter invoice fuel charge, transporter invoice weight charge etc.) - which i would like to include in my data warehouse. I need advice on the best method to achieve this. I am tossing up between creating a new fact table to store my 'transporter invoice' facts or adding the facts to my existing freight fact table (note, all dimensions are shared).
Considerations for expanding my existing freight fact table:
1. An invoice will not always exist for each shipment - therefore I would need null (or zero?) columns in my freight fact table
2. My transporter invoice facts are available much later than my freight fact table facts - requiring large update processes in my ETL
Considerations for adding a new transporter invoice fact table:
1. All dimensions are shared between the two fact groups - is this duplication bad practice in DW design?. Indeed, the transporter invoice table is directly related to the freight fact through the shipment number (consignment note number).
I'm leaning towards the new invoice fact table. Is there a third option?
I'm sure 'it depends' can be easily applied here, but any advice would be much appreciated.
Thanks.
Semua Balasan
-
03 April 2012 9:57
1. All dimensions are shared between the two fact groups - is this duplication bad practice in DW design?. Indeed, the transporter invoice table is directly related to the freight fact through the shipment number (consignment note number).
This is not. Infact, you can maintain data at different granularity using 2 different fact tables. Also, join only necessary dimensions that are required for the other fact table
As far as the other questions are concerned, tell more about the existing fact table design and wheter point 1 and 2 are inplace in the existing design.
Please vote as helpful or mark as answer, if it helps
Cheers, Raunak | t: @raunakjhawar | My Blog -
05 April 2012 0:31
Thanks for the reply. This is my issue Raunak - my requirement for this Transporter Invoice data is at the exact same grain with the exactly the same dimensions.
The fact that it is available at a different time is what is driving my decision.
I think I'll go with a second fact table - meaning i'll duplicate the 60 bytes worth of FKs for each record.
Thanks again for your reply.
- Disarankan sebagai Jawaban oleh Jerry NeeModerator 10 April 2012 6:01
- Ditandai sebagai Jawaban oleh clay123123123 10 April 2012 6:52
-
10 April 2012 21:12
Hey, Clay, does this mean you'll have a 1:1 relationship between these two fact tables? I would frown on something like that. Is your basic issue that you don't want to do updates when the new data arrives? The related records should be pretty easy to find. Are you using an ETL tool, like SSIS?
-- Ben
Ben
-
06 Agustus 2012 21:09
Try the new data warehouse schema: Spider Schema. It allows you to associate Dimensions to a Fact Table through an Intermediate Dimension which reduces the amount of data in your data warehouse and does not require you to store the Foriegn Keys in the Fact table.
More can be found here: http://spider-schema.info
- Diedit oleh Mark Hargraves 06 Agustus 2012 21:09