Friday, July 31, 2009 3:01 PMHello,
Our datawarehouse & cube have been designed without the notion of surrogate keys. I'm in the process of implementing SK. From what I've read, you define SK in dimension and use this key in the Fact table. I've successfully replace two business keys so far. But now, I'm facing soft of an existential crisis...
What about dimension which use business keys from another dimension? What is the recommended approach? Replace the related business key by the surrogate key or keep the business in the second dimension?
For example, we have a company dimension. Simple table.
Than we have an item table, items are different from one company to the other.
My item table would move from this structure
to this structure
However, in the cube, the key attribute for the item dimension is referring to the company code. Logically, I should replace this key by the SK_Item (instead of companycode + itemCode)
Unfortunetaly, there are calculations in the cube as well which referes directly to specific item thru MDX, like: [Item dimension][Item key attribute].&[apn].&.
BANG! it breaks during the cube processing! Furthermore, there are some Excel speadsheet running around referring to this attribute.
Any advice is welcomed,
Friday, July 31, 2009 3:55 PMThere's nothing wrong with having both the surrogate key as well as the business key. However, one issue you will have to deal with is which surrogate key to use - because the key can change over time, as the "referred" dimension has version changes.
For now, it appears as though you can not remove the business key, as your users are currently using it. There's no significant downside to retaining it.
The primary reasons to remove business keys from fact tables don't tend to apply to dimension tables:
1. Space reduction. Because dimension tables are "wide and short" (lots of columns, few rows), they take relatively less space, and keeping a "fat" business key reference in them doesn't add much. Fact tables are "tall" (lots of rows), so any fat you can trim from the width pays off huge.
2. Join performance. Joining on integers is fast, because they're small. It's not often that dimension tables are joined together, and even if they are, they're "short" tables to join. Joins to the fact table need performance because they're HUGE.
- Marked As Answer by arkadia Friday, July 31, 2009 5:20 PM
Friday, July 31, 2009 5:20 PMThanks Todd,
I'll revise my project then and be less aggressive on implement SK in dimensions; I'll stick to the fact!