Modeling: "Time dependant dimension hierarchies"
- Hello all,
Let's say I have an Orders fact table like:
Time ProductId Value
20090101 Prd1 10
20090101 Prd2 5
20090201 Prd1 3
20090301 Prd1 6
20090401 Prd2 4
The ProductCategory table looks like:
PrdId Category StartDate EndDate
Prd1 A 20090101 20090228
Prd1 B 20090301 99991231
Prd2 B 20090101 99991231
What I want is a cube showing the sum of Value that could be drilldown by Category->PrdId and Year->Month hierarchies
For example at "Year, Category" and "Month, Category" the data from the cube should be:
Year Category Value
2009 A 13 (10+3)
2009 B 15 (5+6+4)
Month Category Value
January A 10
January B 5
Febrary A 3
Febrary B 0
March A 0
March B 10 (6+4)
How can I do that with SSAS 2008? I don't know how to "connect" the time dimension with the product dimension. How can I tell to SSAS that the drill up from ProductIds to Categories is time-dependant and it depends on the StartDate and EndDate columns of the ProductCategory table?
I hope the example is clear enough...
Thanks in advance,
Eduard.
Answers
- You will want a surrogate key on the productCategory table and you will want that key to be an extra column on your fact table. I prefer to do this at the physical level and have my ETL process populate the surrogate keys on the fact tables, but you can do this at the DSV level if you want.Replace the ProductCategory table with a named query that looks something like:select cast(startdate as varchar(20)) + cast(enddate as varchar(20)) + prdid + category as ProductKey, Category from ProductCategoryReplace your fact table with a named query that joins to the productcategory table to create the same key:select o.time,o.value, cast(startdate as varchar(20)) + cast(enddate as varchar(20)) + prdid + category as ProductKeyfrom orders o inner join ProductCategory pc on o.productid = pc.prdid and o.time between pc.startdate and pc.enddateObviously replace with the correct syntax...
- Eduard,
SSAS really likes data models that follow the Kimball methodology. In which case, you'll want to use a surrogate key to identify your product rows uniquely. It would look like this:
Productkey PrdId Category StartDate EndDate
1 Prd1 A 20090101 20090228
2 Prd1 B 20090301 99991231
3 Prd2 B 20090101 99991231
Your fact table will look something like this:
Time Productkey Value
20090101 1 10
20090101 2 5
20090201 3 3
20090301 1 6
20090401 2 4
In this way, the multiple prd1 rows will link up to the correct fact row for the appropriate product category.
Once this is done, you should model these relationships in the DSV in your project. When you develop the cube, you will bring in PrdID and Category as attributes in the product dimension. Additionally, you can create User hierarchies to define the drilldown path that you want Category -->Prdid, etc. You do not need to, but it can be helpful to your users to make it simpler to choose attributes in a sensible drilldown.
The above is assuming you have a time dimension table to link up to as well.
hope that helps,
mark- Marked As Answer byEduard R Tuesday, November 03, 2009 3:06 PM
All Replies
- You will want a surrogate key on the productCategory table and you will want that key to be an extra column on your fact table. I prefer to do this at the physical level and have my ETL process populate the surrogate keys on the fact tables, but you can do this at the DSV level if you want.Replace the ProductCategory table with a named query that looks something like:select cast(startdate as varchar(20)) + cast(enddate as varchar(20)) + prdid + category as ProductKey, Category from ProductCategoryReplace your fact table with a named query that joins to the productcategory table to create the same key:select o.time,o.value, cast(startdate as varchar(20)) + cast(enddate as varchar(20)) + prdid + category as ProductKeyfrom orders o inner join ProductCategory pc on o.productid = pc.prdid and o.time between pc.startdate and pc.enddateObviously replace with the correct syntax...
- Eduard,
SSAS really likes data models that follow the Kimball methodology. In which case, you'll want to use a surrogate key to identify your product rows uniquely. It would look like this:
Productkey PrdId Category StartDate EndDate
1 Prd1 A 20090101 20090228
2 Prd1 B 20090301 99991231
3 Prd2 B 20090101 99991231
Your fact table will look something like this:
Time Productkey Value
20090101 1 10
20090101 2 5
20090201 3 3
20090301 1 6
20090401 2 4
In this way, the multiple prd1 rows will link up to the correct fact row for the appropriate product category.
Once this is done, you should model these relationships in the DSV in your project. When you develop the cube, you will bring in PrdID and Category as attributes in the product dimension. Additionally, you can create User hierarchies to define the drilldown path that you want Category -->Prdid, etc. You do not need to, but it can be helpful to your users to make it simpler to choose attributes in a sensible drilldown.
The above is assuming you have a time dimension table to link up to as well.
hope that helps,
mark- Marked As Answer byEduard R Tuesday, November 03, 2009 3:06 PM
Thanks for your responses!
In case we added more levels in the product hierarchy (i mean something like product->category->supercategory->supersupercategory->..., all with time dependencies) what should be your aproach?
Create a big dimension table with one subrogated key for each diferent "expanded" relationship that could exist in a given time?In case the product hierarchy was modeled as a time dependant parent-child realtionship, the approach would be to create first a flat table? With this approach the number of levels should be set in advance... Is there any other aproach for avoid number of levels constraints?
Thanks,
- Hm, it depends on exactly what you are talking about... would each level of the hierarchy be dependent on a different start and end date? If so, then you might be better off creating seperate dimensions for them. But if they are all based on the same start and end date, then just go with the original approach of populating a single surrogate key in the fact table and then expose product, category, supercategory, supersupercategory as attributes of the dimension and set them up in a hierarchy.I am not really sure what your second question is asking... could you expand?
Hm, it depends on exactly what you are talking about... would each level of the hierarchy be dependent on a different start and end date? If so, then you might be better off creating seperate dimensions for them. But if they are all based on the same start and end date, then just go with the original approach of populating a single surrogate key in the fact table and then expose product, category, supercategory, supersupercategory as attributes of the dimension and set them up in a hierarchy.
Ok. To expose product, category and supercategory as diferent dimensions it means the fact table should contain the surrogated key for each
dimension, isn't it? Is it posible using this approach to provide to the user the hierarchy product->category->supercategory when browsing the data? I'm thinking the user is using excel to browse an SSAS cube...I am not really sure what your second question is asking... could you expand?I would try to set an example. Let be a product hierarchy table like
PrdKey ParentPrdKey StartDate EndDate
Prd1 PrdA 20090101 20090131
Prd2 PrdA 20090101 99991231
Prd1 PrdB 20090201 99991231
PrdA PrdI 20090101 99991231
PrdI Prdalfa 20090101 99991231
Some metaexplanation here: Within this example I use 4 levels: the lowest one is Prd1,2,3,4 etc... The second one are PrdA,B,C,D.... The third one is PrdI,II,III,IV, and the last one is Prdalfa, beta, gamma,...
As before we have a Sales fact table at Prd1,2,3,4,... level.
The product hierarchy is Prdalfa->PrdI->PrdA->Prd1.
Instead of having a fix number of levels (4 in this example) I would like to have as many levels as the table contains, without a priori constraints...Thanks
Eduard,
I would recommend adding a column to your product dimension for every attribute that you wish to analyze by. To have 10 columns in a product table (Category, Brand, Product line, Super-category, sub-category, etc.) is not uncommon. The advantage to this design is that all of the product related fields are held together. Alternatively, you could have a mini-dimension for each attribute, but this could be confusing to navigate in the front end tools and there is really only a benefit if the mini-dimension is shared by other fact tables.
Mark- The only problem I see at first glance, is that if we add too many attributes in the dimension table instead of a slow changing dimension we would have a "fast" changing dimension... I guess it is not a good idea to have one natural key with several thousand diferents surrogates keys.
For instance imagine I have 10000 products. Products belong to 100 subcategories (100 products each). Subcategories belong to 20 categories (5 each category) and each category belong to 3 super-categories (10 on A,6 on B,4 on C).
Let's say one category change from belonging to super-category B to belonging to super-category C.
This small change in one single record in the original OLTP system would result in creating about 500 new surrogated keys (1category=5 subcategories=500 products).
It could result in a big and "always growing" dimension table. - Eduard,
You are correct that this could create large dimensional growth. I think it is more of common practice to use Type 1 changes in a product table and type 2 in a customer table. This would depend on your specific business of course. In my business, we sell the same items we have been selling for the past 20 years with additions and deletions each year. Product categories do change, but it is always a reclassification rather than the attributes of the product changing causing a change in its category.
In the example that you give above where 500 products change at once, this sounds more like the category or classification is being refined rather than the products actually changing.
You will know best if it is worth tracking changes on the product table. On the customer side, everyone wants to know who the sales rep was or if the customer changed location or store size or something like that.
Does that make sense?
Mark


