I have some confusion regarding the design of case and nested table in Mining Model. The tutorials and the book " Data Minign with SQl Server 2005" by Jamie demonstrates that the case and Nested table design can be implemented in case of One to Many relationship from realtional database however there is no mention of many to many realtionship. I was wondring whether the case and nested table can be dsigned from the many to many relation.
For instance in my case the realtional table design resembles typical star schema where
there are threedimension: i) Product:[ productid, productname]
ii) Brand: [ Brand id, brandname, other brand attributes]
iii) Outlet: [ Outlet id, Outlet name like department , pharmacy etc]
and fact table Transaction: that maps all three many to many realtionship with quantity sale. the schema is [transactionid, productid, brandid, outletid, quantity sale]
now when I try to implement clustering that groups the similar transaction according to the outlet, product and brand. I could not design the case and realtional table as all the dimension have many to many realtionship with each other.
Note: I have designed the mining model using the OLAP cube derived from the given tables but couldn't do it directly from Relational table.
Thanks in advance
I like to try to answer questions that no one has attempted to answer. You are asking about a many-to-many relational model, and creating data mining models.
The good news is that whatever you have modeled in SQL Server using the Unified Dimensional Model can be put into data mining. You mention that you want to do clustering which is a unsupervised model approach which can give you rather random results depending on the order of the original data.
The examples you mention from the tutorials and from the 2005 book have to do with market basket analysis (FYI: the 2008 book is available, and it's good too).
All of the algorithms can provide insight into your transactions, even the time series analysis can provide information that you might not have now. However, you might have to prepare your data using SSIS or some other transformation before you go forward. In many cases, denormalizing the table structure provides the best way to apply most of the data mining algorithms. Normalization is important for transactional systems for performance and maintainability, but data analysis is many times better served using denormalization.
The place to start is with your business questions. What questions are you trying to answer using data mining? I am not convinced, for example, that you are best served with an unsupervised model (like clustering).
Mark Tabladillo MCT, Microsoft (SQL Server Business Intelligence, Data Mining) and SAS Consultant Atlanta, GA http://www.marktab.com
Thank you so much with reply. Actually I am going through the "Marco Russo: Advanced Dimensional Modeling with SQL Server " for different many to many cases. Also regarding my problem yes, time series would have been the best appraoch to forcast future sales in transaction. But I am trying to test data mining model in static cube rather than in SIS, and only two algorithm available to me is clustering and Decsion Tree. That's why I was focusing in clustering more. And more over in static cube (.cub) file the mining model couldn't be generated through OLAP cube by executing XMLA. It needed to be genratred from relational model itself. That was the problem.
Anyway thanks once again
I just joined this site and though my answer is very late, here are my thoughts. The schema you mention has 3 dimensions in a 1 : M relationship to the fact. I do not believe there is a M : M relationship across the dimensions. It may be that there is a Cartesian product across the 3 dimensions but since a Cartesian product is Any-to-Any (not Many-to-Many), there is no need to create an associative table connecting the three. Actually, to do so is more work than just joining them dynamically. It is only necessary to store a Cartesian product (1) if there is an attribute that describes the combination; or (2) in some special cases, to improve performance by pre-joining the members of the Cartesian product. Data can easily be pulled from the Sales Fact by Product, by brand and by outlet, and fed to data mining as an extract.
However, let us say as an example that a product can be in many brands (not all, some but many) and vice versa. In RDBMS, this can be supported by a bridge table, as follows: Product - Product Brand - Brand, with the keys as you have them above. Product Brand is the bridge with a key of Product ID, Brand ID. For the same thing in an OLAP cube structure, (I think) you have to use 2 cubes: one for the real measure (quantity sale – which, by the way is better named Sale Quantity; Quantity is a class word) with all 3 dimensions. One for the relationship between product and brand. This last one is a factless measure. Put a “1” in the attribute column to indicate where a product belongs to a given brand. You can then create a virtual cube between these 2 cubes. However, Product must be identical in both cubes (as least the key must be identical). But from what I read in your question, I do not think you need a bridge.