I have a fact table that has customer subscription length as a measure. I need a dimension that groups the subscription lengths into buckets such as 0-12 months, 12-24 months, 24-48 months, etc. Can something like this be accomplished using the "Clusters" discretization method, which uses the data mining algorithm to create groups of similar members? I know how to create the groups manually in the relational db if I need to, but I want to see where the clusters are naturally occurring using the data mining algorithm.
- Edited by jschroeder Friday, July 27, 2012 2:57 PM
Discretization is the process of putting values into buckets so that there are a limited number of possible states. The buckets themselves are treated as ordered and discrete values. You can discretize both numeric and string columns.
There are several methods that you can use to discretize data. If your data mining solution uses relational data, you can control the number of buckets to use for grouping data by setting the value of the DiscretizationBucketCount property. The default number of buckets is 5.
If your data mining solution uses data from an Online Analytical Processing (OLAP) cube, the data mining algorithm automatically computes the number of buckets to generate by using the following equation, where n is the number of distinct values of data in the column: Number of Buckets = sqrt(n)
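As a rough illustration of that equation, here is a minimal Python sketch. The function name `olap_bucket_count` is made up for this example and is not part of Analysis Services. How the product rounds the result is not stated here, so the sketch returns the raw square root.

```python
import math


def olap_bucket_count(values):
    """Return sqrt(n), where n is the number of distinct values in the column.

    Hypothetical helper for illustration only; rounding is left to the caller
    because the rounding rule is not specified in the text above.
    """
    n = len(set(values))  # n = number of distinct values
    return math.sqrt(n)


# Example: subscription lengths in months with 4 distinct values.
print(olap_bucket_count([6, 12, 12, 24, 48]))
```

For a column with 100 distinct subscription lengths, the same equation gives 10 buckets.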
If you do not want Analysis Services to calculate the number of buckets, you can use the DiscretizationBucketCount property to manually specify the number of buckets.
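If you just want a quick look at where subscription lengths group naturally for a given bucket count, something like the following sketch may help. It is a plain one-dimensional k-means written for illustration, not the algorithm Analysis Services runs internally. The function name `kmeans_1d`, the sample data, and the choice of evenly spread starting centers are all assumptions of this example.

```python
def kmeans_1d(values, k, iterations=50):
    """Group numeric values into k clusters and return (min, max) per cluster.

    Illustrative only: a simple 1-D k-means, not the Analysis Services
    "Clusters" discretization method.
    """
    data = sorted(values)
    # Deterministic start: pick centers spread evenly across the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in data:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [
            sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return [(min(g), max(g)) for g in groups if g]


# Made-up subscription lengths in months.
months = [1, 2, 3, 5, 6, 13, 14, 15, 18, 30, 36, 40, 44]
print(kmeans_1d(months, 3))  # [(1, 6), (13, 18), (30, 44)]
```

The returned ranges can then be compared with the manually defined buckets (0-12, 12-24, 24-48) to see whether the natural groupings line up with them.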
For more details about it and how to use it, please see: