How to rearrange clusters: manually or programmatically?
- ...video.
Having watched it, I am trying to configure the DM models in order to use predictions (using the Microsoft Clustering algorithm).
While creating the model I see this warning (or "help") message:
"Input data will be randomly split into two sets, a training set and a testing set,
based on the percentage of data for testing and maximum number of cases in testing data set you provide.
The training set is used to create the mining model. The testing set is used to check model accuracy."
This is very nice!
Is there any way to switch off the randomness and split it manually?
I would also like to know whether it is possible to define how clusters are created, or to rearrange them, manually or programmatically.
P.S. (added later)
I cannot keep quiet about this.
My client saw the Excel 2007 Add-in "Exception highlighting" video.
Having watched and listened to it, he insists that the Microsoft Clustering algorithm arranges clusters according to probabilities,
i.e. that it creates clusters of exceptions (anomalies or outliers).
And he wants to have such clusters...
So, is it possible to satisfy such a wish,
by moving exceptions into separate cluster(s)?
Guennadi Vanine -- Gennady Vanin -- Геннадий Ванин
All replies
- Hi,
To create your own testing and training sets, you can use SSIS (or any other method you choose) to split the original dataset in two. The premise still holds, though, that both sets should be representative of the whole.
From this page: http://technet.microsoft.com/en-us/library/ms131977.aspx
Using the wizard you will by default get a 70/30 split. You could change that to 100/0.
Using DMX you have to specify WITH HOLDOUT (<option>) manually.
You can specify the algorithm parameter values programmatically (API and DMX).
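As an illustration of splitting the data yourself instead of relying on the random holdout, here is a minimal Python sketch of a repeatable split done outside Analysis Services. It is not SSIS or DMX code. The function names `assign_partition` and `split_cases` and the `CustomerKey` column are hypothetical, and hashing the case key is just one assumed way to get the same partition on every run.

```python
import hashlib


def assign_partition(case_key: str, test_percent: int = 30) -> str:
    """Return 'test' or 'train' for a case, based only on its key.

    The same key always lands in the same partition, so the split is
    repeatable across runs (unlike a random holdout).
    """
    digest = hashlib.sha256(case_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "test" if bucket < test_percent else "train"


def split_cases(rows, key_field, test_percent=30):
    """Split a list of dict rows into (train, test) lists by hashed key."""
    train, test = [], []
    for row in rows:
        partition = assign_partition(str(row[key_field]), test_percent)
        (test if partition == "test" else train).append(row)
    return train, test


if __name__ == "__main__":
    # Hypothetical source rows; in practice these would come from your table.
    rows = [{"CustomerKey": i} for i in range(1000)]
    train, test = split_cases(rows, "CustomerKey", test_percent=30)
    print(len(train), len(test))
```

The two resulting sets could then be loaded into separate tables, with the mining structure built on the training table and holdout set to 0. Whether a hashed split is representative of the whole still has to be checked against your own data.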
- Regarding the second part of the question and the P.S.: although you can control the behaviour of the Microsoft Clustering algorithm to some extent, there are limitations, which you are now probably close to. As far as I know, the Microsoft Clustering algorithm does not let you define exactly how clusters are created (for example, their exact centers) or how the final results are stored in the node structure. To create a clustering model completely to your wishes, I would recommend writing your own clustering plug-in algorithm. Even though creating your own algorithm (writing the code) may complicate things at first, you will then be completely in charge of all the issues you mentioned.
Best regards,
Vladimir Cupal
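To make the "exceptions in their own cluster" idea concrete, here is a rough Python sketch of a post-processing step that runs outside Analysis Services. It is neither the Microsoft Clustering algorithm nor the plug-in API. It assumes you have already exported each case's numeric attributes and its assigned cluster label, and it swaps in a simple rule (distance to the cluster centroid above mean + k standard deviations) in place of the probability-based scoring the client has in mind. The name `separate_exceptions`, the parameter `k`, and the label "Exceptions" are all hypothetical.

```python
import math
from statistics import mean, pstdev


def centroid(points):
    """Coordinate-wise mean of a list of equal-length numeric tuples."""
    dims = len(points[0])
    return tuple(mean(p[d] for p in points) for d in range(dims))


def separate_exceptions(points, labels, k=2.0, exception_label="Exceptions"):
    """Relabel cases that lie unusually far from their cluster centroid.

    A case is moved to `exception_label` when its distance to the centroid
    of its own cluster exceeds mean + k * stdev of the distances in that
    cluster. This is an assumed rule, chosen only for illustration.
    """
    # Group the cases by their current cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    centers = {label: centroid(pts) for label, pts in clusters.items()}

    # Distance of every case to the centroid of its own cluster.
    dists = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]

    # One distance threshold per cluster.
    thresholds = {}
    for label in clusters:
        own = [d for d, lab in zip(dists, labels) if lab == label]
        thresholds[label] = mean(own) + k * pstdev(own)

    return [
        exception_label if d > thresholds[lab] else lab
        for d, lab in zip(dists, labels)
    ]
```

Called with the exported points and labels, the function returns a new label list in which the far-out cases carry the extra label. The threshold `k` has to be tuned per dataset, and a plug-in algorithm would still be needed to have such a cluster appear inside the mining model itself.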

