
General discussion: how to rearrange clusters? Manually or programmatically?

  • Thursday, 2 July 2009, 8:31, Guennadiy Vanine
     
    ideo.
    And having watched it, I am trying to configure the DM models in order to use predictions (using the MS Clustering algorithm).

    During creation of the model I see this warning or "help" message:
    "Input data will be randomly split into two sets, a training set and a testing set,
    based on the percentage of data for testing and maximum number of cases in testing data set you provide.
    The training set is used to create the mining model. The testing set is used to check model accuracy."


    This is very nice!
    Is there any way to switch off the randomness and split it manually?

    I would also like to know whether it is possible to define cluster creation manually or programmatically,
    or to rearrange clusters.

    PS
    Added later.
    I cannot stay silent on this.
    My client saw the Excel 2007 Add-in "Exception highlighting" video.
    Having watched and listened to it, he insists that the Microsoft Clustering algorithm arranges clusters according to probabilities,
    i.e. that it creates clusters containing the exceptions (anomalies or outliers).
    And he wants to have such clusters...

    So, is it possible to satisfy such a wish,
    i.e. moving exceptions to separate cluster(s)?

    Guennadi Vanine -- Gennady Vanin -- Геннадий Ванин

All replies

  • Thursday, 2 July 2009, 9:14, Allan Mitchell (MVP)
     
    Hi

    To create your own testing and training sets, you can use SSIS (or any other method you choose) to split the original dataset in two. The premise still holds, though, that the two sets of data should be representative of the whole.

     from this page http://technet.microsoft.com/en-us/library/ms131977.aspx

    Using the wizard you will by default get a 70/30 split. You could change that to 100/0.

    Using DMX you have to manually specify WITH HOLDOUT (<option>).
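
    A manual, non-random split of the kind described above could be sketched as follows. This is only an illustration in Python, not SSIS or DMX: the function names, the "case-N" key format and the 30% test share are assumptions made for the example. Each case is assigned by hashing its key, so the same case always lands in the same set, however often the split is repeated.

```python
import hashlib


def assign_partition(case_key: str, test_percent: int = 30) -> str:
    """Return 'test' or 'train' for a case, based only on its key."""
    # SHA-256 of the key gives a stable number; no random generator is involved.
    digest = hashlib.sha256(case_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "test" if bucket < test_percent else "train"


def split_cases(case_keys, test_percent: int = 30):
    """Split an iterable of case keys into (train, test) lists."""
    train, test = [], []
    for key in case_keys:
        if assign_partition(key, test_percent) == "test":
            test.append(key)
        else:
            train.append(key)
    return train, test


if __name__ == "__main__":
    # Hypothetical case keys, used only for this demonstration.
    keys = [f"case-{i}" for i in range(1000)]
    train, test = split_cases(keys, test_percent=30)
    print(len(train), "training cases,", len(test), "testing cases")
```

    Setting test_percent to 0 corresponds to the 100/0 split mentioned above. The test share is only approximately the requested percentage, because it depends on how the hashed keys fall into buckets.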


    You can programmatically (API and DMX) specify the algorithm parameter values.


  • Sunday, 12 July 2009, 15:46, Vladimir Cupal
     
    Regarding the second part of the question and the PS: although you can control the behaviour of the Microsoft Clustering algorithm to some extent, there are limitations, which you are now probably close to. As far as I know, the Microsoft Clustering algorithm does not let you define exactly how clusters are created (for example, their exact centers) or how the final results will be stored in the node structure. To create a clustering model exactly to your wishes, I would recommend writing your own clustering plug-in algorithm. Even though creating your own algorithm (writing the code) may complicate things at first, you will then be completely in charge of all the issues you mentioned.
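
    To illustrate what "moving exceptions to a separate cluster" could mean in custom code, here is a minimal Python sketch. It is not the Microsoft Clustering algorithm and not a plug-in for Analysis Services: the centroids are assumed to be already known, and the distance threshold max_distance and the label EXCEPTIONS are hypothetical choices made for the example. Each case goes to its nearest centroid unless it is farther than the threshold from every centroid, in which case it is placed in the exceptions group.

```python
import math

# Hypothetical label for the extra "exceptions" cluster.
EXCEPTIONS = -1


def nearest_cluster(point, centroids):
    """Return (index, distance) of the centroid closest to the point."""
    distances = [math.dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=distances.__getitem__)
    return idx, distances[idx]


def assign_with_exceptions(points, centroids, max_distance):
    """Label each point with its nearest centroid index, or EXCEPTIONS if too far."""
    labels = []
    for p in points:
        idx, dist = nearest_cluster(p, centroids)
        labels.append(idx if dist <= max_distance else EXCEPTIONS)
    return labels


if __name__ == "__main__":
    # Assumed centroids and cases, for demonstration only.
    centroids = [(0.0, 0.0), (10.0, 10.0)]
    points = [(0.0, 1.0), (9.0, 10.0), (5.0, 5.0), (100.0, 100.0)]
    print(assign_with_exceptions(points, centroids, max_distance=2.0))
```

    With these sample values the first two cases stay in clusters 0 and 1, and the last two are labelled as exceptions because they are more than 2.0 away from both centroids.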

    Best regards
    Vladimir Cupal