
Answered: column names of DM model (BOL2008)

  • Wednesday, July 1, 2009 10:59, Guennadiy Vanine
     

    I am following the tutorials in BOL2008 (SQL Server 2008 Books Online, April 2009):

    Data Mining Tutorials and How-to Topics > Tutorials: Using DMX > Bike Buyer DMX Tutorial 

    1)
    Why do we have to explicitly specify the columns when creating the [Decision Tree] model
    1a)
    ALTER MINING STRUCTURE [Bike Buyer]
    ADD MINING MODEL [Decision Tree]
    (
       CustomerKey,
       [Age],
       [Bike Buyer] PREDICT,
       [Commute Distance],
       [Education],
       [Gender],
       [House Owner Flag],
       [Marital Status],
       [Number Cars Owned],
       [Number Children At Home],
       [Occupation],
       [Region],
       [Total Children],
       [Yearly Income]
    ) USING Microsoft_Decision_Trees
    WITH DRILLTHROUGH

    but NOT when creating the [Clustering] model
    1b)
    ALTER MINING STRUCTURE [Bike Buyer]
    ADD MINING MODEL [Clustering]
    USING Microsoft_Clustering
    ?

    1c)
    Is it possible to omit the column names in 1a) as well?

    In any case, it seems impossible to use column names that differ from those specified when the mining structure was created.

    2)
    How can one give the model columns names that differ from those defined in the mining structure? By script?


    Guennadi Vanine -- Gennady Vanin -- Геннадий Ванин

Answers

  • Thursday, July 2, 2009 9:12, Allan Mitchell (MVP)
     Answered
    And this page helps:

    http://technet.microsoft.com/en-us/library/ms132066.aspx


    "If the model does not require a predictable column, for example, models that are built by using the Microsoft Clustering and Microsoft Sequence Clustering algorithms, you do not have to include a column definition in the statement. All the attributes in the resulting model will be treated as inputs."


All replies

  • Wednesday, July 1, 2009 21:18, Allan Mitchell (MVP)

    Hi,

    1b)
    So I guess you didn't try to create a model from a structure without specifying the column names. Here is what happened when I did:
    ALTER MINING STRUCTURE [Targeted Mailing] ADD MINING MODEL ZZY USING Microsoft_Decision_Trees
    
    Executing the query ...
    Error (Data mining): Error validating attributes for the 'ZZY' mining model.
    Error (Data mining): The algorithm requires at least one predictable attribute. None found in mining model, ZZY.
    
    Execution complete
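
    For comparison, here is a minimal sketch of a variant that should pass this validation, because it marks one column as predictable. It assumes the [Targeted Mailing] structure contains a [Bike Buyer] column, as in the BOL tutorial; the model name ZZY3 is hypothetical.

```dmx
-- Sketch only: assumes [Targeted Mailing] has the columns listed here.
-- ZZY3 is a hypothetical model name.
ALTER MINING STRUCTURE [Targeted Mailing]
ADD MINING MODEL [ZZY3]
(
    [Customer Key],        -- key column of the structure
    [Gender],              -- input
    [Region],              -- input
    [Bike Buyer] PREDICT   -- gives the algorithm its predictable attribute
)
USING Microsoft_Decision_Trees
```

    The only material difference from the failing statement is the explicit column list with a PREDICT flag, which is what the error message asks for.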
    
    2)

    I am not sure this is possible in the UI (I couldn't see it, anyway), but yes, your model column names can be different:

    ALTER MINING STRUCTURE [Targeted Mailing] ADD MINING MODEL ZZY2 ([Customer Key], [Gender] as [____], Region as [Where I live]) USING Microsoft_Clustering
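
    One way to check which column names the model ended up with is to query the mining columns schema rowset. This is a sketch under the assumption that the instance exposes $system.DMSCHEMA_MINING_COLUMNS with the columns named below (as SQL Server 2008 Analysis Services should); verify the column names against your server before relying on them.

```dmx
-- Sketch only: lists the columns of model ZZY2 and how each one is used.
-- The aliased names (for example [Where I live]) should appear in
-- COLUMN_NAME rather than the structure column names.
SELECT COLUMN_NAME, IS_INPUT, IS_PREDICTABLE
FROM $system.DMSCHEMA_MINING_COLUMNS
WHERE MODEL_NAME = 'ZZY2'
```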
    


  • Wednesday, July 1, 2009 21:19, Allan Mitchell (MVP)

    Gender has a weird alias there because I think the forum UI may have taken exception to my naming. You can guess what it was.
  • Thursday, July 2, 2009 8:12, Guennadiy Vanine

    Quoting Allan Mitchell: "So I guess you didn't try to create a model from a structure without specifying the column names. Here is what happens when I did
    ALTER MINING STRUCTURE [Targeted Mailing] ADD MINING MODEL ZZY USING Microsoft_Decision_Trees"

    I guess I underspecified the question (though I thought it was clear what I was implicitly asking).

    When creating a DM model (or a structure with a model) in a BIDS AS project, the wizard does not let you proceed without ticking at least one checkbox in the Predictable column (this step and window are headed "Data Mining Wizard / Specify the Training Data").

    Meanwhile, a DMX script permits creating a model without predictable attributes for the Microsoft Clustering algorithm (but not for the Decision Trees algorithm).

    I also double-checked this. I scripted both the model created through the AS project (Basic DM Tutorial) and the one created through the DMX script.
    The former script contains "predict" (<Usage>Predictonly</Usage>), while the latter model creation script contains no "predict" strings at all.

    Doing everything through an AS project in BIDS makes no distinction between Decision Trees and Microsoft Clustering model creation,
    while creating the model through a DMX script does make a distinction.

    This raises a few questions, which make for a lot of combinations...
    Which one is wrong: the DMX script compiler, which permits omitting predictable attributes altogether, or the AS project wizard?
    Am I confusing something?



    Guennadi Vanine -- Gennady Vanin -- Геннадий Ванин
  • Thursday, July 2, 2009 8:22, Allan Mitchell (MVP)

    Personally speaking, I kind of like the way the wizard does it. A wizard makes it a lot easier for people to see what is going on, and that makes it easier to understand, which in the case of DM is no bad thing (let's get rid of this mentality that DM is for the few).

    The fact that raw DMX allows you to shortcut this process is, I think, no bad thing either. If you are happier with the DMX "shortcut", then that works for me as well.

    In short, I do not think either option is wrong. It is just that if the wizard allowed the "shortcut", it might be off-putting for some.


    HTH

    allan

      


  • Thursday, July 2, 2009 9:03, Guennadiy Vanine

    Quoting Allan Mitchell: "Personally speaking"

    Speaking on behalf of my client: the client insists on pure scripting solutions...

    But that is not what my question is about. I am interested in what is going on under the hood.
    Do the differences I noted point to some kind of error in the scripting part of the tutorial?
    Or do they point to some fundamental difference in how models based on the Microsoft Clustering algorithm are configured,
    i.e. the possibility of making predictions on any attribute without marking any of them as Predictable?

    Guennadi Vanine -- Gennady Vanin -- Геннадий Ванин
  • Thursday, July 2, 2009 9:07, Allan Mitchell (MVP)

    Clustering does not require you to make anything predictable. You can, but it is not required. Decision Trees require that you have something predictable, and that is the difference.
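
    To illustrate the "you can, but it is not required" point, here is a minimal sketch. It assumes the [Bike Buyer] structure and the [Clustering] model from the original question; the second model name is hypothetical, and the input values in the query are made up for illustration.

```dmx
-- Sketch only. Optional: a clustering model that does mark a column
-- as predictable. [Clustering With Predict] is a hypothetical name.
ALTER MINING STRUCTURE [Bike Buyer]
ADD MINING MODEL [Clustering With Predict]
(
    CustomerKey,
    [Age],
    [Gender],
    [Bike Buyer] PREDICT
)
USING Microsoft_Clustering
```

    Once it has been processed, a clustering model with no predictable column can still be queried for cluster membership with the Cluster() and ClusterProbability() functions:

```dmx
-- Sketch only: singleton query against the [Clustering] model from 1b).
-- The Age and Gender values are invented for illustration.
SELECT
    Cluster() AS [Cluster Name],
    ClusterProbability() AS [Probability]
FROM [Clustering]
NATURAL PREDICTION JOIN
(SELECT 35 AS [Age], 'F' AS [Gender]) AS t
```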


