column names of DM model (BOL2008)
I am following BOL2008
(SQL Server 2008 Books Online (April 2009))
Tutorials...
Data Mining Tutorials and How-to Topics > Tutorials: Using DMX > Bike Buyer DMX Tutorial
1)
Why do we have to specify the columns explicitly when creating the [Decision Tree] model
1a)
ALTER MINING STRUCTURE [Bike Buyer]
ADD MINING MODEL [Decision Tree]
(
CustomerKey,
[Age],
[Bike Buyer] PREDICT,
[Commute Distance],
[Education],
[Gender],
[House Owner Flag],
[Marital Status],
[Number Cars Owned],
[Number Children At Home],
[Occupation],
[Region],
[Total Children],
[Yearly Income]
) USING Microsoft_Decision_Trees
WITH DRILLTHROUGH
but NOT for the [Clustering] model
1b)
ALTER MINING STRUCTURE [Bike Buyer]
ADD MINING MODEL [Clustering]
USING Microsoft_Clustering
?
1c)
Is it possible to omit the column names in 1a) too?
In any case, it seems impossible to use column names that differ from those specified when the mining structure was created.
2)
How can one change the names so that they differ from those created in the mining structure?
By script?
Guennadi Vanine -- Gennady Vanin -- Геннадий Ванин
- Edited by Guennadiy Vanine, July 1, 2009 11:02
Answers
- And this page helps: http://technet.microsoft.com/en-us/library/ms132066.aspx
"If the model does not require a predictable column, for example, models that are built by using the Microsoft Clustering and Microsoft Sequence Clustering algorithms, you do not have to include a column definition in the statement. All the attributes in the resulting model will be treated as inputs."
- Marked as answer by Guennadiy Vanine, July 3, 2009 7:51
All replies
- Hi
1b) So I guess you didn't try to create a model from a structure without specifying the column names. Here is what happens when I did:
ALTER MINING STRUCTURE [Targeted Mailing] ADD MINING MODEL ZZY USING Microsoft_Decision_Trees
Executing the query ...
Error (Data mining): Error validating attributes for the 'ZZY' mining model.
Error (Data mining): The algorithm requires at least one predictable attribute. None found in mining model, ZZY.
Execution complete
2) Not sure this is possible in the UI (I couldn't see it anyway), but yes, your model column names can be different:
ALTER MINING STRUCTURE [Targeted Mailing] ADD MINING MODEL ZZY2 ([Customer Key], [Gender] as [____], Region as [Where I live]) USING Microsoft_Clustering
Gender has a weird alias there because I think the forum UI may have taken exception to my naming. You can guess what it was.
I guess I underspecified the question (though it seems clear what I implicitly asked).
"So I guess you didn't try to create a model from a structure without specifying the column names. Here is what happens when I did
ALTER MINING STRUCTURE [Targeted Mailing] ADD MINING MODEL ZZY USING Microsoft_Decision_Trees"
When creating a DM model (or a structure with a model) in a BIDS AS project, the wizard does not let you proceed without checking at least one box in the Predictable column (this step and window are headed "Data Mining Wizard / Specify the Training Data").
Meanwhile, a DMX script permits creating a model without predictable attributes in the case of the Microsoft Clustering algorithm (but not in the case of the DT algorithm).
I also double-checked this. I scripted both the model created through the AS project (Basic DM Tutorial) and the one created through the DMX script.
The former script contains "predict" (<Usage>PredictOnly</Usage>), and the latter model-creation script does not contain any "predict" strings.
Doing everything through the AS project in BIDS makes no distinction between creating a Decision Trees model and creating an MS Clustering model,
while creating the model through a DMX script does make a distinction.
Here are a few questions... that make a lot of combinations...
Who is wrong: the DMX script compiler, which permits omitting predictable attributes altogether, or the AS project wizard?
Am I confusing something?
Guennadi Vanine -- Gennady Vanin -- Геннадий Ванин
- Personally speaking, I kind of like the way the wizard does it. A wizard makes it a lot easier for people to see what is going on, and that makes it easier to understand, which in the case of DM is no bad thing (let's get rid of this mentality that DM is for the few).
The fact that raw DMX allows you to shortcut this process is, I think, no bad thing either. If you are happier with the DMX "shortcut", then that works for me as well.
In short, I do not think either option is wrong; it is just that if the wizard allowed the "shortcut", it might be off-putting for some.
HTH
allan
"Personally speaking"
Speaking on behalf of my client: they insist on pure scripting solutions...
But my question is not about that. I am interested in what is going on under the hood.
Do the differences noted point to some kind of error in the scripting part of the tutorial?
Or do they point to some profound difference in how a model based on the MS Clustering algorithm is configured,
i.e. the possibility of making predictions on any attribute without marking any of them as Predictable?
Guennadi Vanine -- Gennady Vanin -- Геннадий Ванин
- Clustering does not require you to make anything predictable. You can, but it is not required. DTs require that you have something that is predictable, and that is the difference.
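To illustrate the "you can, but it is not required" point, here is a minimal sketch of the 1b) clustering model declared with an explicit column list and one predictable column. The model name [Clustering With Predict] is made up for this example, the column subset is arbitrary, and the statement has not been run against the tutorial database.

```sql
-- Hypothetical variant of the 1b) statement (DMX): same structure and algorithm,
-- but with an explicit column list and one column flagged as predictable.
-- The model name [Clustering With Predict] is an assumption for illustration.
ALTER MINING STRUCTURE [Bike Buyer]
ADD MINING MODEL [Clustering With Predict]
(
    CustomerKey,
    [Age],
    [Bike Buyer] PREDICT,   -- optional here; Microsoft_Decision_Trees needs at least one predictable column
    [Commute Distance],
    [Yearly Income]
) USING Microsoft_Clustering
```

Dropping the whole parenthesised list gives back the 1b) form, in which every structure column is treated as an input.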

