Membru care solicită informaţii
I'm going through the sample excel provided with Data Mining addin for Excel.
I created a new data structure from "Source Data". I added new data model, Decision trees (does not really matter which, neither works) to predict BikeBuyer. I click Classification Matrix against the internal test data, and the results show that it does not predict "Yes" for *ANY* row.
What am I doing wrong?
- Editat de algkep 29 martie 2011 15:37 *clarified*
A notice: if I create a "neural network" model on "training data" sheet, and run classification it shows that it predicted "Yes" for *ZERO* rows.
Now, if I delete all but first 1000 rows of data on that sheet, and repeat the procedure, it starts predicting... why is this happening?
You aren't doing anything wrong - if you look at the decision tree model, for example, you will see that none of the leaf nodes will have a probability of yes greater than 50%. The algorithms predict only the most probable state, and it is turning out that "Yes" is never greater than 50% for any row. A more correct approach would be to use a profit chart to find a probability threshold for considering "Yes" and then creating a query that looks like this (assuming a 30% threshold)
SELECT (PredictProbability([BikeBuyer],'Yes') > 0.3) AS IsBikeBuyer FROM MyModel PREDICTION JOIN ....
This type of query will return TRUE/FALSE based on the probability threshold that makes sense for your data.
Predixion Software, Inc.
Follow on Twitter
- Propus ca răspuns de koles 9 mai 2012 07:58
ssas mining structure implement a well known algorithm developed and tested since the 80's by A I community. If the model do not predict anything , i think it's about data or algorithm parameter . for example : If the tree is huge with many nods it's possible that you have done an over-fitting so the model transcript the data so there is generalization capacity
I've played around, and no matter which columns I use for training the models, the highest probability I ever get is something like 25%. Furthermore, when doing classification matrix, it's not possible to specify threshold, so it always, for all the test entries predicts "no".
Is it because I'm doing bad models, or what else could it be..
If you want to predict on bike buyer, try changing the parameters. Else, try predict e.g. home owner. Watch this small video (no audio) for step by step instructions: http://youtu.be/36L0Cat5CEs
Dr. Nico Jacobs, SQL Server BI trainer @ U2U.net
- Editat de SQLWaldorf 2 mai 2012 07:43