I'm still confused about how the adjusted probability is calculated in detail. Is it calculated by considering all probabilities along the path from the root to the leaf node? Or is it only influenced by the distributions of the root and the leaf node?
Given a very basic tree example with a root (30 positive training cases / 70 negative training cases) and two leaf nodes, one with a distribution of 7 positive and 3 negative cases and the other with 23 positive and 67 negative cases, what would the adjusted probabilities of the leaf nodes look like?
I noticed in my work that the adjusted probability is higher where the distribution of a leaf node differs strongly from the distribution of the root node. Can this be interpreted as higher confidence?
Thanks in advance!
It depends on the algorithm. You will find a good explanation here http://social.msdn.microsoft.com/forums/en-US/sqldatamining/thread/02d60ba1-9b45-44d9-af0a-3957232ad553/
Thank you for your reply. However, I'm sorry to admit: I don't get it.
(I want to talk about Decision Trees)
From the article you suggested:
"In case of decision trees, adjusted probability is the predicted probability (coming from the tree node distribution) adjusted with the prior probability of the respective tree node. The adjustment factor is the same for all target states, therefore adjusted probability does not change the prediction result, but it reflects more accurately the confidence of the prediction as given by the whole model."
What does "adjusted" mean here? Multiplied?
Is the tree node the root node of the tree? (Compared to http://msdn.microsoft.com/en-us/library/cc645758.aspx, I refer to the "All" node.)
I read somewhere that the adjusted probabilities do not sum to 1; is this right?
Still, I appreciate any help!
With kind regards!
OK, it's not particularly clear. A simple way of thinking about it is that adjustedprobability penalises already popular items. For example, let's assume that you are predicting supermarket items and 5% of all customers buy bananas. Then even if there were no associated items in the basket, predict() would return bananas at 5%. Adjustedprobability would return a much lower value in that case. Whether you use adjustedprobability or not depends on your application. The "real" probability is the non-adjusted one, but adjustedprobability is more appropriate when you want to be sensitive exclusively to the basket.
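As a toy illustration of that penalty: the numbers and the shrinkage below are made up for intuition only (the exact adjustedprobability formula is not documented), but they reproduce the described behaviour of discounting the part of a prediction that popularity alone explains.

```python
# Toy numbers for the banana example (hypothetical, for intuition only;
# this is NOT the documented SSAS adjustedprobability formula).

prior_bananas = 0.05          # 5% of all customers buy bananas

# Case 1: the basket contains items associated with bananas.
p_with_evidence = 0.40        # predicted probability from the model

# Case 2: the basket carries no evidence at all; the model falls
# back on the prior, so the plain probability is still 5%.
p_no_evidence = prior_bananas

def toy_adjusted(p, prior):
    # One shrinkage that reproduces the described behaviour: subtract
    # the part of the prediction explained by popularity alone, then
    # rescale by the remaining probability mass.
    return max(p - prior, 0.0) / (1.0 - prior)

print(toy_adjusted(p_with_evidence, prior_bananas))  # ~0.368: the basket evidence survives
print(toy_adjusted(p_no_evidence, prior_bananas))    # 0.0: popularity alone scores nothing
```

Note how the unadjusted 5% prediction collapses to zero once the prior is discounted, which matches the intuition that adjustedprobability is sensitive only to the basket.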
Hope that helps. By the way, you can visit http://RichardLees.com.au/Sites/Demonstrations to see a couple of real-time data mining demonstrations, both of which use $probability, since I want the prediction as close as possible to the real value. There is also a data mining tutorial at http://technet.microsoft.com/en-us/library/dd883232.aspx which uses some real data. You can write (or generate with the wizard) DMX queries with both adjustedprobability and probability to see the difference.
So far I have a rough feeling for the penalising of already popular items. This is the observation I tried to describe in my initial posting: the adjusted probability is higher for those nodes whose distribution changed much compared to the root node.
But since I was asked to explain the adjusted probability in detail, I'm still looking for a formula or an algorithm describing this magic confidence measure.
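For what it's worth, here is a hypothetical Python sketch of the *shape* such a formula could take for the tree example above (root 30/70, leaves 7/3 and 23/67), consistent with the quoted description: one factor per node, shared by all target states, so the ranking is unchanged and the adjusted values need not sum to 1. The shrinkage `n / (n + k)` and the constant `k` are assumptions of mine, not SSAS's actual formula.

```python
# Hypothetical sketch of a per-node adjustment consistent with the quoted
# description. This is NOT the documented SSAS formula; it only shows the
# structural properties: one factor per node, ranking preserved, sums < 1.

ROOT = {"pos": 30, "neg": 70}
LEAF_A = {"pos": 7, "neg": 3}    # distribution shifted strongly vs. the root
LEAF_B = {"pos": 23, "neg": 67}  # distribution close to the root

def probabilities(node):
    # Plain predicted probabilities from the node's training distribution.
    n = sum(node.values())
    return {state: count / n for state, count in node.items()}

def node_factor(node, k=10):
    # Assumed shrinkage: leaves with more supporting cases earn a factor
    # closer to 1. "k" is a made-up smoothing constant, not an SSAS setting.
    n = sum(node.values())
    return n / (n + k)

for name, leaf in [("leaf A", LEAF_A), ("leaf B", LEAF_B)]:
    p = probabilities(leaf)
    f = node_factor(leaf)
    adjusted = {state: f * v for state, v in p.items()}
    print(name, p, adjusted, "sum =", sum(adjusted.values()))
```

Note that the adjusted values sum to less than 1 (consistent with what I read), and the winning state per leaf is unchanged; the real adjustment presumably also involves the root's prior distribution in a way this sketch does not capture.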
Nevertheless, thank you very much, Richard!