Association algorithm - Importance of a rule
-
Monday, March 06, 2006 9:40 AM
Can anyone tell me how Business Intelligence Studio calculates the importance of a rule? I can't find the formula. I know some formulas, but the result in SQL Server is completely different.
Thanks!
Answers
-
Thursday, March 09, 2006 7:58 PM
For rules, the importance is calculated using the following formula:
Importance (A=>B) = log ( p(B|A) / p(B|not A) )
An importance of 0 means there is no association between A and B. A positive
importance score means that the probability of B goes up when A is true. A
negative importance score means that the probability of B goes down when A
is true.
Below is an example of the correlation counts of donut and muffin derived
from a purchase database. Each cell value represents the number of
transactions. For example, 15 out of 100 transactions include a customer
purchasing both donuts and muffins.
             Donut   Not Donut   Total
Muffin          15           5      20
Not muffin      75           5      80
Total           90          10     100
The support, probability, and importance of related itemsets and rules for
donut and muffin:
Support({Donut}) = 90
Support({Muffin}) = 20
Support({Donut, Muffin}) = 15
Probability({Donut}) = 90/100 = 0.9
Probability({Muffin}) = 20/100 = 0.2
Probability({Donut, Muffin}) = 15/100 = 0.15
Probability(Donut|Muffin) = 15/20 = 0.75
Probability(Muffin|Donut) = 15/90 = 0.167
Importance({Donut, Muffin}) = 0.15/(0.2*0.9) = 0.833
Importance(Donut=>Muffin) = ln(Probability(Donut|Muffin) / Probability(Donut|Not Muffin)) = ln(0.8) = -0.223
Importance(Muffin=>Donut) = ln(Probability(Muffin|Donut) / Probability(Muffin|Not Donut)) = ln(0.333) = -1.099
From the importance of the itemset {Donut, Muffin}, which is below 1, we can see
that Donut and Muffin are negatively correlated: someone who buys a Muffin is
less likely to also buy a Donut (0.75) than a customer in general (0.9).
The Importance score is also known as Weight of Evidence (WOE).
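The arithmetic in this example can be checked with a short Python sketch. It only recomputes the figures from the table above, using the natural logarithm and the rule labels exactly as they appear in this post; the variable names are illustrative and are not part of any SQL Server API. The itemset importance computed here is the quantity usually called lift.

```python
import math

# Transaction counts taken from the donut/muffin table in this post.
n_both = 15          # donut and muffin
n_muffin_only = 5    # muffin, no donut
n_donut_only = 75    # donut, no muffin
n_neither = 5
n_total = n_both + n_muffin_only + n_donut_only + n_neither  # 100

n_donut = n_both + n_donut_only      # 90
n_muffin = n_both + n_muffin_only    # 20

p_donut = n_donut / n_total
p_muffin = n_muffin / n_total
p_both = n_both / n_total

# Itemset importance as defined in the post: P(A,B) / (P(A) * P(B)).
itemset_importance = p_both / (p_donut * p_muffin)

# Conditional probabilities read off the table.
p_donut_given_muffin = n_both / n_muffin                          # 0.75
p_donut_given_not_muffin = n_donut_only / (n_total - n_muffin)    # 0.9375
p_muffin_given_donut = n_both / n_donut                           # about 0.167
p_muffin_given_not_donut = n_muffin_only / (n_total - n_donut)    # 0.5

# The two log-ratios from the post (natural log, labels as posted).
log_ratio_donut = math.log(p_donut_given_muffin / p_donut_given_not_muffin)
log_ratio_muffin = math.log(p_muffin_given_donut / p_muffin_given_not_donut)

print(round(itemset_importance, 3))
print(round(log_ratio_donut, 3))
print(round(log_ratio_muffin, 3))
```

Both log-ratios come out negative, which agrees with the itemset importance being below 1.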
**** Fixed importance formula - denominator was reversed
-
Monday, March 13, 2006 12:42 PM
Hi, thanks a lot for your answer!
I recalculated the importance with your formulas and compared the results with those of the Microsoft association algorithm.
Your formula for the importance is almost right, but it calculates the importance for
Muffin => Donut and not Donut => Muffin,
and it must be "log" (base 10) and not "ln"!
So at the end, this must be the right formula:
Importance(Muffin =>Donut) = log(Probability(Donut|Muffin) / Probability(Donut|Not Muffin) )
and for
Importance(Donut=> Muffin) = log(Probability(Muffin|Donut) / Probability(Muffin|Not Donut) )
UllaH
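As a minimal sketch of the corrected formulas above, the following Python recomputes both rule directions with a base-10 logarithm. The function name rule_importance and its parameters are hypothetical, chosen only for illustration; the counts are the ones from the donut/muffin table earlier in the thread.

```python
import math

def rule_importance(n_ab, n_a, n_b, n_total):
    """log10( P(B|A) / P(B|not A) ) for the rule A => B, from raw counts.

    n_ab: transactions containing both A and B
    n_a, n_b: transactions containing A, respectively B
    n_total: all transactions
    """
    p_b_given_a = n_ab / n_a
    p_b_given_not_a = (n_b - n_ab) / (n_total - n_a)
    return math.log10(p_b_given_a / p_b_given_not_a)

# Counts from the table: 15 both, 90 donut, 20 muffin, 100 transactions.
muffin_to_donut = rule_importance(n_ab=15, n_a=20, n_b=90, n_total=100)
donut_to_muffin = rule_importance(n_ab=15, n_a=90, n_b=20, n_total=100)

print(round(muffin_to_donut, 4))  # log10(0.75 / 0.9375)
print(round(donut_to_muffin, 4))  # log10((15/90) / 0.5)
```

With these counts, Muffin => Donut evaluates to about -0.097 and Donut => Muffin to about -0.477.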
All Replies
-
Friday, April 21, 2006 12:32 PM
Actually, the formula was already given correctly at the beginning of Jamie's answer:
Importance (A=>B) = log ( p(a|b) / p(a|not b) )
Regards,
-
Wednesday, March 07, 2007 4:40 PM
Importance (A=>B) = log ( p(a|b) / p(a|not b) )
It makes more sense to me if a and b are switched in the log function.
With all due respect to all, can someone point me to a Microsoft Research paper, not just discussion opinions, that covers the theoretical background for calculating rule importance?
Musa
-
Thursday, May 24, 2007 3:13 AM
Dear all,
I tried to run the "donuts and muffins" example in SQL 2005 BI, but I did not get the results that the formula you gave (Importance (A=>B) = log ( p(a|b) / p(a|not b) )) predicts. Please explain in more detail.
probability   importance     rule
0.938          0.105302438   F3 = NotMuffin -> F2 = Donut
0.833          0.218055761   F2 = Donut -> F3 = NotMuffin
0.75          -0.105302438   F3 = Muffin -> F2 = Donut
0.5           -0.218055761   F2 = NotDonut -> F3 = NotMuffin
0.5            0.458637849   F2 = NotDonut -> F3 = Muffin
Thank you very much.
Yours truly,
-
Tuesday, February 12, 2008 10:08 PM
I understand Mr. MacLennan's explanation and appreciate the time he took to explain how importance works. However, like the user with username "sang", I also ran the data in BI 2005 and got the same results listed by the aforementioned user. I did this using the following data:
donut muffin y y y y y y y y y y y y y y y y y y y y y y y y y y y y y y n y n y n y n y n y etc.
The rule muffin -> donut has an importance of -0.105302438, which is not the same as Mr. MacLennan's results. I tried switching the roles of a and b in a -> b and using different bases on the logarithms. I don't get the result of -0.105302438 with any of these.
-
Friday, February 15, 2008 6:59 AM
Moderator
Musa1, sang, cheryl8150 -- I hope my answer here http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=2844838&SiteID=1 clarifies the number mismatch
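The linked answer is not reproduced in this thread, so the following is only a guess at the source of the mismatch. One assumption that happens to reproduce the importance values posted by sang is add-one smoothing of the conditional probabilities, (count + 1) / (total + 2), combined with a base-10 logarithm. Nothing in this thread confirms that SQL Server computes it this way, and the function name smoothed_importance is hypothetical.

```python
import math

def smoothed_importance(n_ab, n_a, n_b, n_total):
    """log10 of P(B|A) / P(B|not A) with add-one smoothing.

    ASSUMPTION: the (count + 1) / (total + 2) smoothing is a guess that
    matches the numbers posted in this thread; it is not taken from
    Microsoft documentation.
    """
    p_b_given_a = (n_ab + 1) / (n_a + 2)
    p_b_given_not_a = (n_b - n_ab + 1) / (n_total - n_a + 2)
    return math.log10(p_b_given_a / p_b_given_not_a)

# Counts from the donut/muffin table: 15 both, 75 donut only,
# 5 muffin only, 5 neither, 100 transactions in total.
muffin_to_donut = smoothed_importance(n_ab=15, n_a=20, n_b=90, n_total=100)
donut_to_not_muffin = smoothed_importance(n_ab=75, n_a=90, n_b=80, n_total=100)
not_donut_to_muffin = smoothed_importance(n_ab=5, n_a=10, n_b=20, n_total=100)

print(muffin_to_donut)       # compare with the posted -0.105302438
print(donut_to_not_muffin)   # compare with the posted 0.218055761
print(not_donut_to_muffin)   # compare with the posted 0.458637849
```

Without the smoothing terms, the same counts give log10(0.8), about -0.097, for Muffin -> Donut, which may explain why trying different logarithm bases alone does not reproduce -0.105302438.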

