Association algorithm - Importance of a rule

  • Monday, March 06, 2006 9:40 AM
     
     

    Can anyone tell me how the Business Intelligence Studio calculates the importance of a rule? I can't find the formula. I know some formulas, but the results in SQL Server are completely different.

    Thanks!

Answers

  • Thursday, March 09, 2006 7:58 PM
     

    For rules, the importance is calculated using the following formula:

     

    Importance (A=>B) = log ( p(a|b) / p(a|not b) )

     

    An importance of 0 means there is no association between A and B. A positive importance score means that the probability of B goes up when A is true. A negative importance score means that the probability of B goes down when A is true.
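    A minimal Python sketch of this kind of log-ratio importance, computed from a 2x2 table of transaction counts. The function `log_ratio` and its parameter names are hypothetical, not part of SQL Server. It computes log(P(x|y) / P(x|not y)), the form used in the worked example, so the caller decides which item plays x and which plays y (the rule direction is debated later in this thread).

```python
import math


def log_ratio(n_xy, n_x_noty, n_y, n_noty, base=math.e):
    """Return log(P(x|y) / P(x|not y)) computed from transaction counts.

    n_xy     -- transactions containing both x and y
    n_x_noty -- transactions containing x but not y
    n_y      -- transactions containing y
    n_noty   -- transactions not containing y
    """
    p_x_given_y = n_xy / n_y
    p_x_given_noty = n_x_noty / n_noty
    return math.log(p_x_given_y / p_x_given_noty, base)


# Equal conditional probabilities: no association, so the score is 0.
print(log_ratio(10, 10, 50, 50))
# P(x|y) = 0.6 > P(x|not y) = 0.2: a positive score.
print(log_ratio(30, 10, 50, 50))
# Donut (x) given Muffin (y) from the table below: ln(0.75 / 0.9375) = ln(0.8).
print(log_ratio(15, 75, 20, 80))
```

    Swapping which item is passed as x and which as y gives the other rule direction. Both directions share the same sign, since each is positive exactly when the two items co-occur more often than independence would predict.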

     

    Below is an example of the transaction counts for donuts and muffins derived from a purchase database. Each cell value represents the number of transactions. For example, 15 out of 100 transactions include a customer purchasing both donuts and muffins.

     

                   Donut   Not Donut   Total
    Muffin            15           5      20
    Not Muffin        75           5      80
    Total             90          10     100
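    To double-check the margins, the table can be encoded in Python as a dictionary of counts keyed by (has_muffin, has_donut); this layout is only an illustrative choice.

```python
# Transaction counts from the donut/muffin table,
# keyed by (has_muffin, has_donut).
counts = {
    (True, True): 15,    # muffin and donut
    (True, False): 5,    # muffin, no donut
    (False, True): 75,   # no muffin, donut
    (False, False): 5,   # neither
}

total = sum(counts.values())
muffin_total = counts[(True, True)] + counts[(True, False)]
donut_total = counts[(True, True)] + counts[(False, True)]

print(total, muffin_total, donut_total)  # 100 20 90
```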

     

    The support, probability, and importance of related itemsets and rules for donut and muffin:

     

    Support({Donut}) = 90

    Support({Muffin}) = 20

    Support ({Donut, Muffin}) = 15

    Probability({Donut}) = 90/100 = 0.9

    Probability({Muffin}) = 20/100 = 0.2

    Probability({Donut, Muffin}) = 15/100 = 0.15

     

    Probability(Donut|Muffin) = 15/20 = 0.75

    Probability(Muffin|Donut) = 15/90 = 0.167

     

    Importance({Donut, Muffin}) = Probability({Donut, Muffin}) / (Probability({Donut}) * Probability({Muffin})) = 0.15/(0.9*0.2) = 0.833

    Importance(Donut=>Muffin) = ln(Probability(Donut|Muffin) / Probability(Donut|Not Muffin)) = ln(0.75/(75/80)) = ln(0.8) = -0.223

    Importance(Muffin=>Donut) = ln(Probability(Muffin|Donut) / Probability(Muffin|Not Donut)) = ln((15/90)/(5/10)) = ln(1/3) = -1.099
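    A short Python script reproduces the arithmetic above. It keeps the answer's rule labels and natural logarithm; the next reply argues that the labels are swapped and that the base should be 10, so this only checks the numbers as written.

```python
import math

n = 100
n_donut, n_muffin, n_both = 90, 20, 15
n_donut_not_muffin = n_donut - n_both   # 75
n_muffin_not_donut = n_muffin - n_both  # 5

p_donut = n_donut / n            # 0.9
p_muffin = n_muffin / n          # 0.2
p_both = n_both / n              # 0.15

p_donut_given_muffin = n_both / n_muffin                        # 0.75
p_muffin_given_donut = n_both / n_donut                         # ~0.167
p_donut_given_not_muffin = n_donut_not_muffin / (n - n_muffin)  # 75/80
p_muffin_given_not_donut = n_muffin_not_donut / (n - n_donut)   # 5/10

# Itemset importance: observed joint probability over the independent product.
itemset_importance = p_both / (p_donut * p_muffin)

# Rule importances with the natural log and the labels used in the answer.
imp_donut_muffin = math.log(p_donut_given_muffin / p_donut_given_not_muffin)
imp_muffin_donut = math.log(p_muffin_given_donut / p_muffin_given_not_donut)

print(round(itemset_importance, 3))  # 0.833
print(round(imp_donut_muffin, 3))    # -0.223
print(round(imp_muffin_donut, 3))    # -1.099
```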

     

    From the importance of the itemset {Donut, Muffin} (below 1), we can see that Donut and Muffin are negatively correlated: someone who buys a Muffin is less likely to also buy a Donut (15/20 = 0.75) than someone who does not buy a Muffin (75/80 = 0.9375).

     

    The Importance score is also known as Weight of Evidence (WOE).

     

     

    **** Fixed importance formula - denominator was reversed

  • Monday, March 13, 2006 12:42 PM
     

    Hi, thanks a lot for your answer!

    I recalculated the importance with your formulas and compared the results with those of the Microsoft Association algorithm.

    Your formula for the importance is almost right, but it calculates the importance for Muffin => Donut, not Donut => Muffin,

    and it must be "log", not "ln"!!

    So in the end, this must be the right formula:

    Importance(Muffin =>Donut) = log(Probability(Donut|Muffin) / Probability(Donut|Not Muffin) )

    and for

    Importance(Donut=> Muffin) = log(Probability(Muffin|Donut) / Probability(Muffin|Not Donut) )
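    UllaH's corrected formulas can be sketched the same way, using `math.log10` because the reply states that the base must be 10 ("log", not "ln"). The thread does not confirm that SQL Server computes exactly this (later replies report different numbers), so treat it as a sketch of the formula as written.

```python
import math

# Counts from the donut/muffin table.
n, n_donut, n_muffin, n_both = 100, 90, 20, 15

# UllaH's labelling: Importance(X => Y) = log10(P(Y|X) / P(Y|not X)).
p_donut_given_muffin = n_both / n_muffin                        # 0.75
p_donut_given_not_muffin = (n_donut - n_both) / (n - n_muffin)  # 75/80
p_muffin_given_donut = n_both / n_donut                         # 15/90
p_muffin_given_not_donut = (n_muffin - n_both) / (n - n_donut)  # 5/10

imp_muffin_donut = math.log10(p_donut_given_muffin / p_donut_given_not_muffin)
imp_donut_muffin = math.log10(p_muffin_given_donut / p_muffin_given_not_donut)

print(round(imp_muffin_donut, 4))  # -0.0969
print(round(imp_donut_muffin, 4))  # -0.4771
```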

    UllaH

All Replies


  • Friday, April 21, 2006 12:32 PM
     
     

    Actually, at the beginning of Jamie's answer, the formula was already given correctly:

    Importance (A=>B) = log ( p(a|b) / p(a|not b) )

    Regards,

  • Wednesday, March 07, 2007 4:40 PM
     
     

    Importance (A=>B) = log ( p(a|b) / p(a|not b) )

    It makes more sense to me if a and b are switched in the log function.

    Can someone point me to a Microsoft Research paper (with all due respect to all), not just discussion opinions, that discusses the theoretical background for calculating rule importance?

     

    Musa

  • Thursday, May 24, 2007 3:13 AM
     
     
    Dear all,

    I tried to run the "donuts and muffins" example using SQL Server 2005 BI, but I did not get the results that the formula you gave predicts (Importance (A=>B) = log ( p(a|b) / p(a|not b) )). Please explain in more detail.

    Probability   Importance     Rule
    0.938          0.105302438   F3 = NotMuffin -> F2 = Donut
    0.833          0.218055761   F2 = Donut -> F3 = NotMuffin
    0.75          -0.105302438   F3 = Muffin -> F2 = Donut
    0.5           -0.218055761   F2 = NotDonut -> F3 = NotMuffin
    0.5            0.458637849   F2 = NotDonut -> F3 = Muffin

    Thank you very much.
    Yours truly,

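    One way to compare these numbers with the hand calculation is to recompute, from the same counts, P(rhs | lhs) and an unsmoothed importance ln(P(rhs | lhs) / P(rhs | not lhs)) for each listed rule. The helper names below are hypothetical, and the unsmoothed formula is an assumption taken from the earlier answer, not SQL Server's documented computation.

```python
import math

# Transaction counts from the donut/muffin table, keyed by (muffin, donut).
COUNTS = {(True, True): 15, (True, False): 5,
          (False, True): 75, (False, False): 5}


def n(**state):
    """Count transactions whose items match every given state, e.g. n(muffin=True)."""
    total = 0
    for (muffin, donut), count in COUNTS.items():
        row = {"muffin": muffin, "donut": donut}
        if all(row[item] == present for item, present in state.items()):
            total += count
    return total


def probability(lhs, rhs):
    """P(rhs | lhs), where lhs and rhs are (item, present) pairs."""
    return n(**dict([lhs, rhs])) / n(**dict([lhs]))


def naive_importance(lhs, rhs):
    """ln(P(rhs | lhs) / P(rhs | not lhs)) with no smoothing (an assumption)."""
    item, present = lhs
    return math.log(probability(lhs, rhs) / probability((item, not present), rhs))


# The five rules in the order sang listed them, as (lhs, rhs).
RULES = [
    (("muffin", False), ("donut", True)),   # NotMuffin -> Donut
    (("donut", True), ("muffin", False)),   # Donut -> NotMuffin
    (("muffin", True), ("donut", True)),    # Muffin -> Donut
    (("donut", False), ("muffin", False)),  # NotDonut -> NotMuffin
    (("donut", False), ("muffin", True)),   # NotDonut -> Muffin
]

for lhs, rhs in RULES:
    print(lhs, "->", rhs,
          round(probability(lhs, rhs), 3),
          round(naive_importance(lhs, rhs), 4))
```

    With these counts the probability column comes out as 0.938, 0.833, 0.75, 0.5 and 0.5, the same as in the output above, but the unsmoothed importances (for example ln(0.8), about -0.223, for Muffin -> Donut) do not match the reported values. The thread itself only points to the moderator's linked answer for the explanation.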
  • Tuesday, February 12, 2008 10:08 PM
     
     

    I understand Mr. MacLennan's explanation and appreciate the time he took to explain how importance works.  However, like the user with username "sang", I also ran the data in BI 2005 and got the same results listed by the aforementioned user.  I did this using the following data:

     

    donut muffin
    y y
    y y
    y y
    y y
    y y
    y y
    y y
    y y
    y y
    y y
    y y
    y y
    y y
    y y
    y y
    n y
    n y
    n y
    n y
    n y

    etc.
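    For anyone recreating this input, the 2x2 table can be expanded into one row per transaction in the same two-column y/n layout. This is a minimal Python sketch; loading the rows into an Analysis Services mining structure is not shown, and the names `CELLS` and `rows` are illustrative.

```python
from collections import Counter

# (donut, muffin, count) from the donut/muffin table; "y"/"n" mirror the listing above.
CELLS = [("y", "y", 15), ("n", "y", 5), ("y", "n", 75), ("n", "n", 5)]

# One (donut, muffin) row per transaction, in the same order as the listing.
rows = [(donut, muffin) for donut, muffin, count in CELLS for _ in range(count)]

print("donut muffin")
for donut, muffin in rows[:3]:
    print(donut, muffin)
print("...")

# Recount to confirm the expansion reproduces the table.
tally = Counter(rows)
print(len(rows), tally[("y", "y")], tally[("n", "y")])  # 100 15 5
```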

     

    The rule muffin -> donut has an importance of -0.105302438, which is not the same as Mr. MacLennan's results. I tried switching the roles of a and b in a -> b and using different bases for the logarithms. I don't get the result of -0.105302438 with any of these.
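    The check described above, both orientations of the rule under several logarithm bases, can be scripted as follows. The choice of bases (e, 2, 10) is an assumption about what one would try; the reported value is the one quoted in this reply.

```python
import math

# Muffin -> Donut quantities from the donut/muffin table.
p_donut_given_muffin = 15 / 20        # 0.75
p_donut_given_not_muffin = 75 / 80    # 0.9375
p_muffin_given_donut = 15 / 90        # ~0.167
p_muffin_given_not_donut = 5 / 10     # 0.5

ratios = {
    "P(donut|muffin)/P(donut|not muffin)": p_donut_given_muffin / p_donut_given_not_muffin,
    "P(muffin|donut)/P(muffin|not donut)": p_muffin_given_donut / p_muffin_given_not_donut,
}
bases = {"ln": math.e, "log2": 2, "log10": 10}

reported = -0.105302438  # value reported in this thread for muffin -> donut

for ratio_name, ratio in ratios.items():
    for base_name, base in bases.items():
        value = math.log(ratio, base)
        match = math.isclose(value, reported, abs_tol=1e-6)
        print(f"{base_name}({ratio_name}) = {value:.6f}", "matches" if match else "differs")
```

    None of the six combinations comes within 0.001 of -0.105302438; the closest is log10 of the first ratio, about -0.0969.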

  • Friday, February 15, 2008 6:59 AM
    Moderator
     
     

    Musa1, sang, cheryl8150 -- I hope my answer here http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=2844838&SiteID=1 clarifies the number mismatch