The meaning of importance and probability when using Association

    Question

  • Hi,
    I have an exercise using association data mining.
    My database has 350 records.
    I use 90 records for data mining, and it produces some rules, from which I choose the ones with the highest MSOLAP_NODE_SCORE.
    But when I use a select statement to check my result, 1 record agrees with the rule and 5 records do not.
    For example:
    rule: A=a, B=b -> C=c
    select * from <my_table> where A='a' and B='b' and C='c'; ==> 1 record returned
    select * from <my_table> where A='a' and B='b' and C<>'c'; ==> 5 records returned
    C has 3 values: c1, c2, c
    In the result of the second statement, C is c1 in 2 records and c2 in 3 records.

    I don't understand how this works.
    I want to choose the best rules to represent my database.
    How should I set importance and probability to get the best rules?
    For a database with 90 records and a database with 350 records, which values should I use for Minimum_Probability, Minimum_Support, Minimum_Importance...?
    When I choose rules, should I choose by importance or by probability?

    Thanks for your help.

    Thursday, April 12, 2007 2:16 AM

Answers

  • I'm really having trouble understanding your question and what you want to see as a result.

     

    Minimum_Support is simply how many times an event has to happen to be counted.  For example, if you set Minimum_Support to 10, then the "itemset" Aa,Bb,Cc would have to happen together at least 10 times before it was counted at all.  If you set Minimum_Support to 0.1, then it would have to happen together in at least 10% of all cases.
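
    As a rough sketch of the threshold check described above (not the actual Analysis Services code; the function name is made up for illustration, and it assumes a value below 1 is read as a fraction of all cases and anything else as an absolute count):

```python
def meets_minimum_support(itemset_count, total_cases, minimum_support):
    """Return True if an itemset occurs often enough to be counted.

    Assumption (from the explanation above): a threshold below 1 is a
    fraction of all cases; otherwise it is an absolute number of cases.
    """
    if minimum_support < 1:
        required = minimum_support * total_cases
    else:
        required = minimum_support
    return itemset_count >= required


# With 90 training cases and a threshold of 0.1, an itemset needs about 9 cases.
print(meets_minimum_support(1, 90, 0.1))    # False
print(meets_minimum_support(12, 90, 0.1))   # True
# With an absolute threshold of 10, the number of cases does not matter.
print(meets_minimum_support(10, 350, 10))   # True
```

    So an itemset that shows up in only 1 of 90 cases would be dropped under either setting.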

     

    Minimum_Probability is the minimum ratio allowed for something to become a "rule".  For example, if Minimum_Probability was set to 0.4 (40%) and Aa, Bb appeared 10 times in your data, then Aa, Bb, Cc would have to appear at least 4 times in the data for Aa, Bb -> Cc to be considered a rule.  (Note that your Minimum_Support would also have to allow for the "itemset" to be counted at all.)
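
    To make the ratio concrete, here is a small Python sketch using the counts from the question (1 record with C=c and 5 records with other values of C, all with A=a and B=b). The function name is hypothetical, and it assumes those counts come from the same data the model was trained on; it only illustrates the arithmetic, not how the algorithm works internally.

```python
def rule_probability(count_lhs_and_rhs, count_lhs):
    """Probability of a rule: cases matching both sides / cases matching the left side."""
    return count_lhs_and_rhs / count_lhs


# Counts from the question: A=a, B=b, C=c -> 1 record; A=a, B=b, C<>c -> 5 records.
matching = 1
not_matching = 5
p = rule_probability(matching, matching + not_matching)
print(round(p, 3))   # 0.167

# Compare against a Minimum_Probability of 0.4, as in the example above.
print(p >= 0.4)                           # False
print(rule_probability(4, 10) >= 0.4)     # True: 4 of 10 cases is exactly 40%
```

    With those counts the rule A=a, B=b -> C=c holds in only about 17% of the matching records, so it would not pass a 0.4 threshold.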

     

    Minimum_Importance is a calculation that further filters rules based on the amount of lift they provide - the purpose is to filter out tautologies, e.g. "Everybody buys milk, so Cookies->Milk is true with 100% probability".  This rule is not important, since <anything>->Milk would also hold with 100% probability.
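
    The milk example can be worked through in Python. This sketch assumes that the importance of a rule is the base-10 log of P(right side | left side) divided by P(right side | not left side), which is how I understand the Microsoft Association Rules documentation to describe it; the real implementation may smooth the counts. The function name and the basket counts are invented for illustration.

```python
import math


def rule_importance(n_lhs_and_rhs, n_lhs, n_rhs, n_total):
    """Assumed definition: log10( P(rhs | lhs) / P(rhs | not lhs) ).

    No smoothing is applied, so both probabilities must be non-zero.
    """
    p_rhs_given_lhs = n_lhs_and_rhs / n_lhs
    p_rhs_given_not_lhs = (n_rhs - n_lhs_and_rhs) / (n_total - n_lhs)
    if p_rhs_given_lhs == 0 or p_rhs_given_not_lhs == 0:
        raise ValueError("importance is undefined without smoothing")
    return math.log10(p_rhs_given_lhs / p_rhs_given_not_lhs)


# Hypothetical data: 100 baskets, all contain milk, 30 contain cookies.
# Cookies -> Milk has probability 100%, but so does <anything> -> Milk.
print(rule_importance(30, 30, 100, 100))            # 0.0

# Hypothetical data: 20 baskets contain X, 16 of those contain Y,
# and 32 baskets contain Y overall (so 16 of the 80 baskets without X).
print(round(rule_importance(16, 20, 32, 100), 3))   # 0.602
```

    An importance of 0 means the left side tells you nothing new about the right side, which is exactly the milk case; a positive value means the left side makes the right side more likely.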

     

    HTH

    -Jamie

    Tuesday, April 17, 2007 6:39 PM

All replies

  • Nobody has helped me yet.
    I need an answer.
    Saturday, April 14, 2007 1:30 AM
  • Help me.
    Can anybody help me?
    Thanks for your interest in my question.
    Saturday, April 14, 2007 4:12 PM