
Naive Bayes Algorithm

  • Tuesday, November 03, 2009 10:59 AM, Amit Dixit
     
    I am facing an issue with text classification using the SQL Server Naive Bayes algorithm. When I query with terms that come from two different classes, it gives wrong results: it always returns the same probability for each class.

    I am using SQL Server 2008 Analysis Services. I have 3 classes: Dog, Cow, and Cat. Each has 2 terms. When I query the model with 2 terms from two different classes, it gives an equal probability of being in all three classes.


    Any pointers will be helpful.
    Amit

All Replies

  • Tuesday, November 03, 2009 3:06 PM, rok1
     


    Although Naive Bayes is one of the most popular algorithms for text mining, it is ideal only for general classification because it doesn't account for dependencies among inputs.
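    To make "doesn't account for dependencies among inputs" concrete, here is a minimal Python sketch of a generic two-class Naive Bayes posterior (not the SQL Server implementation). The helper `nb_posterior` and the likelihood ratio of 3.0 are hypothetical. Naive Bayes multiplies in one likelihood ratio per input, so the same evidence fed in through two perfectly correlated inputs is counted twice.

```python
import math


def nb_posterior(prior_pos, likelihood_ratios):
    """Posterior P(pos | evidence) for a two-class Naive Bayes model.

    Each entry of likelihood_ratios is P(x_i | pos) / P(x_i | neg) for one
    observed input; Naive Bayes adds their logs, assuming independence.
    """
    log_odds = math.log(prior_pos / (1 - prior_pos))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical term that is 3x more likely in the positive class.
single = nb_posterior(0.5, [3.0])
# The same evidence copied into a second, perfectly correlated input
# (e.g. a synonym that always co-occurs) is counted twice.
duplicated = nb_posterior(0.5, [3.0, 3.0])

print(f"one input:        {single:.2f}")
print(f"duplicated input: {duplicated:.2f}")
```

    Here the posterior rises from 0.75 to 0.90 even though the second input adds no new information; that overconfidence is the price of the independence assumption.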

    My suggestion is to use a neural network or logistic regression. The most popular algorithm for text classification is SVM (support vector machine).

    You can download an SVM plug-in for Analysis Services from this link:

    http://www.codeplex.com/svmplugin

    You'll get the best results from SVM and neural networks, although, depending on the number of input attributes and predictable states, training can take some time. Try different parameters and you'll get amazing results!
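    For comparison, a linear SVM can be sketched in a few lines of plain Python. This is a generic Pegasos-style subgradient descent on the regularized hinge loss, not the CodePlex plug-in's code; the toy vocabulary, the function names, and the `lam`/`epochs` parameters are assumptions for illustration, and the bias term is omitted to keep it short.

```python
def train_linear_svm(samples, labels, lam=0.1, epochs=20):
    """Linear SVM via Pegasos-style subgradient descent on the
    L2-regularized hinge loss. Labels are +1 or -1; no bias term."""
    w = [0.0] * len(samples[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization step: shrink all weights toward zero.
            w = [(1 - eta * lam) * wi for wi in w]
            # Hinge-loss step: only examples inside the margin pull on w.
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Hypothetical term-count vectors over the vocabulary [bark, woof, meow, purr].
docs = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
labels = [1, 1, -1, -1]  # +1 = Dog, -1 = Cat
w = train_linear_svm(docs, labels)
print(predict(w, [0, 1, 0, 0]))  # a "woof"-only document; prints 1
```

    Smaller `lam` values fit the training data more tightly, while larger values favour a wider margin; this is the kind of parameter worth experimenting with.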

    Sometimes logistic regression, which is similar to a neural network without the hidden nodes, can also get you what you need.
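    The "neural network without the hidden nodes" view of logistic regression can be seen in a minimal sketch: a single sigmoid output unit fed directly by the inputs and trained by gradient descent on the log loss. This is a generic plain-Python illustration, not how Analysis Services implements it; the vocabulary, learning rate, and epoch count are arbitrary assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(samples, labels, lr=0.5, epochs=200):
    """Logistic regression as a single sigmoid unit with no hidden layer,
    fitted by batch gradient descent on the log loss (labels are 0 or 1)."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss w.r.t. the unit's input
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        n = len(samples)
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical term-count vectors over the vocabulary [bark, woof, meow, purr].
docs = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
labels = [1, 1, 0, 0]  # 1 = Dog, 0 = Cat
w, b = train_logistic_regression(docs, labels)
print(predict_proba(w, b, [0, 1, 0, 0]))  # probability a "woof" document is Dog
```

    Adding a hidden layer of sigmoid units between the inputs and this output unit would turn the same model into a small neural network.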


    hth,



    Rok

    Please mark it as the answer if you find it helpful!
  • Tuesday, November 03, 2009 3:17 PM, Amit Dixit
     
    Thanks, Rok, for the detailed reply. I will try SVM and LR.

    I also want to understand why NB behaves like this. I have taken a very small and simple example with only 2 terms in each class (3 classes). It would be a great help if you could provide some details about it.
    Amit
  • Tuesday, November 03, 2009 3:31 PM, Amit Dixit
     
    Also, the Predict and PredictHistogram functions are not working for SVM. Can you please let me know which DMX functions will work when predicting from the model?

    Amit
  • Tuesday, November 03, 2009 4:18 PM, rok1
      
        1. Naive Bayes treats all your input attributes as independent of each other. Because NB doesn't support continuous attributes, it can't tell how many times each term occurred; it only calculates the likelihood of each state from the input attributes (terms), which in your case is the same for all three states.


        2. Although the Predict and PredictHistogram functions are not supported by the SVM plug-in, you can get your predictions with a prediction join. Follow the link below to see how.


    http://www.codeplex.com/svmplugin/Thread/View.aspx?ThreadId=39413
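    Point 1 can be made concrete with a toy example. The sketch below is a generic multinomial Naive Bayes with add-one smoothing in plain Python, not the Microsoft Naive Bayes implementation (which works with discrete attribute states and may behave differently); the training data and term names are hypothetical stand-ins for the Dog/Cow/Cat setup in the question. Because each term's likelihood is multiplied in independently, one Dog term plus one Cat term gives Dog and Cat exactly the same score, and in this sketch a query with no term the model knows leaves only the prior, which is equal for three equally sized classes.

```python
import math
from collections import Counter


def train_naive_bayes(training):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    `training` maps each class label to a list of documents (lists of terms).
    """
    vocab = {t for docs in training.values() for doc in docs for t in doc}
    n_docs = sum(len(docs) for docs in training.values())
    model = {}
    for label, docs in training.items():
        counts = Counter(t for doc in docs for t in doc)
        total = sum(counts.values())
        log_prior = math.log(len(docs) / n_docs)
        log_lik = {t: math.log((counts[t] + 1) / (total + len(vocab))) for t in vocab}
        model[label] = (log_prior, log_lik)
    return model


def posterior(model, doc):
    """Return P(class | doc); terms the model has never seen are ignored."""
    scores = {}
    for label, (log_prior, log_lik) in model.items():
        # Independence assumption: one log-likelihood per observed term.
        scores[label] = log_prior + sum(log_lik[t] for t in doc if t in log_lik)
    top = max(scores.values())
    weights = {k: math.exp(v - top) for k, v in scores.items()}
    z = sum(weights.values())
    return {k: wt / z for k, wt in weights.items()}


# Hypothetical data mirroring the question: 3 classes, 2 terms each.
training = {
    "Dog": [["bark", "woof"]],
    "Cow": [["moo", "graze"]],
    "Cat": [["meow", "purr"]],
}
model = train_naive_bayes(training)

mixed = posterior(model, ["bark", "meow"])  # one Dog term and one Cat term
unknown = posterior(model, ["hiss"])        # no term the model has seen
print(mixed)
print(unknown)
```

    With this data the first query gives Dog and Cat about 0.4 each and Cow about 0.2, while the second gives every class 1/3.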




    hth,

    Rok
    • Marked as answer by Jin Chen - MSFT, Moderator, Wednesday, November 11, 2009 9:37 AM
    • Proposed as answer by rok1, Tuesday, November 03, 2009 6:41 PM