Effect of modelling algorithm on Dependency Networks


  • I am trying to understand the logic behind dependency network links. I have a Naive Bayes model created with 4 input variables, 3 of which are non-numeric. The Bayes dependency network shows no correlation between the PredictOnly column and the other columns. Even so, the Lift Chart showed that the Bayes model has a 96% Predict Probability!

    But the dependency network created using the Decision Tree algorithm showed strong relationships between the PredictOnly column and 2 other input variables. The Decision Tree model has a 91% Predict Probability.

    My questions are:

    1. How does the Naive Bayes model create dependency links for non-numeric inputs?
    2. How much can I rely on the results of the Naive Bayes model when its dependency network does not show any relationship between the predictable column and the others?
    3. Why did the decision tree show strong links while Naive Bayes didn't? Do they use different metrics to calculate the correlation? I know that the network varies from model to model, but I'd like to know exactly how it is computed for each model.

    • Edited by velivela Friday, August 28, 2009 9:46 PM
    Friday, August 28, 2009 9:29 PM


  • Are you testing on the input data or on a holdout set?

    Dependency networks are implemented differently by the various algorithms. Naive Bayes uses strictly the conditional probabilities between inputs and outputs. If it shows no correlation from the other columns to the PredictOnly column, then the conditional probabilities are probably very low. The algorithm may still provide apparently good accuracy in certain cases, for example when the training data is very skewed towards one state of the target variable: if 95% of the training rows share the same target value, a model that always predicts that value is already 95% accurate on that data.
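To make the skew point concrete, here is a minimal Python sketch of a count-based Naive Bayes classifier for non-numeric inputs. It is an illustration under stated assumptions, not how SQL Server implements the algorithm: the column names (Color, Size, Target), the Laplace smoothing parameter `alpha`, and the 20-row toy data set are all hypothetical. Each input gets a table of P(input state | target state) estimated by counting, which is what "conditional probabilities between inputs and outputs" means for categorical columns. In the toy data 19 of 20 rows share one target state and the inputs carry almost no signal, yet the model scores 95% on its own training data, exactly the majority-state rate.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, target, alpha=1.0):
    """Count-based Naive Bayes for categorical (non-numeric) inputs.

    Returns the class priors and, for every input column, a table of
    P(input state | target state) with Laplace smoothing `alpha`.
    """
    class_counts = Counter(r[target] for r in rows)
    inputs = [c for c in rows[0] if c != target]
    counts = {c: defaultdict(Counter) for c in inputs}  # counts[col][cls][state]
    states = {c: set() for c in inputs}
    for r in rows:
        for c in inputs:
            counts[c][r[target]][r[c]] += 1
            states[c].add(r[c])
    priors = {cls: k / len(rows) for cls, k in class_counts.items()}
    cond = {
        c: {
            cls: {
                s: (counts[c][cls][s] + alpha)
                / (class_counts[cls] + alpha * len(states[c]))
                for s in states[c]
            }
            for cls in class_counts
        }
        for c in inputs
    }
    return priors, cond

def predict(priors, cond, row):
    """Pick the class maximising P(cls) * prod over inputs of P(row[c] | cls)."""
    best, best_score = None, -1.0
    for cls, p in priors.items():
        score = p
        for c, table in cond.items():
            score *= table[cls].get(row[c], 1e-9)  # tiny value for unseen states
        if score > best_score:
            best, best_score = cls, score
    return best

# Hypothetical, heavily skewed training data: only row 0 has Target = "Yes".
rows = [
    {"Color": "red" if i % 2 == 0 else "blue",
     "Size": "S" if i % 4 < 2 else "L",
     "Target": "Yes" if i == 0 else "No"}
    for i in range(20)
]
priors, cond = train_naive_bayes(rows, "Target")
preds = [predict(priors, cond, r) for r in rows]
accuracy = sum(p == r["Target"] for p, r in zip(preds, rows)) / len(rows)
majority = max(priors.values())
print(accuracy, majority)  # 0.95 0.95
```

The model predicts "No" for every row, so its 95% training accuracy says nothing about the inputs; the conditional probability tables are where any real input-to-target relationship would have to show up.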

    The Decision Trees algorithm uses the occurrence of other columns in the tree (and the node probabilities) to build the dependency network. Simply put, if the first split is based on column A and a second split on column B, then A and B will appear in the dependency network, with the strength of B->Target smaller than the strength of A->Target. Again, if the training data is very skewed, it is possible for decision trees to show worse accuracy than Naive Bayes, particularly if the training data is the same as the test data.
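A rough sketch of the idea that split position drives link strength. The scoring rule here is a hypothetical stand-in, not the formula SQL Server uses: each attribute accumulates the number of training cases that pass through nodes splitting on it, so a root split on A (all 100 cases) outranks a later split on B (60 cases). The tree layout and case counts are invented for illustration.

```python
def dependency_strengths(node, strengths=None):
    """Hypothetical scoring: add up the training cases that reach every
    node splitting on a given attribute. Leaves have no "split" key."""
    if strengths is None:
        strengths = {}
    attr = node.get("split")
    if attr is not None:
        strengths[attr] = strengths.get(attr, 0) + node["cases"]
        for child in node["children"]:
            dependency_strengths(child, strengths)
    return strengths

# Invented tree: first split on A, then B splits one branch of A.
tree = {
    "split": "A", "cases": 100,
    "children": [
        {"split": "B", "cases": 60, "children": [{"cases": 35}, {"cases": 25}]},
        {"cases": 40},
    ],
}
strengths = dependency_strengths(tree)
ranked = sorted(strengths, key=strengths.get, reverse=True)
print(strengths)  # {'A': 100, 'B': 60}
print(ranked)     # ['A', 'B']
```

Any rule that rewards earlier splits and larger node populations gives the same ordering here; the point is only that A->Target ends up stronger than B->Target.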

    A simple example of how the dependency network may be empty for Naive Bayes yet show two dependencies for trees: imagine a chess board and 64 training cases, one for each square, each with two attributes, ColumnOddity and RowOddity (whether the square's column and row are ODD or EVEN). If you train a Naive Bayes model to predict the square color (black or white), it will likely show nothing in the dependency network (or show similar weights). The decision trees will show both attributes, ColumnOddity and RowOddity, as important, in the order of the splits.
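The chess board example can be checked by counting. This sketch assumes, arbitrarily, that a square is black when its row plus column index is even; the attribute names follow the post. Each attribute on its own leaves the color at 50/50, which is all Naive Bayes looks at, while using both attributes together (as a tree that splits on one and then the other would) separates the colors perfectly.

```python
from collections import Counter
from itertools import product

# 64 training cases, one per square. Assumption: black when row + col is even.
squares = [
    {"ColumnOddity": "ODD" if c % 2 else "EVEN",
     "RowOddity": "ODD" if r % 2 else "EVEN",
     "Color": "black" if (r + c) % 2 == 0 else "white"}
    for r, c in product(range(8), range(8))
]

def p_black_given(attr, value):
    """P(Color = black | attr = value), estimated by counting squares."""
    subset = [s for s in squares if s[attr] == value]
    return sum(s["Color"] == "black" for s in subset) / len(subset)

# Each attribute alone is uninformative: every estimate is exactly 0.5.
single = {(a, v): p_black_given(a, v)
          for a in ("ColumnOddity", "RowOddity") for v in ("ODD", "EVEN")}

# Grouping by both attributes gives 4 groups of 16 squares, each a single color.
leaves = Counter((s["ColumnOddity"], s["RowOddity"], s["Color"]) for s in squares)
print(single)
print(leaves)
```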

    So a rule like "if A and B then C" may lead to tree splits (and consequently to dependency network relationships) but may not lead to anything significant in Naive Bayes, because Naive Bayes assumes that all inputs are independent given the target, while the Decision Trees algorithm looks for complex patterns spanning multiple inputs.
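The independence assumption can be written out. For a target C and inputs A and B, Naive Bayes scores each target state as a product of per-input terms, so it has no term that depends on A and B jointly:

```latex
P(C \mid A, B) \;\propto\; P(C)\, P(A \mid C)\, P(B \mid C)

% Chess board case: each input alone is independent of the color,
% P(A \mid C) = P(A) and P(B \mid C) = P(B) for every state of C, so
P(C \mid A, B) \;\propto\; P(C)\, P(A)\, P(B) \;\propto\; P(C)
```

After normalizing over the states of C, the posterior equals the prior, so neither input looks informative to Naive Bayes even though A and B together determine C exactly.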

    I hope this covers, at least partly, 1) and 3).

    Regarding 2): Naive Bayes can still be very reliable even if the dependency network is empty. This is often the case in text classification tasks, where the conditional probabilities are too low to be considered for the dependency network but are still used in actual predictions.
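A small illustration of how individually weak inputs still add up in Naive Bayes predictions. Naive Bayes multiplies the prior odds by one likelihood ratio P(x | C1) / P(x | C2) per input, which is the same as summing their logarithms. The numbers are hypothetical: 50 words, each only 10% more likely under class C1 than under C2, starting from even prior odds.

```python
import math

def posterior_odds(prior_odds, likelihood_ratios):
    """Naive Bayes posterior odds: prior odds times the product of the
    per-input likelihood ratios, computed as a sum in log space."""
    log_odds = math.log(prior_odds) + sum(math.log(r) for r in likelihood_ratios)
    return math.exp(log_odds)

# Hypothetical document: 50 words, each only 10% more likely under C1.
odds = posterior_odds(1.0, [1.1] * 50)
prob_c1 = odds / (1 + odds)
print(round(odds, 1), round(prob_c1, 3))
```

No single word would stand out in a dependency network, yet together they give odds of roughly 117 to 1 (a probability of about 0.99) in favour of C1.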

    If you are not doing this already, I suggest the following:
    - balance the distribution of the target column (the PredictOnly one) in the training set
    - use the holdout feature in SQL Server 2008 to "save" some data for testing
    - compare the accuracy of the models on the holdout data, not on the training data
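The three suggestions can be sketched generically in Python. This is not the SQL Server 2008 holdout feature or its API; the function names, the 30% holdout fraction, the fixed seeds and the toy data are assumptions for illustration. Downsampling is one simple way to balance the target (oversampling the rare state is another). The order matters: split first, then balance only the training part, so the holdout set keeps the real target distribution.

```python
import random
from collections import Counter, defaultdict

def holdout_split(rows, holdout_fraction=0.3, seed=0):
    """Randomly reserve a fraction of rows for testing; returns (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * holdout_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def balance_by_downsampling(rows, target, seed=0):
    """Downsample every target state to the size of the rarest one."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in rows:
        groups[r[target]].append(r)
    n = min(len(g) for g in groups.values())
    balanced = [r for g in groups.values() for r in rng.sample(g, n)]
    rng.shuffle(balanced)
    return balanced

def accuracy(predict, rows, target):
    """Fraction of rows whose prediction matches the actual target."""
    return sum(predict(r) == r[target] for r in rows) / len(rows)

# Hypothetical skewed data: 10 "Yes" rows, 90 "No" rows.
rows = [{"Target": "Yes" if i < 10 else "No", "X": i % 3} for i in range(100)]
train, test = holdout_split(rows)
train_bal = balance_by_downsampling(train, "Target")
counts = Counter(r["Target"] for r in train_bal)

# A trivial "always No" baseline, scored on the holdout rows only.
baseline = accuracy(lambda r: "No", test, "Target")
print(counts, baseline)
```

Any real model should be compared against such a baseline on the holdout rows; a model that cannot beat "always predict the majority state" there has not learned much from its inputs.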

    bogdan crivat [sql server data mining]
    Sunday, August 30, 2009 5:41 AM