Decision Trees - significance of Node_Description features

    Question

  • Let's say that I have a node in my decision tree with a Node_Description that looks like this:

    InputParameterA >= 10 and InputParameterB not = 'True' and InputParameterC < 10 and InputParameterD >= 120 and < 150

    Is there a way to determine the importance of each of the four input parameters?

    For example, a potential answer might be that "InputParameterA" contributed 32%, "InputParameterB" contributed 2%, "InputParameterC" contributed 10%, and "InputParameterD" contributed 56%. Such an answer would make it clear that "InputParameterA" and "InputParameterD" are the most important.

    Thanks!


    • Moved by Darren Gosbell (MVP), Thursday, September 23, 2010 11:46 PM: this is a data mining question (From: SQL Server Analysis Services)
    Thursday, September 23, 2010 9:53 PM

Answers

  • The node description lists its conditions in conditional order, that is, in the order the splits are applied from the root down. The most important factor therefore comes first and the least important comes last.

    The only exception (IIRC) is when two adjacent nodes that split on the same attribute are condensed into one item (e.g. the "InputParameterD >= 120 and < 150" condition in your description).  However, even in that case you know that the attribute is more important than the attributes that follow it.

    Another thing to try would be to look at the node probability of each node on the path back to the root of the tree.  I haven't tried it myself, but you could normalize the node probabilities of the nodes on that path to get an idea of the relative importance of each one.  The order will (by definition) be the same as the order of the conditions in the node_description, though.

    HTH

    -Jamie


    ______________________________
    Jamie MacLennan
    CTO
    Predixion Software, Inc.
    • Marked as answer by CoffeeCake Sunday, September 26, 2010 4:12 AM
    Friday, September 24, 2010 6:35 AM
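
The normalization idea from the answer above can be sketched roughly as follows. This is an illustration only, not something posted in the thread: the function name `normalized_path_shares` is hypothetical, the probabilities are made up, and it assumes you have already read the NODE_PROBABILITY value of each node on the path from the mining model content. Whether the resulting shares are a meaningful importance measure is untested, as the answer itself notes.

```python
from typing import List, Tuple


def normalized_path_shares(
    path: List[Tuple[str, float]]
) -> List[Tuple[str, float]]:
    """Normalize node probabilities along one root-to-node path.

    `path` holds (split condition, node probability) pairs, ordered
    from the first split below the root down to the node of interest.
    Returns the same conditions with probabilities rescaled to sum to 1.
    """
    total = sum(prob for _, prob in path)
    if total <= 0:
        raise ValueError("node probabilities must sum to a positive value")
    return [(condition, prob / total) for condition, prob in path]


# Made-up node probabilities, for illustration only (assumption).
path = [
    ("InputParameterA >= 10", 0.60),
    ("InputParameterB not = 'True'", 0.45),
    ("InputParameterC < 10", 0.30),
    ("InputParameterD >= 120 and < 150", 0.15),
]

shares = normalized_path_shares(path)
for condition, share in shares:
    print(f"{condition}: {share:.0%}")
```

Because each node covers a subset of its parent's cases, the probabilities shrink along the path, so the shares come out in the same order as the conditions in the node description.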

All replies

  • Thanks. :-)
    Sunday, September 26, 2010 4:13 AM