Decision Trees - significance of Node_Description features

Question

• Let's say that I have a node in my decision tree with a Node_Description that looks like this:

InputParameterA >= 10 and InputParameterB not = 'True' and InputParameterC < 10 and InputParameterD >= 120 and < 150

Is there a way to determine the importance of each of the four input parameters?

For example, a potential answer might be that "InputParameterA" contributed 32%, "InputParameterB" contributed 2%, "InputParameterC" contributed 10%, and "InputParameterD" contributed 56%. Such an answer would make it clear that "InputParameterA" and "InputParameterD" are the most important.

Thanks!

• Moved by Thursday, September 23, 2010 11:46 PM this is a data mining question (From:SQL Server Analysis Services)
Thursday, September 23, 2010 9:53 PM

• The node description lists its conditions in conditional order, that is, the order in which the splits are applied on the way down the tree. This means the most important factor is listed first and the least important factor last.

The only exception (IIRC) is when two adjacent nodes that split on the same attribute are condensed into one item (e.g. a range such as "InputParameterD >= 120 and < 150" in your description). However, even in that case you know that the attribute is more important than the attributes that follow it.
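To read that ordering programmatically, here is a minimal Python sketch that splits a description like the one in the question into ordered (attribute, condition) pairs and folds a condensed range back into a single item. The function name split_conditions is hypothetical, and the sketch assumes conditions are joined by " and " and that no quoted value itself contains " and "; it is not part of any Analysis Services API.

```python
import re


def split_conditions(node_description):
    """Return (attribute, condition) pairs in the order they appear.

    Assumption: conditions are separated by " and ", and a fragment that
    starts with a comparison operator continues the previous attribute's
    range (the condensed case, e.g. ">= 120 and < 150").
    """
    ordered = []
    for fragment in node_description.split(" and "):
        fragment = fragment.strip()
        if re.match(r"[<>=]", fragment) and ordered:
            # Second half of a condensed range: attach it to the previous item.
            attribute, condition = ordered[-1]
            ordered[-1] = (attribute, condition + " and " + fragment)
        else:
            # First token is the attribute name, the rest is its condition.
            attribute, _, condition = fragment.partition(" ")
            ordered.append((attribute, condition))
    return ordered


description = (
    "InputParameterA >= 10 and InputParameterB not = 'True' and "
    "InputParameterC < 10 and InputParameterD >= 120 and < 150"
)
ranked = split_conditions(description)

# Position 1 is the first (most important) condition in the description.
for rank, (attribute, condition) in enumerate(ranked, start=1):
    print(rank, attribute, condition)
```

This only recovers the rank order of the attributes; it says nothing about how much each one contributes.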

Another thing to try would be to look at the node probability of each node on the path from your node back to the root of the tree. I haven't tried it myself, but you could normalize the node probabilities of just the nodes on that path to get an idea of the relative importance of each one. The order will (by definition) be the same as the order of the conditions in the node_description, though.
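As a rough illustration of that normalization idea, the Python sketch below divides each node probability on the path by the sum over the path. It is untested against a real model: the function name normalized_path_shares is hypothetical, and the probabilities are made-up placeholders standing in for values you would read from the model content (for example its NODE_PROBABILITY column), one per node on the path below the root.

```python
def normalized_path_shares(path):
    """Scale the node probabilities along one root-to-node path to sum to 1.

    `path` is a list of (split_attribute, node_probability) pairs ordered
    from the node just below the root down to the node of interest.
    """
    total = sum(probability for _, probability in path)
    if total <= 0:
        raise ValueError("node probabilities on the path must sum to a positive value")
    return {attribute: probability / total for attribute, probability in path}


# Hypothetical values for illustration only; they do not come from a real model.
path = [
    ("InputParameterA", 0.40),
    ("InputParameterB", 0.20),
    ("InputParameterC", 0.12),
    ("InputParameterD", 0.08),
]
shares = normalized_path_shares(path)

for attribute, share in shares.items():
    print(f"{attribute}: {share:.0%}")
```

Because the probability of reaching a node can only shrink as you go deeper, the resulting shares always decrease along the path. Treat them as a rough indication of relative weight rather than a true measure of each attribute's contribution.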

HTH

-Jamie

______________________________
Jamie MacLennan
CTO
Predixion Software, Inc.
• Marked as answer by Sunday, September 26, 2010 4:12 AM
Friday, September 24, 2010 6:35 AM
