My goal is to identify the depth at which two samples separate within a decision tree. In the development version of scikit-learn, you can use the decision_path()
method to identify the last common node:
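from sklearn import tree
import numpy as np

# data (feature matrix) and outcomes (labels) are defined elsewhere
clf = tree.DecisionTreeClassifier()
clf.fit(data, outcomes)
n_nodes = clf.tree_.node_count

# node_indicator[i, j] is 1 if sample i passes through node j
node_indicator = clf.decision_path(data).toarray()
sample_ids = [0, 1]

# A node is common if every selected sample passes through it
common_nodes = (node_indicator[sample_ids].sum(axis=0) == len(sample_ids))
common_node_id = np.arange(n_nodes)[common_nodes]
max_node = np.max(common_node_id)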
Is there a way to determine the depth at which max_node occurs within the tree, possibly using clf.tree_.children_right and clf.tree_.children_left?
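
To make the question concrete, here is a minimal sketch of the kind of traversal I have in mind. It assumes the root is node 0 and that a leaf is marked by -1 in both children arrays, which is how scikit-learn stores them as far as I know. The helper name node_depths is hypothetical (it is not part of scikit-learn), and the small arrays are hand-written stand-ins for clf.tree_.children_left and clf.tree_.children_right.

```python
def node_depths(children_left, children_right):
    """Return a list holding the depth of every node (root = depth 0).

    Assumes node 0 is the root and that a leaf has the same value
    (-1) in both children arrays.
    """
    depths = [0] * len(children_left)
    stack = [(0, 0)]  # (node_id, depth), starting at the root
    while stack:
        node_id, depth = stack.pop()
        depths[node_id] = depth
        left = children_left[node_id]
        right = children_right[node_id]
        if left != right:  # internal node: push both children one level down
            stack.append((left, depth + 1))
            stack.append((right, depth + 1))
    return depths


# Hand-written example: node 0 splits into 1 (leaf) and 2,
# and node 2 splits into leaves 3 and 4.
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
depths = node_depths(children_left, children_right)
print(depths)
```

With a fitted classifier, the depth of the last common node would then presumably be node_depths(clf.tree_.children_left, clf.tree_.children_right)[max_node].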