
Say I have n training samples and a binary classification task. I want to train a decision tree of smallest possible depth and having fewest possible total nodes such that the training accuracy on these n samples is 100%. In the worst case, this would mean that I have one leaf node per sample. Is there some configuration of parameters in Scikit-Learn's implementation [1] of the DecisionTreeClassifier that would let me achieve this?

[1] https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn-tree-decisiontreeclassifier

Comments:

> max_depth: The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. — ombk

If you don't set max_depth, the tree will be grown to its maximum. — ombk

That's not really true, I think. max_depth sets an upper limit on the depth. But if you set (say) max_depth = 1000, it is not always the case that clf.get_depth() == max_depth. — madman_with_a_box

Which one is smaller :p clf.get_depth()? — ombk

I don't think that's how trees work. The algorithm keeps splitting your data into purer and purer leaves, and once everything is split it stops. So clf.get_depth() won't necessarily be as big as the max_depth you set; the algorithm stops once the tree is fully grown, which might only take a depth of 6. — ombk
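A quick sketch illustrating the point made in the comments (dataset choice is just for demonstration): even with a very large max_depth, the fitted tree stops growing once all leaves are pure, so clf.get_depth() can be far smaller than the cap.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth is only an upper bound; growth stops when leaves are pure.
clf = DecisionTreeClassifier(max_depth=1000, random_state=0).fit(X, y)

print(clf.get_depth())  # far less than 1000
```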

1 Answer


Answer

The documentation answers this. With the default max_depth=None, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples. Combined with the other defaults (min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0), the tree keeps splitting until every leaf is pure, which gives 100% training accuracy whenever no two identical samples carry different labels.

One caveat: the tree is grown greedily, so while it is fully grown, it is not guaranteed to be the tree of smallest possible depth or fewest nodes achieving 100% training accuracy (finding that tree is computationally hard).

You can also check this similar question.
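A minimal sketch of the configuration described above, assuming scikit-learn is installed (the synthetic dataset is just for illustration): the defaults already grow the tree until every leaf is pure, so the training score is 1.0 when no conflicting duplicate samples exist.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (continuous features, so no
# duplicate samples with conflicting labels).
X, y = make_classification(n_samples=200, random_state=0)

clf = DecisionTreeClassifier(
    max_depth=None,            # no depth cap: expand until leaves are pure
    min_samples_split=2,       # allow splitting any impure node
    min_samples_leaf=1,        # allow single-sample leaves
    min_impurity_decrease=0.0, # never stop a split for impurity reasons
    random_state=0,
)
clf.fit(X, y)

print(clf.score(X, y))  # 1.0 on the training set
```

Note that in the worst case this does produce one leaf per sample, exactly as the question anticipates; on separable data the fully grown tree is usually much smaller.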