0 votes

Below is a parameter for DecisionTreeClassifier: max_depth

http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

max_depth : int or None, optional (default=None)

    The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

I always thought that the depth of a decision tree should be at most the number of features (attributes) in a given dataset. What if we find pure classes before reaching the value given for that parameter? Does it stop splitting, or does it keep splitting until it reaches the given depth?

Is it possible to use the same attribute at two different levels of a decision tree while splitting?
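
For concreteness, here is a minimal sketch that probes both questions (it assumes scikit-learn and its bundled iris dataset; max_depth=50 is just an arbitrarily large value, not a recommendation):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    # Depth limit set far larger than the tree will ever need on this data
    clf = DecisionTreeClassifier(max_depth=50, random_state=0)
    clf.fit(iris.data, iris.target)

    # Splitting stops once all leaves are pure, so the actual depth stays small
    print(clf.get_depth())  # typically around 5, well below max_depth=50

    # The printed rules show the same feature appearing at several levels
    print(export_text(clf, feature_names=iris.feature_names))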

1

Tree depth is used merely as a stopping criterion for a given number (for a balanced tree, the depth is on the order of log(n)). If you reach a leaf (with only 1 observation) you will stop building from that point onward. – user2974951

1 Answer

3 votes

If the number of features is very high, a decision tree can grow very large. To answer your question: yes, it will stop splitting a node as soon as that node is pure, even before reaching max_depth. This tendency to keep splitting until purity is another reason decision trees tend to overfit.
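
As a hedged illustration of that overfitting tendency (the dataset is synthetic and all parameter values are arbitrary illustrative choices): an unconstrained tree grows deep and fits the training data almost perfectly, while a depth-limited tree usually generalizes better.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic data: 50 features, only 10 informative (illustrative values)
    X, y = make_classification(n_samples=1000, n_features=50, n_informative=10,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for depth in (None, 5):  # None = grow until all leaves are pure
        clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
        print(depth, clf.get_depth(),
              round(clf.score(X_tr, y_tr), 3),  # train accuracy (~1.0 when unconstrained)
              round(clf.score(X_te, y_te), 3))  # test accuracy, usually lower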

You would also want to use the max_depth parameter when you are using a Random Forest, which considers only a subset of the features at each split; individual trees are therefore not expected to grow to the maximum possible depth, and limiting depth acts as a form of pruning. Decision trees are weak learners, and in a Random Forest the depth-limited trees participate in voting. More details about the relationship between random forests and decision trees can easily be found online; a range of articles has been published.
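
A short sketch of that usage (again with synthetic data; n_estimators=100, max_depth=5, and max_features="sqrt" are illustrative values, not recommendations):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=50, n_informative=10,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Each split in each tree considers only a random subset of the features
    # (max_features="sqrt"), and max_depth caps every tree in the ensemble.
    rf = RandomForestClassifier(n_estimators=100, max_depth=5,
                                max_features="sqrt", random_state=0)
    rf.fit(X_tr, y_tr)
    print(rf.score(X_te, y_te))  # prediction is a vote (probability average) over trees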

So, generally, you would want to use max_depth when you have a large number of features. Also, in practice you would typically use a RandomForest rather than a DecisionTree alone.