
When I use XGBoost to fit a model, it usually prints messages like "updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 0 pruned nodes, max_depth=5". I wonder how XGBoost performs this tree pruning. I cannot find a description of the pruning process in their paper.

Note: I do understand decision tree pruning in general, e.g. pre-pruning and post-pruning. Here I am curious about the actual pruning process in XGBoost. Pruning usually requires validation data, but XGBoost performs pruning even when I do not give it any validation data.


1 Answer


XGBoost grows all trees to max_depth first.

This allows for fast training, as you don't have to evaluate the regularization criteria at every candidate split while the tree is being grown.

After each tree is grown to max_depth, you walk from the bottom of the tree (recursively, all the way to the top) and determine whether each split and its children are valid given the hyper-parameters you selected, e.g. whether the split's loss reduction exceeds gamma (min_split_loss). If a split or its nodes are not valid, they are removed from the tree.
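The bottom-up walk can be sketched in plain Python. This is an illustrative toy, not XGBoost's actual C++ code: trees are nested dicts (an assumed format), and a split is kept only if its loss reduction ("gain") exceeds a gamma threshold once its children have themselves been pruned.

```python
# Hypothetical sketch of XGBoost-style post-pruning (not the real
# implementation). Leaves are dicts with a "leaf" weight; internal nodes
# have a "gain" and "left"/"right" children.

def prune(node, gamma):
    """Bottom-up pruning: collapse a split into a leaf when, after its
    children are pruned, both children are leaves and gain <= gamma."""
    if "leaf" in node:                      # already a leaf, nothing to do
        return node
    node["left"] = prune(node["left"], gamma)
    node["right"] = prune(node["right"], gamma)
    both_leaves = "leaf" in node["left"] and "leaf" in node["right"]
    if both_leaves and node["gain"] <= gamma:
        # Collapse the invalid split into a leaf. The weight here is
        # illustrative; the real learner recomputes it from gradients.
        return {"leaf": 0.0}
    return node

# Toy tree grown to depth 2; the bottom split has very small gain.
tree = {
    "gain": 5.0,
    "left": {"gain": 0.1, "left": {"leaf": 0.3}, "right": {"leaf": -0.2}},
    "right": {"leaf": 0.5},
}
pruned = prune(tree, gamma=1.0)
print("leaf" in pruned["left"])   # the low-gain split was collapsed
```

Note that no validation data appears anywhere: the decision is made purely from the stored gain and the hyper-parameter gamma.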

In the model dump of an XGBoost model, you can observe that the actual depth of a tree is less than max_depth whenever pruning has occurred.
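As a quick illustration of that observation, here is a small depth check over the same assumed dict representation (not xgboost's exact dump format): the pruned version of a tree can end up shallower than the max_depth it was grown to.

```python
# Illustrative only: measure realized depth of a toy tree dict before
# and after pruning, using an assumed {"leaf"/"gain","left","right"} format.

def depth(node):
    """Depth of a tree: a lone leaf counts as depth 0."""
    if "leaf" in node:
        return 0
    return 1 + max(depth(node["left"]), depth(node["right"]))

# Grown to max_depth=2...
full = {
    "gain": 5.0,
    "left": {"gain": 0.1, "left": {"leaf": 0.3}, "right": {"leaf": -0.2}},
    "right": {"leaf": 0.5},
}
# ...but after the low-gain split is pruned, only depth 1 remains.
pruned = {"gain": 5.0, "left": {"leaf": 0.0}, "right": {"leaf": 0.5}}

print(depth(full), depth(pruned))   # 2 1
```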

Pruning requires no validation data. It simply asks whether each split, and the resulting child nodes, are valid given the hyper-parameters you set during training.