3
votes

I have the code below. Let's assume that training stopped after 600 rounds and the best round was 450. Which model will be used for prediction: the one after the 450th round or the one after the 600th?

watchlist <- list(val=dval,train=dtrain)

param <- list(  objective           = "binary:logistic", 
                booster             = "gbtree",
                eval_metric         = "auc",
                eta                 = 0.02,
                max_depth           = 7,
                subsample           = 0.6,
                colsample_bytree    = 0.7
)

clf <- xgb.train(   params              = param, 
                    data                = dtrain, 
                    nrounds             = 2000, 
                    verbose             = 0,
                    early.stop.round    = 150,
                    watchlist           = watchlist,
                    maximize            = TRUE
)

preds <- predict(clf, test)

1 Answer

6
votes

After some research I found the answer myself. predict will use the model after the 600th round, i.e. all the trees that were trained. To predict with the model from the best round instead, limit the number of trees:

preds <- predict(clf, test, ntreelimit=clf$bestInd)
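As a rough illustration of why training ends at round 600 when the best round was 450 and early.stop.round is 150, here is a toy sketch of the early stopping rule in base R. It is not xgboost's actual implementation: the function early_stop and the AUC curve in scores are made up for this example.

```r
# Toy early stopping loop (hypothetical helper, not part of xgboost).
# Stops once `patience` rounds have passed without a new best score.
early_stop <- function(scores, patience, maximize = TRUE) {
  best_score <- if (maximize) -Inf else Inf
  best_round <- 0
  for (i in seq_along(scores)) {
    improved <- if (maximize) scores[i] > best_score else scores[i] < best_score
    if (improved) {
      best_score <- scores[i]
      best_round <- i
    } else if (i - best_round >= patience) {
      # Training stops here, but rounds after best_round were still trained.
      return(list(last_round = i, best_round = best_round))
    }
  }
  list(last_round = length(scores), best_round = best_round)
}

# Made-up validation AUC curve that peaks at round 450 (assumption).
rounds <- 1:2000
scores <- 0.9 - abs(rounds - 450) / 10000

res <- early_stop(scores, patience = 150)
res$last_round  # 600
res$best_round  # 450
```

The fitted model contains every round up to last_round, which is why the prediction has to be restricted to the best round explicitly.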