I have trouble understanding the exact meaning of the feature importance scores that caret reports for random forest (RF) regression. As you know, there are many possible importance measures for RF, but there is no clear indication of which one caret uses.
Here is a toy example:
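```r
data(iris)
y_train <- iris$Sepal.Length   # numeric response, so this is a regression problem
X_train <- iris[2:4]           # Sepal.Width, Petal.Length, Petal.Width

# Note: the randomForest argument is `ntree`; a misspelled `ntrees`
# is not recognised and the default of 500 trees would be used instead.
mdl_rf_inner <- caret::train(X_train, y_train, method = "rf",
                             preProcess = c("center", "scale"),
                             ntree = 1000, importance = TRUE)
feat_imp_2 <- caret::varImp(mdl_rf_inner, scale = FALSE)
```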
Resulting in:

```
rf variable importance

             Overall
Petal.Length   48.51
Sepal.Width    23.67
Petal.Width    17.15
```
Please keep in mind that I am predicting sepal length, so despite using the iris data this is a regression problem. I have read the documentation, but it does not say which variable importance measure is being calculated (Gini-impurity decrease? MSE decrease? permutation importance? out-of-bag? etc.).
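One way to pin down which measure is reported is to compare caret's unscaled scores with the raw importance matrix of the underlying randomForest object, which caret stores in `finalModel`. This is a sketch assuming the caret and randomForest packages are installed; the seed and the 5-fold CV `trainControl` are only there to keep it reproducible and fast. My assumption, from reading caret's `rf` model code, is that for regression with `importance = TRUE` the `Overall` column is randomForest's `%IncMSE` (out-of-bag permutation importance), which `randomForest::importance()` by default divides by its standard error. The last step checks this rather than taking it for granted.

```r
library(caret)
library(randomForest)

set.seed(42)
X_train <- iris[2:4]          # Sepal.Width, Petal.Length, Petal.Width
y_train <- iris$Sepal.Length  # numeric response -> regression forest

fit <- caret::train(X_train, y_train, method = "rf",
                    trControl = caret::trainControl(method = "cv", number = 5),
                    ntree = 1000, importance = TRUE)

# Raw importance matrix from randomForest; for a regression forest grown
# with importance = TRUE it has the columns "%IncMSE" and "IncNodePurity".
raw_imp <- randomForest::importance(fit$finalModel)
print(raw_imp)

# caret's importance for the same model, without caret's 0-100 rescaling
caret_imp <- caret::varImp(fit, scale = FALSE)$importance
print(caret_imp)

# For each raw column, check whether it equals caret's "Overall" score
matches <- sapply(colnames(raw_imp), function(col)
  isTRUE(all.equal(unname(raw_imp[rownames(caret_imp), col]),
                   caret_imp$Overall)))
print(matches)
```

If `%IncMSE` comes out `TRUE`, the scores are the permutation measure; with the default `scale = TRUE`, `varImp()` additionally rescales them so the largest is 100 and the smallest is 0.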
To complicate things further, the train function also accepts an importance = TRUE argument, which does not seem to serve a clear purpose when using varImp(). Is that correct?
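Regarding `importance = TRUE`: caret passes it through to `randomForest()`, where it controls which measures are computed at all. A minimal sketch using randomForest directly (assuming the package is installed; `ntree = 500` and the seed are arbitrary choices):

```r
library(randomForest)

set.seed(42)
X_train <- iris[2:4]
y_train <- iris$Sepal.Length

# Default (importance = FALSE): the permutation step is skipped, so only
# the node-impurity measure (decrease in residual sum of squares) is stored.
rf_default <- randomForest(X_train, y_train, ntree = 500)
imp_default <- importance(rf_default)
print(colnames(imp_default))

# importance = TRUE: the out-of-bag permutation measure is computed as well.
rf_perm <- randomForest(X_train, y_train, ntree = 500, importance = TRUE)
imp_perm <- importance(rf_perm)
print(colnames(imp_perm))
```

If caret's `rf` code falls back to the first available column when `%IncMSE` is missing (as I believe recent versions do), then dropping `importance = TRUE` would not make the argument irrelevant; it would silently switch `varImp()` to `IncNodePurity`.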
I would greatly appreciate your insights on this.
Best wishes!