So, I'm using the superconductivity dataset found here... It contains 82 variables, and I am subsetting the data to 2000 rows. But when I use xgboost with mlr3, it does not calculate the importance for all the variables!?
Here's how I'm setting everything up:
# Load packages (lrn("regr.xgboost") comes from mlr3learners)
library(mlr3)
library(mlr3learners)

# Read in data and subset to the first 2000 rows
mydata <- read.csv("/Users/.../train.csv")
data <- mydata[1:2000, ]

# Set up the regression task and the xgboost learner
myTaskXG <- TaskRegr$new(id = "data", backend = data, target = "critical_temp")
myLrnXG <- lrn("regr.xgboost")

# Train the learner (the fitted model is stored inside the learner)
myModXG <- myLrnXG$train(myTaskXG)

# Take a look at the importance
myLrnXG$importance()
This outputs something like this:
wtd_mean_FusionHeat std_ThermalConductivity entropy_Density
0.685125173 0.105919410 0.078925149
wtd_gmean_FusionHeat wtd_range_atomic_radius entropy_FusionHeat
0.038797205 0.038461823 0.020889094
wtd_mean_Density wtd_std_FusionHeat gmean_ThermalConductivity
0.017211730 0.006662321 0.005598844
wtd_entropy_ElectronAffinity wtd_entropy_Density
0.001292733 0.001116518
As you can see, there are only 11 variables there, when there should be 81. A quick count of features versus importance entries (the numbers are just what the output above implies):
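length(myTaskXG$feature_names)   # 81 features in the task
length(myLrnXG$importance())     # but only 11 importance values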
If I do a similar process using ranger, everything works perfectly. A minimal sketch of what I mean (assuming importance = "impurity", since ranger needs an importance mode set for $importance() to work):
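# Hypothetical ranger equivalent of the setup above
myLrnRF <- lrn("regr.ranger", importance = "impurity")
myLrnRF$train(myTaskXG)
myLrnRF$importance()  # returns a value for every feature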
Any suggestions as to what is happening?