I am building an XGBoost model to predict a binary choice, but I am having trouble generating predictions. How do I get from the end of this code to actual predictions on the test data? My data has 7 independent variables and one dependent variable, which is the binary choice.
choice <- dataset_training$choiceprobX
set.seed(1234)
ind <- sample(2, nrow(dataset_training), replace=TRUE, prob=c(0.67, 0.33))
training <- as.matrix(dataset_training[ind==1, 1:7])
head(training)
testing <- as.matrix(dataset_training[ind==2, 1:7])
head(testing)
dataset_trainLabel <- dataset_training[ind==1, 8]
head(dataset_trainLabel)
dataset_testLabel <- dataset_training[ind==2, 8]
head(dataset_testLabel)
xgb.train <- xgb.DMatrix(data=training,label=dataset_trainLabel)
xgb.test <- xgb.DMatrix(data=testing,label=dataset_testLabel)
params = list(
  booster="gbtree",
  eta=0.01,
  max_depth=5,
  gamma=3,
  subsample=0.75,
  colsample_bytree=1,
  objective="binary:logistic",
  eval_metric="logloss"
)
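xgb.fit <- xgb.train(
  params=params,
  data=xgb.train,
  nrounds=10,
  # the xgboost parameter is nthread (singular)
  nthread=1,
  # with only 10 rounds in total, a 10-round patience cannot trigger
  early_stopping_rounds=10,
  watchlist=list(val1=xgb.train,val2=xgb.test),
  verbose=0
)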
xgb.fit
My goal is to generate a confusion matrix, but when I try, I get an error saying that the data and reference should be factors with the same levels.
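
For reference, the missing step would probably look something like the sketch below. It is only a minimal illustration under stated assumptions, not a definitive answer: the data are synthetic stand-ins for dataset_training, the names fit, dtest, test_y, pred_f and true_f are made up for the example, and the 0.5 cutoff is an arbitrary choice. With the objects from the code above, the equivalent calls would be predict(xgb.fit, xgb.test) compared against dataset_testLabel. The key points are that predict() on a binary:logistic model returns probabilities rather than classes, and that both vectors need to be converted to factors with the same explicit levels before building the confusion matrix.

```r
library(xgboost)

# Synthetic stand-in for the real data (assumption): 7 numeric predictors, one 0/1 label.
set.seed(1234)
n <- 300
X <- matrix(rnorm(n * 7), ncol = 7, dimnames = list(NULL, paste0("x", 1:7)))
y <- as.integer(X[, 1] + rnorm(n) > 0)

# Same style of random split as in the question.
ind <- sample(2, n, replace = TRUE, prob = c(0.67, 0.33))
dtrain <- xgb.DMatrix(data = X[ind == 1, ], label = y[ind == 1])
dtest  <- xgb.DMatrix(data = X[ind == 2, ], label = y[ind == 2])
test_y <- y[ind == 2]

fit <- xgb.train(
  params = list(objective = "binary:logistic", eta = 0.1, max_depth = 3),
  data = dtrain,
  nrounds = 20,
  verbose = 0
)

# For binary:logistic, predict() returns P(label == 1), not a class label.
prob <- predict(fit, dtest)

# Turn probabilities into 0/1 classes; 0.5 is an arbitrary cutoff.
pred_class <- as.integer(prob > 0.5)

# Give both vectors the same explicit levels, even if one class never appears.
pred_f <- factor(pred_class, levels = c(0, 1))
true_f <- factor(test_y, levels = c(0, 1))

# Base-R confusion matrix: rows are predictions, columns are true labels.
conf_mat <- table(Predicted = pred_f, Actual = true_f)
print(conf_mat)

# If caret is installed, the same two factors can be passed to it instead:
# caret::confusionMatrix(data = pred_f, reference = true_f, positive = "1")
```

Setting levels = c(0, 1) on both factors is what avoids the "same levels" complaint: passing raw probabilities, or a factor built from predictions that happen to contain only one class, would give the two arguments different level sets.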