I'm trying to run a simple GBM classification model to benchmark performance against random forests and SVMs, but I'm having trouble getting the model to score correctly. It's not throwing an error, but the predictions are all NaN. I'm using the breast cancer data from mlbench. Here's the code:
library(gbm)
library(mlbench)
library(caret)
library(plyr)
library(ada)
library(randomForest)
data(BreastCancer)
bc <- BreastCancer
rm(BreastCancer)
bc$Id <- NULL
bc$Class <- as.factor(mapvalues(bc$Class, c("benign", "malignant"), c("0","1")))
index <- createDataPartition(bc$Class, p = 0.7, list = FALSE)
bc.train <- bc[index, ]
bc.test <- bc[-index, ]
model.gbm <- gbm(Class ~ ., data = bc.train, n.trees = 500)
pred.gbm <- predict(model.gbm, bc.test, n.trees = 500, type = "response")
Can anyone help out with what I'm doing wrong? Also, am I going to have to transform the output of the predict function? I've read that this can be an issue with GBM predictions. Thanks.
gbm package. See here for an explanation. (Basically, gbm assumes that factor responses follow the multinomial distribution. If there are only 2 unique response values (whether character or numeric), then it assumes bernoulli.) – filups21
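For what it's worth, here is a minimal sketch of one way to act on that comment. It assumes the NaNs come from passing a factor response to the bernoulli loss, so it recodes Class as numeric 0/1 and names the distribution explicitly instead of letting gbm guess. The base-R sample() split, the seed, and the 0.5 cutoff are my own choices for illustration, not something from the question, and I haven't tuned anything.

```r
library(gbm)
library(mlbench)

data(BreastCancer)
bc <- BreastCancer
bc$Id <- NULL

# Assumption: the bernoulli loss wants a numeric 0/1 response, not a factor,
# so recode "malignant" as 1 and "benign" as 0.
bc$Class <- as.numeric(bc$Class == "malignant")

# Simple 70/30 split with base R (the seed is arbitrary, for reproducibility).
set.seed(42)
index <- sample(nrow(bc), size = floor(0.7 * nrow(bc)))
bc.train <- bc[index, ]
bc.test <- bc[-index, ]

# State the distribution explicitly rather than relying on gbm's guess.
model.gbm <- gbm(Class ~ ., data = bc.train,
                 distribution = "bernoulli", n.trees = 500)

# type = "response" returns predicted probabilities for the bernoulli case.
pred.gbm <- predict(model.gbm, newdata = bc.test,
                    n.trees = 500, type = "response")

# Turn probabilities into 0/1 labels; the 0.5 cutoff is an arbitrary choice.
pred.class <- as.integer(pred.gbm > 0.5)
table(predicted = pred.class, actual = bc.test$Class)
```

On the second part of the question: with type = "response" the output is already a probability, so the only transformation left is thresholding it if you need class labels to compare against the random forest and SVM predictions.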