4 votes

I'm trying to run a simple GBM classification model to benchmark performance against random forests and SVMs, but I'm having trouble getting the model to score correctly. It's not throwing an error, but the predictions are all NaN. I'm using the breast cancer data from mlbench. Here's the code:

library(gbm)
library(mlbench)
library(caret)
library(plyr)
library(ada)
library(randomForest)

data(BreastCancer)
bc <- BreastCancer
rm(BreastCancer)

bc$Id <- NULL
bc$Class <- as.factor(mapvalues(bc$Class, c("benign", "malignant"), c("0","1")))

index <- createDataPartition(bc$Class, p = 0.7, list = FALSE)
bc.train <- bc[index, ]
bc.test <- bc[-index, ]

model.gbm <- gbm(Class ~ ., data = bc.train, n.trees = 500)

pred.gbm <- predict(model.gbm, bc.test.ind, n.trees = 500, type = "response")

Can anyone help out with what I'm doing wrong? Also, am I going to have to transform the output of the predict function? I've read that this seems to be an issue with GBM predictions. Thanks.

2
This is a "feature" of the gbm package. See here for an explanation. (basically, gbm assumes that factor responses follow the multinomial distribution. If there are only 2 unique response values (whether character or numeric), then it assumes bernoulli.filups21

2 Answers

6 votes

I have run into problems passing a factor response variable to gbm before. You can force the Class variable to be a character type instead of a factor, and that should do it.

bc$Class <- as.factor(mapvalues(bc$Class, c("benign", "malignant"), c("0","1")))
bc$Class <- as.character(bc$Class)

Your code should run fine from there; just make sure you call bc.test (not bc.test.ind) in predict.
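With that conversion applied before the split, the fit and prediction lines from the question become, roughly:

model.gbm <- gbm(Class ~ ., data = bc.train, n.trees = 500)
pred.gbm <- predict(model.gbm, bc.test, n.trees = 500, type = "response")  # bc.test, not bc.test.ind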

Here's a summary of the predicted values I got after making those changes:

> summary(pred.gbm)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.222   0.222   0.231   0.346   0.573   0.579 
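These values are probabilities on the response scale, so to answer your second question: yes, if you want hard class predictions you still need to threshold them. A minimal sketch, assuming the values are probabilities of the "1" (malignant) class and using the usual 0.5 cutoff (pred.class is just an illustrative name):

pred.class <- ifelse(pred.gbm > 0.5, "1", "0")  # 0.5 cutoff; pred.class is an illustrative name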

One last thing, I would recommend setting a seed (e.g. using set.seed()) before calling createDataPartition(). Otherwise you will get different training and test sets every time you run your code.
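For example (the seed value itself is arbitrary):

set.seed(123)  # any fixed value works; this just makes the partition reproducible
index <- createDataPartition(bc$Class, p = 0.7, list = FALSE)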

0 votes

You can just convert the labels to 0 and 1, but store the original factor labels first for comparison:

library(gbm)
library(mlbench)
library(caret)

data(BreastCancer)
bc <- BreastCancer

bc$Id <- NULL
# store the actual labels for later comparison
labels <- bc$Class
# recode the response as 0/1 so gbm can use the bernoulli distribution
bc$Class <- as.numeric(bc$Class) - 1
index <- createDataPartition(bc$Class, p = 0.7, list = FALSE)
bc.train <- bc[index, ]
bc.test <- bc[-index, ]

model.gbm <- gbm(Class ~ ., data = bc.train, n.trees = 500, distribution = "bernoulli")

pred.gbm <- predict(model.gbm, bc.test, n.trees = 500, type = "response")

Since there are only two classes, we can map the predictions back to the original labels: take the first factor level when p <= 0.5 and the second level when p > 0.5:

predicted_labels <- levels(labels)[1 + (pred.gbm > 0.5)]

We then pull out the actual test labels and build a confusion matrix to check that it worked correctly:

test_labels <- labels[-index]

table(predicted_labels, test_labels)
                test_labels
predicted_labels benign malignant
       benign       129         2
       malignant      3        75
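If you also want accuracy, sensitivity, and the other usual statistics, caret's confusionMatrix() will compute them from the same two vectors; a small sketch, assuming predicted_labels is first converted to a factor with the same levels as test_labels:

confusionMatrix(factor(predicted_labels, levels = levels(test_labels)), test_labels)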