
I would like to know how I can draw a ROC plot with R. I have created a logistic regression model with k-fold cross-validation.

dt3 - main dataset

dt3Training - training split made from main dataset

dt3Test - test split made from main dataset

Below is the code I used for the logistic regression:

library(caret)

ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5, savePredictions = TRUE)

modelfit <- train(Attrition ~ ., data = dt3Training, method = "glm", family = "binomial", trControl = ctrl)  # fit on the training split only

pred <- predict(modelfit, newdata = dt3Test)  # predicted classes (the default type = "raw")

confusionMatrix(data=pred, dt3Test$Attrition)

My problem is that pred does not come out as a ROCR prediction object; instead it shows up as a data table. Therefore the code below gives an error.

library(ROCR)

perf1 <- performance(pred, "tpr", "fpr")

plot(perf1)

I would be really grateful if you could help me with this.
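For context: ROCR's performance(pred, "tpr", "fpr") expects an object built by prediction() from numeric scores, and an ROC curve is simply the set of (false positive rate, true positive rate) pairs obtained by sweeping a cutoff over those scores. The base-R sketch below shows that computation; roc_points() is a hypothetical helper (not part of ROCR) and the six labels and probabilities are made-up toy values. It also shows why predicted class labels, which have only two distinct values, give a curve with a single interior point.

```r
# Sketch: the (FPR, TPR) pairs behind an ROC curve, computed in base R.
# roc_points() is a hypothetical helper, not part of ROCR or caret.
roc_points <- function(score, label, positive) {
  is_pos <- label == positive
  # Every distinct score is a candidate cutoff; sweep from high to low.
  cutoffs <- sort(unique(score), decreasing = TRUE)
  tpr <- vapply(cutoffs, function(k) mean(score[is_pos] >= k), numeric(1))
  fpr <- vapply(cutoffs, function(k) mean(score[!is_pos] >= k), numeric(1))
  # Prepend the (0, 0) corner where nothing is classified as positive.
  data.frame(cutoff = c(Inf, cutoffs), fpr = c(0, fpr), tpr = c(0, tpr))
}

# Made-up toy data: true classes and predicted P(Yes).
label <- factor(c("No", "No", "Yes", "No", "Yes", "Yes"))
prob  <- c(0.10, 0.40, 0.35, 0.20, 0.80, 0.70)
roc_points(prob, label, positive = "Yes")   # 7 points: a full staircase

# Hard class predictions, coded 1/2 as as.numeric() does with a factor.
hard <- ifelse(prob >= 0.5, 2, 1)
roc_points(hard, label, positive = "Yes")   # only 3 points
```

With real data, the numeric score would be a predicted probability (or log-odds), not the predicted class.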

UPDATE: After reading "k-fold cross validation - how to get the prediction automatically?", I changed my code to the following:

library("caret", lib.loc = "~/R/win-library/3.4")
load("df.RData")  ## load main dataset "df"; load() expects a file path (this file name is hypothetical)
tc <- trainControl("cv", 10, savePredictions = TRUE)  ## create folds
(fit <- train(Attrition ~ ., data = df, method = "glm", family = "binomial", trControl = tc))  ## train model, predict Attrition with all other variables
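What savePredictions buys you is the held-out (out-of-fold) prediction for every row, which caret keeps in fit$pred. The base-R sketch below reproduces that idea by hand for a plain glm, so the mechanics are visible without caret. It assumes MASS's Pima.tr data as a stand-in for df; the names fold, oof_prob and cv_pred are illustrative, and the column layout of cv_pred is not caret's.

```r
# Sketch: 10-fold cross-validation of a logistic regression in base R,
# keeping each row's out-of-fold predicted probability.
library(MASS)  # Pima.tr ships with R's recommended MASS package

set.seed(1)
dat <- Pima.tr
k <- 10
# Randomly assign each row to one of k roughly equal folds.
fold <- sample(rep(seq_len(k), length.out = nrow(dat)))
oof_prob <- numeric(nrow(dat))

for (i in seq_len(k)) {
  held_out <- fold == i
  m <- glm(type ~ npreg + glu + bp + bmi + age,
           data = dat[!held_out, ], family = binomial)
  # type = "response" returns P(type == "Yes") instead of log-odds.
  oof_prob[held_out] <- predict(m, newdata = dat[held_out, ], type = "response")
}

# Each row's probability comes from a model that never saw that row.
cv_pred <- data.frame(obs = dat$type, prob_yes = oof_prob, fold = fold)
head(cv_pred)
```

The pair (cv_pred$prob_yes, cv_pred$obs) is exactly the kind of score/truth input an ROC function needs.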

I would like to try the code below by Claus Wilke; however, I got confused because I only have my main data (df) and my model (fit).

data.frame(predictor = predict(fit, df),
known.truth = fit$Attrition,
model = "fit") 

or

data.frame(predictor = predict(fit, tc),
known.truth = tc$Attrition,
model = "fit") 

Sorry if I am asking a really stupid question, but I don't have much time left to finish my project, and I don't have previous experience with R.
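On the two attempts above: the known truth has to come from the data (df$Attrition), because neither the fitted model nor the trainControl() object tc contains an Attrition column, and predict() needs a data frame, not tc. The sketch below builds the data frame on simulated data; df, Age and OverTime are hypothetical stand-ins for the real dataset, and a plain glm stands in for caret's train so the example runs on its own.

```r
# Sketch with simulated data; df, Age and OverTime are hypothetical stand-ins.
set.seed(42)
n <- 300
df <- data.frame(Age = round(runif(n, 20, 60)),
                 OverTime = factor(sample(c("No", "Yes"), n, replace = TRUE)))
# Simulate Attrition so that overtime and younger age raise the odds of leaving.
lin <- -1 + 1.5 * (df$OverTime == "Yes") - 0.05 * (df$Age - 40)
df$Attrition <- factor(ifelse(runif(n) < plogis(lin), "Yes", "No"),
                       levels = c("No", "Yes"))

fit <- glm(Attrition ~ ., data = df, family = binomial)

# predictor: a numeric score (here P(Attrition == "Yes")) from the data frame.
# known.truth: taken from df itself, not from fit or from tc.
roc_df <- data.frame(predictor = predict(fit, newdata = df, type = "response"),
                     known.truth = df$Attrition,
                     model = "fit")
str(roc_df)
```

Note that scoring the same rows the model was fitted on gives an optimistic curve; held-out predictions are the fairer input.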

Please follow the instructions in stackoverflow.com/questions/5963269/… to make a reproducible example. At the very, very least, report the error message and the packages used; without them it's impossible to help you. – Calimo

2 Answers


Since you don't provide a reproducible example, I'll use a different dataset and model. For ggplot2, the package plotROC provides generic ROC plotting capabilities that work with any fitted model. You just need to place the known truth and your predicted probabilities (or another numerical predictor variable) into a data frame and then hand it to the geom. An example follows.

library(MASS) # for Pima data sets
library(ggplot2)
library(plotROC)

# train model on training data
glm.out.train <- glm(type ~ npreg + glu + bp + bmi + age,
                     data = Pima.tr,
                     family = binomial)

# combine linear predictor and known truth for training and test datasets into one data frame
df <- rbind(data.frame(predictor = predict(glm.out.train, Pima.tr),
                       known.truth = Pima.tr$type,
                       model = "train"),
            data.frame(predictor = predict(glm.out.train, Pima.te),
                       known.truth = Pima.te$type,
                       model = "test"))

# the aesthetic names are not the most intuitive
# `d` (disease) holds the known truth
# `m` (marker) holds the predictor values 
ggplot(df, aes(d = known.truth, m = predictor, color = model)) + 
  geom_roc(n.cuts = 0)
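If you also want a single number to go with the plot, the area under the curve can be computed directly from the same predictor/known-truth pair. The sketch below uses the rank (Mann-Whitney) identity rather than plotROC or ROCR; auc_rank() is a hypothetical helper, and the model is the same glm on MASS's Pima data as above.

```r
# Sketch: AUC from a predictor / known-truth pair via the rank
# (Mann-Whitney) identity, without plotROC or ROCR.
library(MASS)  # Pima.tr and Pima.te

auc_rank <- function(predictor, known.truth, positive) {
  is_pos <- known.truth == positive
  r <- rank(predictor)  # average ranks, so ties count as one half
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

glm.out.train <- glm(type ~ npreg + glu + bp + bmi + age,
                     data = Pima.tr, family = binomial)

# The linear predictor (log-odds) works as well as probabilities,
# because AUC depends only on the ordering of the scores.
auc_train <- auc_rank(predict(glm.out.train, Pima.tr), Pima.tr$type, "Yes")
auc_test  <- auc_rank(predict(glm.out.train, Pima.te), Pima.te$type, "Yes")
c(train = auc_train, test = auc_test)
```

The value equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.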



I found a way to plot a ROC curve. I will write down the code from the very beginning: first creating the model, then the ROC curve.

Creating a logistic regression model with k-fold cross-validation:

library("caret", lib.loc = "~/R/win-library/3.4")

load("df.RData")  ## load main dataset "df"; load() expects a file path (this file name is hypothetical)

tc <- trainControl("cv", 10, savePredictions = TRUE)  ## create folds

(fit <- train(Attrition ~ ., data = df, method = "glm", family = "binomial", trControl = tc))  ## train model, predict Attrition with all other variables

For the ROC curve:

library(ggplot2)
library(ROCR)

predict0 <- predict(fit, type = 'raw')  # predicted class labels on the training data, so the curve has only one interior point

ROCRpred0 <- prediction(as.numeric(predict0),as.numeric(df$Attrition))

ROCRperf0<- performance(ROCRpred0, 'tpr', 'fpr')

plot(ROCRperf0, colorize=TRUE, text.adj=c(-0.2,1.7))

I could get a plot with this code; I hope it helps other people with the same problem.

[Figure: Sample ROC Curve - discrete values]
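The "discrete values" shape in that plot comes from type = 'raw': as.numeric() on predicted classes leaves only two distinct scores, so ROCR can place just one cutoff between them. The base-R sketch below makes this concrete, with a glm on MASS's Pima.tr standing in for the caret model (an assumption, since the original data is not available).

```r
# Sketch: why class-label predictions give an angular ROC curve.
# A glm on MASS::Pima.tr stands in for the caret model.
library(MASS)

m <- glm(type ~ npreg + glu + bp + bmi + age, data = Pima.tr, family = binomial)
prob <- predict(m, type = "response")  # P(type == "Yes") for each training row
# Thresholding at 0.5 mimics what predicted class labels contain.
hard <- factor(ifelse(prob >= 0.5, "Yes", "No"), levels = c("No", "Yes"))

# Distinct scores = cutoffs an ROC routine can sweep over.
c(prob = length(unique(prob)), hard = length(unique(as.numeric(hard))))

# The single interior (FPR, TPR) point that the hard labels produce.
is_pos <- Pima.tr$type == "Yes"
c(fpr = mean(hard[!is_pos] == "Yes"), tpr = mean(hard[is_pos] == "Yes"))
```

In the caret code above, the analogous change would be to pass class probabilities to prediction(), for example predict(fit, type = "prob")[, "Yes"], assuming "Yes" is the positive level of Attrition (use whatever that level is actually called in your data).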