Currently, I'm learning about logistic regression and LDA (Linear Discriminant Analysis) classification. I'm trying to generate the data differently to learn logistic regression and LDA behavior.
Here is the data visualization of 2-dimensional predictors with class plotted as color:
Here is my code:
library(ggplot2)
library(MASS)
set.seed(1)
a <- mvrnorm(n = 1000, mu = c(0,0), Sigma = matrix(c(0.4,0,0,0.4), nrow = 2, ncol = 2))
b <- mvrnorm(n = 1000, mu = c(0,0), Sigma = matrix(c(10,0,0,10), nrow = 2, ncol =2 ))
#I want to make sure b1 separated from a
b1 <- b[sqrt(b[,1]^2 + b[,2]^2) > 4,]
df <- as.data.frame(rbind(a,b1))
names(df) <- c('x','y')
labelA <- rep('A', nrow(a))
labelB <- rep('B', nrow(b1))
#Put the label column to the data frame
df$labs <- c(labelA,labelB)
ggplot(df, aes(x = x, y = y, col = labs)) + geom_point()
prd <- glm(as.factor(labs) ~ x + y, family = binomial('probit'), data = df)
prd_score <- predict(prd, type = 'response')
plot(roc(df$labs,prd_score))
auc(roc(df$labs,prd_score))
And this is the roc curve plot
It's really frustrating because I couldn't find any mistake in my code that generates this kind of problem. Can anyone help me to point out any mistake in my code that generates this weird kind of ROC or any explanation on why the ROC could become weird like that?
NB: Please assume that the generated data set above is the training data and I want to predict the training data again.