3
votes

I tried computing a confusion matrix for my model, but I keep getting:

Error: data and reference should be factors with the same levels.

Below is my model:

model3 <- glm(winner ~ srs.1 + srs.2, data = train_set, family = binomial)
confusionMatrix(table(predict(model3, newdata=test_set, type="response")) >= 0.5,
                      train_set$winner == 1)

The winner variable contains team1 and team2.
srs.1 and srs.2 are numeric.

What is my problem here?

1
Check that the predict() call on its own works as expected. If it doesn't, make sure the factor levels in test_set for srs.1 and srs.2 are the same as (or a subset of) the factor levels in train_set for the same variables. For example, if your training data has a variable gender with factor levels "male" and "female", you can't have a factor level of "other" in the testing data. - DanY

1 Answer

2
votes

I'll assume your winner label is binary, coded 0/1. So let's use the example below:

library(caret)
set.seed(111)
data = data.frame(
srs.1 = rnorm(200),
srs.2 = rnorm(200)
)

data$winner = ifelse(data$srs.1*data$srs.2 > 0,1,0)

idx = sample(nrow(data),150)
train_set = data[idx,]
test_set = data[-idx,]

model3 <- glm(winner ~ srs.1 + srs.2, data = train_set, family = binomial)

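As you did, we predict the probability and label it 1 if it is above 0.5, else 0. You had the table() idea about right, but the threshold has to be applied to the predictions before they go into table(), not to the result of table(). Also note that the predictions and the reference must come from the same data set, either both from test_set or both from train_set: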

pred = as.numeric(predict(model3, newdata=test_set, type="response")>0.5)
ref = test_set$winner

confusionMatrix(table(pred,ref))

Confusion Matrix and Statistics

    ref
pred  0  1
   0 12  5
   1 19 14

               Accuracy : 0.52            
                 95% CI : (0.3742, 0.6634)
    No Information Rate : 0.62            
    P-Value [Acc > NIR] : 0.943973        

                  Kappa : 0.1085
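
If winner really holds the labels team1 and team2, as the question says, the same idea can be sketched with factors instead of 0/1. This is only a sketch under assumptions: the simulated data and the way winner is generated are made up for illustration, and winner is assumed to be a factor with levels team1 and team2. With a factor response, glm() treats the first level as failure, so the predicted probability refers to the second level. Building pred and ref with the same levels is what avoids the "factors with the same levels" error.

```r
# Simulated stand-in for the asker's data (assumption, not the real data)
set.seed(111)
data <- data.frame(srs.1 = rnorm(200), srs.2 = rnorm(200))
# Noise is added so the classes are not perfectly separable
score <- data$srs.1 - data$srs.2 + rnorm(200)
data$winner <- factor(ifelse(score > 0, "team2", "team1"),
                      levels = c("team1", "team2"))

idx <- sample(nrow(data), 150)
train_set <- data[idx, ]
test_set <- data[-idx, ]

# With a factor response, the first level ("team1") is treated as failure,
# so the fitted probability is P(winner == "team2")
model3 <- glm(winner ~ srs.1 + srs.2, data = train_set, family = binomial)

prob <- predict(model3, newdata = test_set, type = "response")
lv <- levels(train_set$winner)

# Map probabilities back to labels, forcing identical levels on both factors
pred <- factor(ifelse(prob > 0.5, lv[2], lv[1]), levels = lv)
ref <- factor(test_set$winner, levels = lv)

# Use caret if it is installed, otherwise fall back to a plain table
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::confusionMatrix(data = pred, reference = ref))
} else {
  print(table(pred, ref))
}
```

Setting levels = lv explicitly on both factors matters when one class happens to be missing from the predictions; without it, pred could end up with a single level and trigger the same error again.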