
In my logistic regression in R, I am trying to create a contingency table comparing fitted to observed values (i.e. actual 0 or 1 vs. fitted 0 or 1). However, my data has missing values in various rows of various variables, so the vector of fitted values is shorter than the original data set. Here is an example:

test <- data.frame(male=c(1,0,1,0,0,1,1,0,1,0,0,1), 
                 height=c(58,100,NA,19,20,69,58,24,46,19,97,69))

model <- glm(male~height, family=binomial("logit"),data=test)

check_model <- table(test$male,fitted.values(model)>0.5)

Error in table(test$male, fitted.values(model) > 0.5) : all arguments must have the same length

Does anyone know of a way to feed in the actual values (test$male) only for the rows where the model has a fitted value, i.e. the rows that were not dropped because of an NA?

2
Did you realize that your code implies that you think there is a function named fitted.values? Had you simply typed ?fitted at the console (or perhaps str(model)), you would have made more rapid progress. - IRTFM

2 Answers

2
votes

If you look at ?glm, you will see that, by default (model = TRUE), it returns the model frame as the model component of the glm object.

This contains the data actually used to fit the model, i.e. with the rows containing missing values already dropped.

Thus you can use

table(model.frame(model)$male, fitted(model) > 0.5)

or

table(model$model$male, fitted(model) > 0.5)

Either call returns the required result:

##      FALSE TRUE
##   0     4    2
##   1     3    2
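
To see why this lines up, here is a minimal sketch using the data from the question. It assumes R's default setting, na.action = na.omit, is in effect, so glm() silently drops the incomplete row; it prints no particular expected table, only the lengths involved.

```r
# Data and model from the question
test <- data.frame(male   = c(1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1),
                   height = c(58, 100, NA, 19, 20, 69, 58, 24, 46, 19, 97, 69))
model <- glm(male ~ height, family = binomial("logit"), data = test)

# The original data has 12 rows, but row 3 has a missing height
n_data   <- length(test$male)        # 12
n_fitted <- length(fitted(model))    # 11: one fitted value per row used
n_frame  <- nrow(model.frame(model)) # 11: same rows as the fitted values

# The fitted object records which rows were dropped
dropped <- as.integer(model$na.action)  # 3

# Observed and fitted values now have the same length
check_model <- table(model.frame(model)$male, fitted(model) > 0.5)
```

Because the model frame and the fitted values come from the same set of rows, the two arguments to table() always match in length, whichever variables contained the NAs.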
1
votes
> table(test$male[complete.cases(test)], fitted(model)>0.5)

    FALSE TRUE
  0     4    2
  1     3    2
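
Note that complete.cases(test) checks every column of the data frame, so it only matches the rows used by the model when test contains nothing but the model variables, as it does here. If that is a concern, one possible alternative (a sketch, not something either answer above proposes) is to fit with na.action = na.exclude, which makes fitted() pad the dropped rows with NA so the result has the same length as the original data. The names model_ex and fitted_ex are just illustrative.

```r
# Data from the question
test <- data.frame(male   = c(1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1),
                   height = c(58, 100, NA, 19, 20, 69, 58, 24, 46, 19, 97, 69))

# na.exclude still drops incomplete rows when fitting, but fitted() and
# residuals() are padded with NA at those positions
model_ex <- glm(male ~ height, family = binomial("logit"),
                data = test, na.action = na.exclude)

fitted_ex <- fitted(model_ex)  # length 12, NA in row 3

# table() ignores NA by default, so the original column can be used directly
check_model <- table(test$male, fitted_ex > 0.5)
```

The trade-off is that the model has to be refit with a different na.action, whereas the model.frame() approach works on an already fitted object.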