
I am trying to create a plot from the outputs of a logistic regression model in which multiple plots are combined.

I have run a logistic regression model on data that looks like this:

   gender english art science sports geography   insured 
1  Female       0   1       0      0         0         1
2  Female       1   1       0      1         1         1
3  Female       1   0       0      1         1         1
4  Female       1   0       0      0         1         1
5  Female       1   1       1      0         1         1
6  Female       1   1       1      0         0         0
7    Male       1   1       1      1         0         1
8    Male       1   1       1      1         0         0
9  Female       1   1       0      0         0         1
10   Male       1   1       0      0         1         0
11 Female       1   1       0      0         1         1

I fitted the model and plotted its output with the effects package, using this code:

library(effects)

df_fit <- glm(insured ~ english + art + science + gender, data = df, family = "binomial")

plot(Effect(focal.predictors = c("art", "gender"), df_fit), rug = FALSE)

This is what the plot looks like: (image: effect plot)

How can I adjust my code so that all the predicted glm outputs for the '1' valued variables english:science will appear on the left side, whilst all the predicted glm outputs for the '0' values of the variables english:science will be plotted on the right, and separated by gender?

I have tried using gather() (from tidyr) to combine the english:science columns into a single variable in a long dataset, but this causes errors in the regression model and disrupts the data.

Is there another way to plot this?

This is my desired output: (image: desired output)

I'm not sure I understand the format you are looking for. You can't really show separate panels for the various effect sizes for males versus females, because the effect sizes are the same for both genders (there is no interaction term), although they do have different baseline values. I also can't understand how you would split into insured = 1 and insured = 0, since it is the (log) odds of insured = 1 vs insured = 0 that you are displaying on the y axis. The coefficients for insured = 0 are just the inverse of those for insured = 1.
- Allan Cameron
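Both points in the comment can be checked numerically. The following is a minimal sketch on simulated data: the `sim` data frame, its sample size, and the coefficients used to generate it are assumptions made up for illustration, not the asker's data. On the log-odds scale, "inverse" shows up as a sign flip of every coefficient.

```r
# Simulated stand-in data (an assumption; not the asker's data)
set.seed(1)
n <- 500
sim <- data.frame(
  gender  = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  english = rbinom(n, 1, 0.5),
  art     = rbinom(n, 1, 0.5),
  science = rbinom(n, 1, 0.5)
)
# Arbitrary illustrative coefficients on the log-odds scale
eta <- -0.5 + 0.8 * sim$english - 0.6 * sim$art + 0.4 * sim$science +
  0.7 * (sim$gender == "Male")
sim$insured <- rbinom(n, 1, plogis(eta))

fit1 <- glm(insured ~ english + art + science + gender,
            data = sim, family = binomial)

# 1) Without an interaction term, the art effect on the link (log-odds)
#    scale is the same for both genders; only the baseline differs.
grid <- expand.grid(gender = c("Female", "Male"), art = c(0, 1),
                    english = 0, science = 0)
grid$logit <- predict(fit1, newdata = grid, type = "link")
art_effect <- tapply(grid$logit, grid$gender, diff)  # art = 1 minus art = 0
print(art_effect)
print(coef(fit1)["art"])

# 2) Modelling insured = 0 instead flips the sign of every coefficient.
fit0 <- glm(I(1 - insured) ~ english + art + science + gender,
            data = sim, family = binomial)
print(cbind(insured_1 = coef(fit1), insured_0 = coef(fit0)))
```

On the probability scale the art effect can still look different between genders, because the inverse logit is nonlinear; the equality holds only on the link scale.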

1 Answer


You can do something like:

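#Create a prediction data frame with each effect separated out:
#one row per subject and gender, with only that row's subject set to 1
#(sports and geography are not in the model, so their rows give the baseline prediction)
new_data <- data.frame(gender=rep(c("Female","Male"), each=5),
                       english=c(1,0,0,0,0), art=c(0,1,0,0,0),
                       science=c(0,0,1,0,0), sports=c(0,0,0,1,0), geography=c(0,0,0,0,1),
                       subject=c("english", "art", "science", "sports", "geography"))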

#Predictions for the new data
fits <- predict(df_fit, newdata=new_data, type="response", se.fit=TRUE)[1:2]
new_data <- cbind(new_data, val=fits[[1]], se=fits[[2]])
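Because these standard errors are on the response scale, val +/- se can fall outside [0, 1] when a predicted probability is close to 0 or 1. One common alternative is to predict on the link scale and back-transform the bounds with plogis(). The sketch below assumes simulated stand-in data; `sim`, `sim_fit`, and `grid` are hypothetical names used only for illustration.

```r
# Simulated stand-in data and model (assumptions for illustration only)
set.seed(42)
n <- 300
sim <- data.frame(
  gender  = sample(c("Female", "Male"), n, replace = TRUE),
  english = rbinom(n, 1, 0.5),
  art     = rbinom(n, 1, 0.5),
  science = rbinom(n, 1, 0.5)
)
sim$insured <- rbinom(n, 1, plogis(-0.3 + 0.9 * sim$art +
                                     0.5 * (sim$gender == "Male")))
sim_fit <- glm(insured ~ english + art + science + gender,
               data = sim, family = binomial)

# Small prediction grid: art on/off for each gender, other subjects at 0
grid <- expand.grid(gender = c("Female", "Male"), art = c(0, 1),
                    english = 0, science = 0)

# Predict on the link (log-odds) scale, then back-transform with plogis()
link <- predict(sim_fit, newdata = grid, type = "link", se.fit = TRUE)
grid$val  <- plogis(link$fit)
grid$ymin <- plogis(link$fit - link$se.fit)
grid$ymax <- plogis(link$fit + link$se.fit)
print(grid)
```

The resulting ymin and ymax columns can be mapped directly to the ymin and ymax aesthetics of geom_errorbar(); the interval is then asymmetric around val but always stays inside [0, 1].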

Then use ggplot2 to draw the figure you're after:

#plot
library(ggplot2)
ggplot(new_data, aes(x=subject, y=val, ymin=val-se, ymax=val+se)) +
  geom_point() +
  geom_errorbar() +
  facet_wrap(~gender) +
  ylab("Partial effects (+/- 1 se)")

(image: output plot)
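If the goal is the layout described in the question, with predictions for subject = 1 on the left, subject = 0 on the right, and the genders separated, the same approach extends to a two-way facet. This is only a sketch under several assumptions: the data are simulated (`sim` and `sim_fit` are hypothetical stand-ins), only the three subjects that are in the model are shown, and the non-focal subjects are held at their sample means, which is a choice made here and not something taken from the answer.

```r
library(ggplot2)

# Simulated stand-in data and model (assumptions for illustration only)
set.seed(7)
n <- 300
sim <- data.frame(
  gender  = sample(c("Female", "Male"), n, replace = TRUE),
  english = rbinom(n, 1, 0.5),
  art     = rbinom(n, 1, 0.5),
  science = rbinom(n, 1, 0.5)
)
sim$insured <- rbinom(n, 1, plogis(-0.4 + 0.8 * sim$english - 0.5 * sim$art +
                                     0.6 * (sim$gender == "Male")))
sim_fit <- glm(insured ~ english + art + science + gender,
               data = sim, family = binomial)

# One row per subject x value (1 or 0) x gender
subjects <- c("english", "art", "science")
grid <- expand.grid(subject = subjects, value = c(1, 0),
                    gender = c("Female", "Male"), stringsAsFactors = FALSE)

# Focal subject takes the row's value; the other subjects sit at their means
for (s in subjects) {
  grid[[s]] <- ifelse(grid$subject == s, grid$value, mean(sim[[s]]))
}

pred <- predict(sim_fit, newdata = grid, type = "response", se.fit = TRUE)
grid$val <- unname(pred$fit)
grid$se  <- unname(pred$se.fit)

# Level order controls panel order: the '1' column is drawn on the left
grid$value <- factor(grid$value, levels = c(1, 0),
                     labels = c("subject = 1", "subject = 0"))

p <- ggplot(grid, aes(x = subject, y = val, ymin = val - se, ymax = val + se)) +
  geom_point() +
  geom_errorbar(width = 0.2) +
  facet_grid(gender ~ value) +
  ylab("Predicted probability of insured (+/- 1 se)")
print(p)
```

Reshaping only the prediction grid, and leaving the model data in wide format, avoids the problem of fitting the regression to a gathered long dataset.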