
In R, I am fitting a GLM (logit) on individual-level data with a binomial dependent variable.

However, I would like to plot the fit on an aggregate level (i.e. with the % of successes on the y-axis). What would be the easiest way to plot both the scatter of the realized (aggregate) data and the regression line? I already tried ggplot with stat_smooth(), but if I build the scatter on the aggregate level, the glm is fitted on the aggregate level as well.

library(datasets)
library(dplyr)    # for %>%, group_by(), summarise()
library(ggplot2)  # for ggplot(), stat_smooth()
data(mtcars)
fit <- glm(vs ~ mpg + cyl + mpg:cyl + disp + drat, family=binomial(link='logit'), data=mtcars)

mtcars_agg <- mtcars %>%
  group_by(carb) %>%
  summarise(across(everything(), mean))  # replaces the deprecated summarise_each(funs(mean))

form <- formula("mtcars$vs ~ mtcars$mpg + mtcars$cyl + mtcars$mpg:mtcars$cyl + mtcars$disp + mtcars$drat")

ggplot(mtcars_agg, aes(x=mpg, y=vs)) + geom_point() + 
  stat_smooth(data=mtcars, method="glm", formula = form, method.args=list(family="binomial"), se=FALSE)

Does anyone know how to deal with this? If I do not specify formula in the stat_smooth() call, two things are not what I want:

  1. It just takes y ~ x as the formula. However, I would like to include interaction terms as well.
  2. I would like to fit on individual-level data, not aggregate-level data.

Without specifying formula, the plot looks like this:

[Image: plot produced without specifying formula]

First, can you provide an example of the plot you require? Second, why is carb the independent variable in your ggplot, while it is not included as a variable in your glm()? – MikolajM
In the ggplot the x should have been mpg; I have edited this now. I have also inserted the plot that I require, although that plot is just based on y ~ x, while I would like to specify more risk drivers than just x. – Z117
I think in stat_smooth() you can specify only one predictor, i.e. only one x. Moreover, if you want to visualize the equation of your glm, where you have 5 different predictor terms, then you need a 5-dimensional plot, which is impossible. My solution would be to use predict() to predict vs values for different mpg while the other variables are held constant. – MikolajM

1 Answer


As I wrote in the comment, I think it is impossible to visualize such a glm directly, as you would require a 5-dimensional plot. However, it is possible to visualize the predicted probability of vs against different values of mpg (or another variable) while the other variables are held constant.

Here is my example:

library(datasets)
data(mtcars)

fit <- glm(vs ~ mpg + cyl + mpg:cyl + disp + drat, family=binomial(link='logit'), data=mtcars)

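# Grid of mpg values with the other predictors held at fixed values
to.visualize <- expand.grid(mpg = 10:35, cyl = 4, disp = 300, drat = 4)
# Predicted probability of vs = 1 (type = "response" returns the probability scale)
to.visualize$vs <- predict(fit, newdata = to.visualize, type = "response")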

library(ggplot2)
ggplot(data=to.visualize, aes(x=mpg, y=vs))+ 
  geom_point()+
  geom_path()+
  ggtitle("Probability of vs for different mpg while cyl=4, disp=300, drat=4")

This gives a plot that looks like this:

[Image: Prob. of vs for different mpg]
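
To get closer to the aggregate view asked for in the question, one possible approach (a sketch, not the only way to do it) is to keep the fit on the individual-level data, compute a fitted probability for every row with predict(), and only then aggregate both the observed vs and the fitted probabilities by group. The names p_hat, agg and p are made up for this illustration, and grouping by carb is simply taken from the question's own aggregation. Note that on mtcars this model may emit warnings about fitted probabilities of 0 or 1, since the data set is tiny.

```r
library(ggplot2)

data(mtcars)

# Individual-level fit, same model as above
fit <- glm(vs ~ mpg + cyl + mpg:cyl + disp + drat,
           family = binomial(link = "logit"), data = mtcars)

# Fitted probability for every individual row (p_hat is an illustrative name)
mtcars$p_hat <- predict(fit, type = "response")

# Aggregate AFTER fitting: per carb group, take the observed share of vs = 1,
# the mean fitted probability, and the mean mpg used for the x-axis
agg <- aggregate(cbind(vs, p_hat, mpg) ~ carb, data = mtcars, FUN = mean)

# Points: realized aggregate share; line: aggregated fit of the individual-level model
p <- ggplot(agg, aes(x = mpg)) +
  geom_point(aes(y = vs)) +
  geom_line(aes(y = p_hat)) +
  labs(y = "share of vs = 1",
       title = "Observed share (points) vs. mean fitted probability (line) per carb group")
p
```

Because the aggregation happens after the model is fitted, the interaction term and all other predictors still enter the fit, and stat_smooth() is not needed at all. The line connects group means, so it is not a smooth curve in mpg; it shows how well the model reproduces each group's success rate.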