In R I am fitting a GLM (logit) on individual-level data with a binomial (0/1) dependent variable.
However, I would like to plot the fit on an aggregate level (i.e. with the % of successes on the y-axis). What would be the easiest way to plot both the scatter of the realized data (aggregated) and the regression line? I already tried ggplot with stat_smooth(), but if I build the scatter from aggregated data, the GLM is fitted on that aggregated data as well.
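library(datasets)
library(dplyr)    # %>%, group_by(), summarise()
library(ggplot2)
data(mtcars)
# Model fitted on individual-level data
fit <- glm(vs ~ mpg + cyl + mpg:cyl + disp + drat, family=binomial(link='logit'), data=mtcars)
# Aggregate: mean of every column per carb group (mean of vs = share of successes)
mtcars_agg <- mtcars %>%
  group_by(carb) %>%
  summarise(across(everything(), mean))
# Attempt: pass the full model formula to stat_smooth()
form <- formula("mtcars$vs ~ mtcars$mpg + mtcars$cyl + mtcars$mpg:mtcars$cyl + mtcars$disp + mtcars$drat")
ggplot(mtcars_agg, aes(x=mpg, y=vs)) + geom_point() +
  stat_smooth(data=mtcars, method="glm", formula = form, method.args=list(family="binomial"), se=FALSE)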
Does anyone know how to deal with this? If I do not specify formula in the stat_smooth() call, two things are not what I want:
- It just takes y ~ x as the formula. However, I would like to include interaction variables as well.
- I would like to fit on individual-level data, not aggregate-level data.
Without specifying formula, the plot looks like this:
In ggplot, carb is the independent variable, while in your glm() carb is not included as a variable? – MikolajM
In ggplot the x should have been mpg; I have edited this now. Besides, I have inserted the plot that I require, although that plot is just based on y ~ x, while I would like to specify more risk drivers than just x. – Z117
In stat_summary() you can specify only one predictor (only one x). Moreover, if you want to visualize the equation of your glm, where you have 5 different predictors, then you need a 5-dimensional plot, which is impossible. My solution would be to use predict() to predict vs values for different mpg, while the other variables are held constant. – MikolajM
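The predict() suggestion from the last comment could look roughly like the sketch below. It is not a definitive answer, only one way to read that comment: the model is fitted on the individual rows, vs is predicted along a grid of mpg values, and the resulting curve is drawn over the aggregated points. Holding cyl, disp and drat at their sample means is an assumption made here for illustration, and the names grid and pred are made up for this example. On a dataset this small glm() may emit convergence or separation warnings.

```r
library(dplyr)
library(ggplot2)

# Fit on individual-level data; mpg * cyl expands to mpg + cyl + mpg:cyl,
# so this is the same model as in the question.
fit <- glm(vs ~ mpg * cyl + disp + drat,
           family = binomial(link = "logit"), data = mtcars)

# Aggregate only for the scatter: mean mpg and share of vs == 1 per carb group.
mtcars_agg <- mtcars %>%
  group_by(carb) %>%
  summarise(mpg = mean(mpg), vs = mean(vs))

# Grid over mpg; the other predictors are fixed at their sample means
# (an assumption for this sketch, other reference values are equally valid).
grid <- data.frame(
  mpg  = seq(min(mtcars$mpg), max(mtcars$mpg), length.out = 100),
  cyl  = mean(mtcars$cyl),
  disp = mean(mtcars$disp),
  drat = mean(mtcars$drat)
)

# type = "response" returns predicted probabilities instead of log-odds.
grid$pred <- predict(fit, newdata = grid, type = "response")

# Aggregated points plus the curve from the individual-level fit.
p <- ggplot(mtcars_agg, aes(x = mpg, y = vs)) +
  geom_point() +
  geom_line(data = grid, aes(y = pred)) +
  labs(y = "share of vs = 1")
p
```

The curve shows the fitted probability for a "typical" car as mpg varies, so it will not necessarily pass close to the aggregated points, whose other covariates differ from group to group.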