Setting up ggplot for a logistic regression with one predictor and looping through multiple outcomes (or columns)

Question

I am a novice at R and have a ggplot-related question. Below is a dummy data frame with one column containing the predictor (xvar) and several columns of dichotomous outcomes (yvar1, yvar2, yvar3).

df <- data.frame("xvar"=c(0,100,200,300,400,500,600,1000),"yvar1"= c(0,0,0,0,0,0,1,1),"yvar2"=c(0,0,1,1,1,1,1,1),"yvar3"=c(0,0,1,1,0,1,1,1))

I have created a for loop that runs a logistic regression of each yvar on the predictor xvar, and I can successfully plot the regression for each yvar. Please ignore the regression warnings (this is a dummy dataset).

for (i in 2:4) {

  logr.yvar <- glm(df[,names(df[i])] ~ xvar, data=df, family=binomial(link="logit"))
  print(logr.yvar)

  plot(df$xvar, df[,i])
  curve(predict(logr.yvar, data.frame(xvar=x), type="response"), add=TRUE) 

}

Instead of using the base plot function, I would like to switch to ggplot2. I am currently able to generate ggplots for individual regressions:

ggplot(df, aes(x=xvar, y=yvar1)) + geom_point() + 
  stat_smooth(method="glm", family="binomial", se=TRUE)

How can I set up looping using ggplot2?

Comment by ansonab: Is the following code my best option?

plotHistFunc <- function(x, na.rm = TRUE, ...) {
  nm <- names(x)
  for (i in seq_along(nm)[2:4]) {
    print(ggplot(x, aes_string(x = nm[1], y = nm[i])) + geom_point() +
      stat_smooth(method="glm", family="binomial", se=TRUE))
  }
}
plotHistFunc(df)

shadow · Accepted Answer · 2015-03-02T13:26:19

If you really want to loop, you could use lapply.

p <- lapply(names(df)[-1], function(nm){
  ggplot(df, aes_string(x="xvar", y=nm)) + geom_point() + 
    stat_smooth(method="glm", family="binomial", se=TRUE)
})
print(p)
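
The snippet above was written for the ggplot2 of early 2015. Since ggplot2 2.0.0, stat_smooth() no longer takes family directly (model arguments go through method.args), and aes_string() has since been soft-deprecated in favor of the .data pronoun. Here is a minimal sketch of the same loop for a recent ggplot2, assuming the df from the question; the names outcomes and plots are only illustrative:

```r
library(ggplot2)

# Dummy data from the question
df <- data.frame(
  xvar  = c(0, 100, 200, 300, 400, 500, 600, 1000),
  yvar1 = c(0, 0, 0, 0, 0, 0, 1, 1),
  yvar2 = c(0, 0, 1, 1, 1, 1, 1, 1),
  yvar3 = c(0, 0, 1, 1, 0, 1, 1, 1)
)

# Every column except the predictor is treated as an outcome
outcomes <- setdiff(names(df), "xvar")

# One plot per outcome; .data[[nm]] looks the column up by its name
plots <- lapply(outcomes, function(nm) {
  ggplot(df, aes(x = xvar, y = .data[[nm]])) +
    geom_point() +
    stat_smooth(method = "glm",
                method.args = list(family = binomial(link = "logit")),
                formula = y ~ x, se = TRUE) +
    labs(y = nm)
})
names(plots) <- outcomes

# Draw each plot in turn; glm warnings are expected with this dummy data
for (p in plots) print(p)
```

Naming the list makes it possible to pull out a single plot later, for example plots[["yvar2"]].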

However, I suspect that reshaping your data and displaying all the graphs together may be better.

# reshaping data from wide to long: one row per (xvar, outcome) pair
library(reshape2)
df.melt <- melt(df, id.vars='xvar')
# first variation, using facets 
ggplot(df.melt, aes(xvar, value)) + 
  geom_point() + 
  stat_smooth(method="glm", family="binomial", se=TRUE) +
  facet_grid(variable~.)
# second variation using colors
ggplot(df.melt, aes(xvar, value)) + 
  geom_point() + 
  stat_smooth(aes(color = variable, fill = variable), 
              method="glm", family="binomial", se=TRUE, size = 1.2)
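
Both variations above rely on older ggplot2 argument names. As a hedged sketch for a recent ggplot2 (3.4.0 or later is assumed for linewidth), the same two plots can be built as follows. The reshape here uses base R's stack() instead of reshape2, purely to avoid an extra dependency, and the names long, p_facet and p_color are illustrative:

```r
library(ggplot2)

# Dummy data from the question
df <- data.frame(
  xvar  = c(0, 100, 200, 300, 400, 500, 600, 1000),
  yvar1 = c(0, 0, 0, 0, 0, 0, 1, 1),
  yvar2 = c(0, 0, 1, 1, 1, 1, 1, 1),
  yvar3 = c(0, 0, 1, 1, 0, 1, 1, 1)
)

# Wide to long with base R: stack() returns columns `values` and `ind`
long <- stack(df[-1])
names(long) <- c("value", "variable")
# stack() appends the outcome columns in order, so repeat xvar once per outcome
long$xvar <- rep(df$xvar, times = ncol(df) - 1)

# Shared logistic smoother; model arguments go through method.args
logit_smooth <- function(...) {
  stat_smooth(method = "glm",
              method.args = list(family = binomial(link = "logit")),
              formula = y ~ x, se = TRUE, ...)
}

# First variation: one facet per outcome
p_facet <- ggplot(long, aes(xvar, value)) +
  geom_point() +
  logit_smooth() +
  facet_grid(variable ~ .)

# Second variation: one colored curve per outcome in a single panel
p_color <- ggplot(long, aes(xvar, value)) +
  geom_point() +
  logit_smooth(aes(color = variable, fill = variable), linewidth = 1.2)

print(p_facet)
print(p_color)
```

With the binomial family passed through method.args, the fitted curves stay between 0 and 1, which is a quick visual check that the logistic model is the one being drawn.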
