What's a good way to calculate and add errorbars to a ggplot2 histogram?

Question

The following command generates a simple histogram:

g<- ggplot(data = mtcars, aes(x = factor(carb) )) + geom_histogram()

Usually I add errorbars to my plots like this:

g+stat_summary(fun.data="mean_cl_boot",geom="errorbar",conf.int=.95)

But that doesn't work with a histogram ("Error: geom_errorbar requires the following missing aesthetics: ymin, ymax"), I think because the y variable is not explicit: the counts are calculated automatically by geom_histogram, so one never declares a y variable.

Is it impossible to use geom_histogram here, so that we must first calculate the y quantity (counts) ourselves and then specify it as the y variable in a call to geom_bar?

The 95% confidence interval on the number of observations in the bin would be nice. The best formula for such a confidence interval would be more of an issue for stats.stackexchange.com, but here's a start: suchideas.com/articles/maths/applied/histogram-errors . Here I'm asking for the code to add an error bar of any kind. — Alex Holcombe

Alex Holcombe · Accepted Answer · 2013-05-28T21:27:13

It seems that one indeed cannot use geom_histogram; instead, we must calculate the counts (bar heights) and the confidence interval limits manually. First, to calculate the counts:

library(plyr)
mtcars_counts <- ddply(mtcars, .(carb), function(x) data.frame(count=nrow(x)))

The remaining problem is calculating the confidence interval for a binomial proportion, here the count in a bar divided by the total number of cases in the data set. A variety of formulae have been proposed in the literature. Here we will use the Agresti & Coull (1998) method as implemented in the PropCIs package; the resulting limits are multiplied by the total number of cases to put them back on the count scale.

library(PropCIs)
numTotTrials <- sum(mtcars_counts$count)

# Create a CI function for use with ddply and based on our total number of cases.
makeAdd4CIforThisHist <- function(totNumCases, conf.int) {
  # The returned function takes a one-row data frame with a count column
  add4CIforThisHist <- function(df) {
    CIstuff <- add4ci(df$count, totNumCases, conf.int)
    # Scale the proportion limits back to counts
    data.frame(ymin = totNumCases * CIstuff$conf.int[1],
               ymax = totNumCases * CIstuff$conf.int[2])
  }
  return(add4CIforThisHist)
}

calcCI <- makeAdd4CIforThisHist(numTotTrials,.95)

limits<- ddply(mtcars_counts,.(carb),calcCI) #calculate the CI min,max for each bar

mtcars_counts <- merge(mtcars_counts,limits) #combine the counts dataframe with the CIs

g<-ggplot(data =mtcars_counts, aes(x=carb,y=count,ymin=ymin,ymax=ymax)) + geom_bar(stat="identity",fill="grey")
g+geom_errorbar()

resulting graph
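
For readers who would rather not install plyr and PropCIs, here is a minimal base-R sketch of the same calculation. It writes out the "add 4" interval of Agresti & Coull by hand, which is my understanding of what add4ci computes; I have not checked it against PropCIs for every input. The helper name add4_count_ci is made up for this example and is not part of any package.

```r
# Counts per carb value, using only base R (mtcars ships with R).
counts <- as.data.frame(table(carb = mtcars$carb), responseName = "count")
counts$carb <- as.numeric(as.character(counts$carb))
n <- sum(counts$count)

# Hypothetical helper (not from PropCIs): the "add 4" interval for a
# proportion x/n, scaled back to counts. Assumption: add two successes and
# two failures, then apply the usual normal-approximation interval.
add4_count_ci <- function(x, n, conf.level = 0.95) {
  z  <- qnorm(1 - (1 - conf.level) / 2)
  p  <- (x + 2) / (n + 4)
  se <- sqrt(p * (1 - p) / (n + 4))
  data.frame(ymin = n * pmax(0, p - z * se),  # clip the proportion at 0
             ymax = n * pmin(1, p + z * se))  # and at 1
}

counts <- cbind(counts, add4_count_ci(counts$count, n))
```

The counts data frame then has the same columns (carb, count, ymin, ymax) as mtcars_counts above, so it should work with the same ggplot(...) + geom_bar(stat="identity") + geom_errorbar() call.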
