3
votes

Dear Stackoverflow users,

I would like to draw a grouped barplot with three independent variables with error bars. I based my graph on an example on Stacked Overflow (stacked bars within grouped bars), using ggplot with geom_bar. When I add the geom_errorbar according to examples of the help pages, I get the following error: Error in if (empty(data)) { : missing value where TRUE/FALSE needed

This is the script I use:

treatment<-rep(c(rep(c(1),8),rep(c(2),8)),2)
origin<-rep(c("A","B"),16)
time<-c(rep(c(5),16),rep(c(10),16))
sulfide<-c(0,10,5,8,9,6,16,18,20,25,50,46,17,58,39,43,20,25,50,46,17,58,39,43,100,120,103,104,150,160,200,180)

Reed<-data.frame(treatment,origin,time,sulfide)

# specify factor types
Reed$treatment<-as.factor(Reed$treatment)
Reed$origin<-as.character(Reed$origin)
Reed$time<-as.factor(Reed$time)

library(ggplot2)
library(scales)

#draw plot
ggplot() +geom_bar(data=Reed, aes(y = sulfide, x = treatment, fill=origin), stat="identity",position="dodge") +theme_bw() + facet_grid( ~ time)+xlab("treatment") +ylab("Sulfide")+ggtitle("Time)")

This is how I added error bars:

ErrorBars <- function(x, y, upper, lower=upper, length=0.03,...{if(length(x) != length(y) | length(y) !=length(lower) | length(lower) != length(upper))stop("vectors must be same length")arrows(x,y+upper, x, y-lower, angle=90, code=3, length=length, ...)}#function for errorbars

SE<- function(x) sqrt(var(x,na.rm=TRUE)/length(na.omit(x))) #function for SE

Reed$trt<- paste(Reed$treatment,Reed$origin,sep="")#combine treatment and origin to a column 
mean_Reed<-data.frame(tapply(Reed$sulfide,list(Reed$trt,Reed$time),mean,na.rm=TRUE)) #mean
SE_Reed<-data.frame(tapply(Reed$sulfide,list(Reed$trt, Reed$time),SE)) # SE 

limits <- aes(ymax = mean_Reed + SE_Reed, ymin=mean_Reed - SE_Reed)# Define the top and bottom of the errorbars

#plot with error bars:
ggplot() +geom_bar(data=Reed, aes(y = sulfide, x = treatment, fill=origin), stat="identity",position="dodge") +theme_bw() + facet_grid( ~ time)+xlab("treatment") +ylab("Sulfide")+ggtitle("Time)"+ geom_errorbar(limits, width=.2,position="dodge") 

I really can't find what I'm doing wrong. I hope you can help me:)

2

2 Answers

2
votes

Leaving aside the issue of error bars for the moment, there's a much more serious problem with your plot. You have 2 values each of treatment, time, and origin, for a total of 8 combinations, but 32 values of sulfide - so there are 4 values of sulfide for each combination. When you plot this using, e.g.,

ggplot(data=Reed) +
  geom_bar(aes(y = sulfide, x = treatment, fill=origin), stat="identity",position="dodge") +
  facet_grid( ~ time)+xlab("treatment") +ylab("Sulfide")

you are plotting bars for all four sulfide values on top of each other all in the same color. This has the effect of displaying only the maximum value. It's a little hard to believe this is what you intended, and even if you did there's a better way to do that. For instance, if you want to plot the mean value of sulfide for each combination of factors, you can do it this way.

ggp <- ggplot(data=Reed, aes(y = sulfide, x = as.factor(treatment), group=origin)) +
  geom_bar(aes(fill=origin), stat="summary", fun.y=mean, position="dodge") +
  theme_bw() + 
  facet_grid( ~ time)+xlab("treatment") +ylab("Sulfide")+ggtitle("Time")
ggp

This uses stat="summary" to automatically summarize the result using the aggregating function mean (fun.y=mean).

As similar approach can be used to very simply add the error bars:

se <- function(y) sd(y)/length(y)   # to calculate standard error in the mean
ggp+stat_summary(geom="errorbar",position=position_dodge(width=0.85),
                 fun.data=function(y)c(ymin=mean(y)-se(y),ymax=mean(y)+se(y)), width=0.1)

Notice that there is no need to aggregate the data externally - ggplot does it for you.

Finally, this approach lends itself to the use of many built-in functions for generating confidence limits with more statistical rigor.

ggp+stat_summary(fun.data=mean_cl_normal, conf.int=0.95,
                 geom="errorbar",position=position_dodge(width=0.85), width=0.1)

So here we use the ggplot built-in function mean_cl_normal to calculate 95% confidence limits on the mean assuming the data follows a normal distribution (and that, hence, the means will follow a t-distribution). We use the argument conf.int=... to specify the desired confidence interval, but the default is 0.95 so it really wasn't necessary in this example.

There are several other functions of this type: see the documentation and links therein for an explanation.

2
votes

If you want to build your error bars by making a summary dataset, you just need to get that dataset in the correct format. There are lots of options for this; I will use dplyr. Notice I keep all the grouping variables from the plot in this dataset in a "tidy" format, with each variable in a separate column.

library(dplyr)
meandat = Reed %>% 
    group_by(treatment, time, origin) %>%
    summarise(mean = mean(sulfide, na.rm = TRUE), se = SE(sulfide))

Source: local data frame [8 x 5]
Groups: treatment, time [?]

  treatment   time origin   mean        se
     (fctr) (fctr)  (chr)  (dbl)     (dbl)
1         1      5      A   7.50  3.378856
2         1      5      B  10.50  2.629956
3         1     10      A  31.50  7.858117
4         1     10      B  43.00  6.819091
5         2      5      A  31.50  7.858117
6         2      5      B  43.00  6.819091
7         2     10      A 138.25 23.552689
8         2     10      B 141.00 17.540429

Now error bars can be added via geom_errorbar. You'll see I set the aesthetics globally within ggplot to save myself having to re-type some of these, but you can change this as you want. I use position_dodge to get the error bars placed correctly over each bar.

ggplot(data = Reed, aes(y = sulfide, x = treatment, fill=origin)) +
    geom_bar(stat="identity", position="dodge") +
    theme_bw() + 
    facet_grid( ~ time)+
    xlab("treatment") +
    ylab("Sulfide")+
    ggtitle("Time")+ 
    geom_errorbar(data = meandat, aes(ymin = mean - se, ymax = mean + se, y = mean), 
                position = position_dodge(width = .9))

enter image description here

You can actually do all of this via stat_summary, rather than calculating the summary statistics "by hand". An example is here. The code would look like so, and gives the same plot as above.

ggplot(data = Reed, aes(y = sulfide, x = treatment, fill=origin)) +
    geom_bar(stat="identity",position="dodge") +
    theme_bw() + 
    facet_grid( ~ time) +
    xlab("treatment") +
    ylab("Sulfide") +
    ggtitle("Time") + 
    stat_summary(geom = "errorbar", fun.data = mean_cl_normal, mult = 1, 
               position = position_dodge(width = .9))

I've been using the development version of ggplot2, ggplot2_1.0.1.9003, and found that I needed to add stat_summary function arguments via fun.args. This would look like fun.args = list(mult = 1) to get error bars of 1 standard error.