ggplot using second data source for error bars fails

Question

This is a follow-on to a previous question about getting some custom error bars.

The look of the plot is how I need it, so there's no need to comment solely on that (happy to hear opinions attached to other help, though).
Because these plots are generated in a loop, and the error bars are only added if a condition is met, I can't simply merge all the data up front. Assume for the purpose of this exercise that the plot data and the error bar data come from different data frames.

I have a ggplot, to which I attempt to add some error bars using a different dataframe. When I call the plot, it says that it cannot find the y values from the parent plot, even though I'm just trying to add error bars using new data. I know this has to be a syntax error but I am stumped...

First, let's generate the data and the plot:

library(ggplot2)
library(scales)

# some data
data.2015 = data.frame(score = c(-50,20,15,-40,-10,60),
                       area = c("first","second","third","first","second","third"),
                       group = c("Findings","Findings","Findings","Benchmark","Benchmark","Benchmark"))

data.2014 = data.frame(score = c(-30,40,-15),
                       area = c("first","second","third"),
                       group = c("Findings","Findings","Findings"))

# breaks and limits
breaks.major = c(-60,-40,-22.5,-10, 0,10, 22.5, 40, 60)
breaks.minor = c(-50,-30,-15,-5,0, 5, 15,30,50) 
limits =c(-70,70)

# plot 2015 data
c <- ggplot(data.2015, aes(x = area, y = score, fill = group)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  coord_flip() +
  scale_y_continuous(limits = limits, oob = squish, minor_breaks = breaks.minor,
                     breaks = breaks.major)

Calling the plot (c) produces a nice plot as expected. Now let's set up the error bars and attempt to add them as a new layer in the plot "c".

# get the error bar values
alldat = merge(data.2015, data.2014, all = TRUE, by = c("area", "group"), 
               suffixes = c(".2015", ".2014"))
alldat$plotscore = with(alldat, ifelse(is.na(score.2014), NA, score.2015))
alldat$direction = with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] = "absent"

#add error bars to original plot
c <- c+
  geom_errorbar(data=alldat, aes(ymin = plotscore, ymax = score.2014, color = direction), 
                position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE)

When I call c now, I get

"Error in eval(expr, envir, enclos) : object 'score' not found"

Why does it look for data.2015$score when I just want it to overlay the geom_errorbar using the second alldat dataframe?

EDIT: I've tried specifying the ymin/ymax values for the error bars as alldat$plotscore and alldat$score.2014 (which I am sure is bad practice). It plots, but the bars are in the wrong positions or out of order with the plot (e.g. swapped around, on the benchmark bars instead, etc.).

Looks like geom_errorbar is inheriting the y aesthetic from the global aesthetics you set in ggplot, and the variable score is not in the second dataset. Either name that column score in the new dataset (maybe play with the suffixes argument in merge) or use y = score.2015 in the aesthetics for geom_errorbar. - aosmith
@aosmith care to add that as a solution so I can mark my question answered? It worked for me once I specified both "y" and "fill" in the aes of geom_errorbar as values from the new dataframe (I did y, then it also asked for fill). Others might come here wondering how to solve it, but might also wonder how to get around the issue if they don't have matching data in the new dataframe (i.e. only the bar data, nothing to replace y and fill with to match the parent). - Alex

aosmith · Accepted Answer · 2015-08-20T15:41:48

In my experience, this error about some variable not being found tells me that R went to look in a data.frame for a variable and it wasn't there. Sometimes the solution is as simple as fixing a typo, but in your case the score variable isn't in the dataset you used to make your error bars.

names(alldat)
[1] "area"       "group"      "score.2015" "score.2014" "plotscore"  "direction"

geom_errorbar itself only draws from ymin and ymax, but because you set a y variable globally within ggplot, every other layer inherits that mapping and evaluates it against its own data unless you specifically map y to a different variable. In the current dataset, you'll need to map y to the 2015 score variable.

geom_errorbar(data=alldat, aes(y = score.2015, ymin = plotscore, 
                               ymax = score.2014, color = direction), 
              position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE)
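For completeness, here is a minimal end-to-end sketch of that fix using the data from the question. The object names p, p_fixed and eb are my own, and layer_data() is only there as a rough check that the error bar layer now builds against alldat; treat it as one way to verify the result, not the only one.

```r
library(ggplot2)
library(scales)  # for squish()

# Data from the question
data.2015 <- data.frame(
  score = c(-50, 20, 15, -40, -10, 60),
  area  = c("first", "second", "third", "first", "second", "third"),
  group = c("Findings", "Findings", "Findings", "Benchmark", "Benchmark", "Benchmark")
)
data.2014 <- data.frame(
  score = c(-30, 40, -15),
  area  = c("first", "second", "third"),
  group = "Findings"
)

# Error bar data: after the merge there is no column called `score`
alldat <- merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
                suffixes = c(".2015", ".2014"))
alldat$plotscore <- with(alldat, ifelse(is.na(score.2014), NA, score.2015))
alldat$direction <- with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] <- "absent"

# Base plot with y = score set globally
p <- ggplot(data.2015, aes(x = area, y = score, fill = group)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  coord_flip() +
  scale_y_continuous(limits = c(-70, 70), oob = squish)

# Override the inherited y with a column that exists in alldat
p_fixed <- p +
  geom_errorbar(data = alldat,
                aes(y = score.2015, ymin = plotscore, ymax = score.2014,
                    color = direction),
                position = position_dodge(width = 0.9), lwd = 1.5,
                show.legend = FALSE)

# Building the second layer no longer fails on `score`
eb <- layer_data(p_fixed, 2)
```

Printing p_fixed draws the plot. Expect a warning about removed rows, since the Benchmark rows carry NA for ymin and ymax.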

In your comment you indicated you also had to add fill to geom_errorbar, but I didn't find that necessary when I ran the code (you can see above that group is a variable in the second dataset in the example you give).

The other option would be to make sure the 2015 score variable is still named score after merging. This can be done by changing the suffixes argument in merge. Then score will be in the second dataset and you won't have to set your y variable in geom_errorbar.

alldat2 = merge(data.2015, data.2014, all = TRUE, by = c("area", "group"), 
            suffixes = c("", ".2014"))
...
names(alldat2)
[1] "area"       "group"      "score"      "score.2014" "plotscore"  "direction"
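A third route, for the case raised in the comments where the second dataset has nothing sensible to stand in for y, is to switch off inheritance for that one layer with inherit.aes = FALSE and map only what the error bars use. This is a sketch under a couple of assumptions: the names p, p_eb and eb are mine, and the second dataset still has to carry area and group (including the NA rows for Benchmark) so that position_dodge can line the error bars up with the right bars.

```r
library(ggplot2)

# Data from the question
data.2015 <- data.frame(
  score = c(-50, 20, 15, -40, -10, 60),
  area  = c("first", "second", "third", "first", "second", "third"),
  group = c("Findings", "Findings", "Findings", "Benchmark", "Benchmark", "Benchmark")
)
data.2014 <- data.frame(
  score = c(-30, 40, -15),
  area  = c("first", "second", "third"),
  group = "Findings"
)

alldat <- merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
                suffixes = c(".2015", ".2014"))
alldat$plotscore <- with(alldat, ifelse(is.na(score.2014), NA, score.2015))
alldat$direction <- with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] <- "absent"

p <- ggplot(data.2015, aes(x = area, y = score, fill = group)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  coord_flip()

# inherit.aes = FALSE: the global x, y and fill are ignored for this layer,
# so x and the dodging group have to be mapped explicitly and y is not needed
p_eb <- p +
  geom_errorbar(data = alldat,
                aes(x = area, ymin = plotscore, ymax = score.2014,
                    group = group, color = direction),
                inherit.aes = FALSE,
                position = position_dodge(width = 0.9),
                show.legend = FALSE)

# Rough check that the layer builds from alldat alone
eb <- layer_data(p_eb, 2)
```

The explicit group = group is what keeps the dodging in step with the bars. Without it, the color mapping alone would decide how the error bars are grouped.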
