0
votes

I have a data set with two factor variables that I use to create a bar graph in ggplot. A five-level factor (condition) defines the bars, while a two-level factor (DV) sets the fill, so each bar shows the mean response for that condition. The code to produce the graph is:

plot <- ggplot(data = testdf, aes(x = condition, fill = DV)) + geom_bar(position = "fill", na.rm = TRUE) + theme_bw()

I would like to add error bars onto each of the bars, using 95% confidence intervals.

I've tried converting the DV variable to a numeric 1 or 0 and then analyzing using summarySE() to get CIs for each bar, like so:

se_test <- summarySE(testdf, measurevar = "numericDV", groupvars = c("condition"))

I then changed the ggplot call to read:

plot <- ggplot(data = testdf, aes(x = condition, fill = DV)) +
  geom_bar(position = "fill", na.rm = TRUE) + theme_bw() +
  geom_errorbar(aes(ymin = (DV - se_test$ci), ymax = (DV + se_test$ci)))

This fails with an error saying that - and + are not meaningful for factors, so DV is still being treated as a factor. Is there a way to keep this graph while adding the CI error bars? I'd like the average displayed by each bar to be the midpoint of its confidence interval, while keeping the fill aesthetics.

Thanks in advance.

Sample Data W/ Numeric DV:

testdf <- structure(list(condition = structure(c(4L, 3L, 3L, 2L, 5L, 1L, 
5L, 4L, 4L, 3L, 4L, 1L, 2L, 5L, 5L, 1L, 4L, 3L, 2L, 5L, 4L, 3L, 
3L, 2L, 2L, 3L, 2L, 3L, 5L, 1L, 3L, 3L, 3L, 4L, 4L, 1L, 4L, 2L, 
4L, 3L), .Label = c("0", "1", "2", "3", "4"), class = "factor"), 
    DV = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 
    1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 
    2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L
    ), .Label = c("No", "Yes"), class = "factor"), numericDV = c(0, 
    1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 
    0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 
    1)), row.names = c(2L, 4L, 9L, 12L, 16L, 17L, 22L, 24L, 30L, 
31L, 35L, 40L, 41L, 42L, 43L, 45L, 46L, 47L, 49L, 50L, 52L, 57L, 
64L, 66L, 67L, 73L, 76L, 77L, 78L, 79L, 84L, 86L, 90L, 100L, 
103L, 105L, 107L, 108L, 112L, 113L), class = "data.frame")

2 Answers

1
votes

ggplot2 lets you combine several data frames in one 'screen-space' using just the variable names and values. That is, you can add a layer to your plot that has a different data source.

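library(ggplot2)
library(magrittr)  # provides %>%

testdf %>% 
  ggplot(aes(x = condition)) + 
  # fill is set on this layer only, because the summary data has no DV column
  geom_bar(aes(fill = DV), position = "fill", na.rm = TRUE) + 
  # in the summarySE output, numericDV holds the per-condition mean
  geom_errorbar(aes(
    ymin = numericDV - ci,
    ymax = numericDV + ci), 
    data = Rmisc::summarySE(testdf, measurevar = "numericDV", groupvars = "condition")) +
  theme_bw()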

I'm not sure the result looks very nice with the error bars exceeding the 0-1 interval, but numerically it looks like what you wanted. I moved the fill aesthetic to the geom_bar layer, as DV is missing from the summarySE output.

filled bar chart with error bars
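If error bars running past the 0-1 range are a concern, one option is to swap the t-based ci from summarySE for a binomial interval, which cannot leave that range. What follows is only a minimal sketch, assuming numericDV is coded 0/1; it uses base R's binom.test (an exact Clopper-Pearson interval), which is a different interval from the one summarySE reports. The names ci_df, prop, lower and upper are introduced here for illustration, and testdf is rebuilt in compact form from the question's sample data so the block runs on its own.

```r
library(ggplot2)

# Compact rebuild of the question's sample data (same values, default row names).
cond_codes <- c(4, 3, 3, 2, 5, 1, 5, 4, 4, 3, 4, 1, 2, 5, 5, 1, 4, 3, 2, 5,
                4, 3, 3, 2, 2, 3, 2, 3, 5, 1, 3, 3, 3, 4, 4, 1, 4, 2, 4, 3)
numericDV  <- c(0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1,
                0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1)
testdf <- data.frame(
  condition = factor(cond_codes, levels = 1:5, labels = 0:4),
  DV        = factor(numericDV, levels = c(0, 1), labels = c("No", "Yes")),
  numericDV = numericDV
)

# One row per condition: proportion of "Yes" and its exact 95% binomial interval.
ci_df <- do.call(rbind, lapply(split(testdf, testdf$condition), function(d) {
  bt <- binom.test(sum(d$numericDV), nrow(d))  # conf.level defaults to 0.95
  data.frame(condition = d$condition[1],
             prop      = mean(d$numericDV),
             lower     = bt$conf.int[1],
             upper     = bt$conf.int[2])
}))

# Same layering idea as above: bars from testdf, error bars from ci_df.
p <- ggplot(testdf, aes(x = condition)) +
  geom_bar(aes(fill = DV), position = "fill", na.rm = TRUE) +
  geom_errorbar(aes(ymin = lower, ymax = upper), data = ci_df, width = 0.2) +
  theme_bw()
p
```

These intervals are generally not symmetric around the proportion, so the top of the "Yes" segment will not sit exactly in the middle of each error bar.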

0
votes

You can try:

library(tidyverse)
se_test <- Rmisc::summarySE(testdf, measurevar = "numericDV", groupvars = c("condition"))
testdf %>% 
  count(condition, DV) %>%
  ggplot(aes(condition, n)) +
  geom_col(aes(fill = DV)) +
  # N is the per-condition count; ci is the half-width for the mean of numericDV (0-1 scale)
  geom_errorbar(data = se_test, aes(y = N, ymin = N - ci, ymax = N + ci))
