Proper display of confidence interval in R using ggplot

Question

I'm trying to make a plot that shows two measurements (PRR and EBGM) for different adverse reactions of different drugs, grouped by age category, like so:

library(ggplot2)
strata <- factor(c("Neonates", "Infants", "Children", "Adolescents", "Pediatrics"), levels=c("Neonates", "Infants", "Children", "Adolescents", "Pediatrics"), ordered=TRUE)
Data  <- data.frame(
                strata = sample(strata, 200, replace=T),
                drug=sample(c("ibuprofen", "clarithromycin", "fluticasone"), 200, replace=T), # 20 drugs
                reaction=sample(c("Liver Injury", "Sepsis", "Acute renal failure", "Anaphylaxis"), 200, replace=T),
                measurement=sample(c("prr", "EBGM"), 200, replace=T),
                value_measurement=sample(runif(16), 200, replace=T),
                lower_CI=sample(runif(6), 200, replace=T),
                upper_CI=sample(runif(5), 200, replace=T)
                )

g <- ggplot(Data, aes(x=strata, y=value_measurement, fill=measurement, group=measurement))+
    geom_histogram(stat="identity", position="dodge")+
    facet_wrap(~reaction)+
    geom_errorbar(aes(x=strata, ymax=upper_CI, ymin=lower_CI), position="dodge", stat="identity")

ggsave(file="meh.png", plot=g)

The upper and lower CI are the confidence interval limits of the measurement. Since each measurement has its own confidence interval, I want each bar to show its corresponding interval, but what I get is as follows:

Graph:

Any ideas how to place those nasty confidence intervals properly? Thank you!

Later edit: in the original data, each drug has many rows, each containing an adverse reaction and an age category, and each of these combinations has two measurements (PRR and EBGM), each with its corresponding confidence interval. This is not reflected in the simulated data.
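If it helps to reproduce the structure described in the edit, here is a sketch of simulated data with exactly one PRR row and one EBGM row per drug, reaction and age category, where each interval brackets its estimate. The object name `Sim` and the value ranges are arbitrary assumptions, not the real data.

```r
set.seed(1)  # reproducible fake data

strata_levels <- c("Neonates", "Infants", "Children", "Adolescents", "Pediatrics")

# One row per drug x reaction x age category x measurement
Sim <- expand.grid(
  drug        = c("ibuprofen", "clarithromycin", "fluticasone"),
  reaction    = c("Liver Injury", "Sepsis", "Acute renal failure", "Anaphylaxis"),
  strata      = factor(strata_levels, levels = strata_levels),
  measurement = c("prr", "EBGM"),
  stringsAsFactors = FALSE
)

# Hypothetical point estimates, with lower/upper limits that always bracket them
Sim$value_measurement <- runif(nrow(Sim), 0.5, 3)
Sim$lower_CI <- Sim$value_measurement * runif(nrow(Sim), 0.5, 0.9)
Sim$upper_CI <- Sim$value_measurement * runif(nrow(Sim), 1.1, 1.5)
```

Note that even with this structure, each reaction / age category / measurement combination still has one row per drug (three here), so the drugs would also need to be separated (for example by filtering or faceting) before plotting one bar per measurement.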

eipi10 · Accepted Answer · 2015-12-06T06:51:02

The problem is that each of your bars is really multiple bars plotted over each other, because you have more than one row of data for each combination of reaction, strata, and measurement. (You're getting multiple error bars for the same reason.)

You can see this in the code below, where I've changed geom_histogram to geom_bar and added alpha=0.3 and colour="grey40" to show the multiple overlapping bars. I've also commented out the error bars.

ggplot(Data, aes(x=strata, y=value_measurement, fill=measurement, group=measurement)) +
  geom_bar(stat="identity", position="dodge", alpha=0.3, colour="grey40") +
  facet_wrap(~reaction) #+
#   geom_errorbar(aes(x=strata, ymax=upper_CI, ymin=lower_CI), 
#                 position="dodge", stat="identity")
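The overlap can also be confirmed numerically rather than visually: counting rows per reaction, age category and measurement shows which bars are drawn more than once. This is a sketch on a small hypothetical data frame `toy`; with the real data you would pass `Data` instead.

```r
library(dplyr)

# Hypothetical toy data: two rows share Sepsis / Infants / prr
toy <- data.frame(
  reaction          = c("Sepsis", "Sepsis", "Sepsis", "Anaphylaxis"),
  strata            = c("Infants", "Infants", "Infants", "Children"),
  measurement       = c("prr", "prr", "EBGM", "prr"),
  value_measurement = c(0.4, 0.7, 0.5, 0.9),
  stringsAsFactors  = FALSE
)

# Any combination with n > 1 is drawn as several bars stacked in one spot
overlaps <- toy %>%
  count(reaction, strata, measurement) %>%
  filter(n > 1)

print(overlaps)
```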

You can fix this by adding another column to your data that adds a grouping category by which you can separate these bars. For example, in the code below we add a new column called count that just assigns numbers 1 through n for each row of data within each combination of reaction and strata. We sort by measurement so that each measurement type will be kept together in the count sequence.

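library(dplyr)

# Number the rows 1..n within each reaction/age-category combination;
# sorting by measurement first keeps each measurement type contiguous
Data = Data %>% group_by(reaction, strata) %>%
  arrange(measurement) %>%
  mutate(count = 1:n())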
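To see what the `count` column looks like, here is the same pipeline on a tiny hypothetical data frame. `row_number()` is used as an equivalent of `1:n()`, and the final `ungroup()` is an optional addition (not part of the answer above) so later dplyr steps are not silently grouped.

```r
library(dplyr)

# Hypothetical toy data: one reaction/age group with three rows
toy <- data.frame(
  reaction          = "Sepsis",
  strata            = "Infants",
  measurement       = c("prr", "EBGM", "prr"),
  value_measurement = c(0.4, 0.5, 0.7),
  stringsAsFactors  = FALSE
)

numbered <- toy %>%
  group_by(reaction, strata) %>%
  arrange(measurement) %>%          # "EBGM" sorts before "prr"; ties keep their order
  mutate(count = row_number()) %>%  # same as 1:n() within each group
  ungroup()

print(numbered)
```

Each row in the group now gets its own `count`, which is what lets `group=count` separate the bars.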

Now plot the data:

# Bars and error bars use the same group (count) and the same dodge
# width (0.7), so each error bar is shifted onto its own bar
ggplot(Data, aes(x=strata, y=value_measurement, 
                 fill=measurement, group=count)) +
  geom_bar(stat="identity", position=position_dodge(0.7), width=0.6) +
  facet_wrap(~reaction, ncol=1) +
  geom_errorbar(aes(x=strata, ymax=upper_CI, ymin=lower_CI, group=count), 
                position=position_dodge(0.7), stat="identity", width=0.3)

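Now you can see the separate bars, along with their error bars (which look weird, but only because the data are fake: the simulated lower and upper limits are drawn independently of each other and of the bar heights).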
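The alignment works because the bars and the error bars share the same `group` and are dodged by the same width. A minimal sketch, assuming a hypothetical data frame `df` that already has exactly one row per age category and measurement (so `group = measurement` is enough): defining the dodge once and passing it to both layers keeps them in step. `geom_col()` is used here as shorthand for `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# Hypothetical data: one row per age category and measurement
df <- data.frame(
  strata            = rep(c("Infants", "Children"), each = 2),
  measurement       = rep(c("prr", "EBGM"), times = 2),
  value_measurement = c(1.2, 0.9, 1.6, 1.1),
  lower_CI          = c(0.8, 0.6, 1.1, 0.7),
  upper_CI          = c(1.7, 1.3, 2.2, 1.5),
  stringsAsFactors  = FALSE
)

# One dodge object reused by both layers, so their x positions match
dodge <- position_dodge(width = 0.7)

p <- ggplot(df, aes(x = strata, y = value_measurement,
                    fill = measurement, group = measurement)) +
  geom_col(position = dodge, width = 0.6) +
  geom_errorbar(aes(ymin = lower_CI, ymax = upper_CI),
                position = dodge, width = 0.3)

print(p)
```

The error bars can be narrower than the bars (0.3 vs 0.6) because `position_dodge()` places each group's centre from the dodge width, not from the element width.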
