I'm trying to make a plot that represents two measurements (PRR and EBGM) for different adverse reactions of different drugs, grouped by age category, like so:

library(ggplot2)
strata <- factor(c("Neonates", "Infants", "Children", "Adolescents", "Pediatrics"), levels=c("Neonates", "Infants", "Children", "Adolescents", "Pediatrics"), ordered=TRUE)
Data  <- data.frame(
                strata = sample(strata, 200, replace=TRUE),
                drug=sample(c("ibuprofen", "clarithromycin", "fluticasone"), 200, replace=TRUE), # 20 drugs
                reaction=sample(c("Liver Injury", "Sepsis", "Acute renal failure", "Anaphylaxis"), 200, replace=TRUE),
                measurement=sample(c("prr", "EBGM"), 200, replace=TRUE),
                value_measurement=sample(runif(16), 200, replace=TRUE),
                lower_CI=sample(runif(6), 200, replace=TRUE),
                upper_CI=sample(runif(5), 200, replace=TRUE)
                )

g <- ggplot(Data, aes(x=strata, y=value_measurement, fill=measurement, group=measurement))+
    geom_histogram(stat="identity", position="dodge")+
    facet_wrap(~reaction)+
    geom_errorbar(aes(x=strata, ymax=upper_CI, ymin=lower_CI), position="dodge", stat="identity")

ggsave(file="meh.png", plot=g)

The upper and lower CI are the confidence interval limits of the measurement. Since each measurement has its own confidence interval, I want each bar to show its corresponding interval, but what I get is as follows.

[Image: graph snapshot]

Any ideas how to place those nasty conf intervals properly? Thank you!

Later edit: in the original data, a given drug has many rows, each containing an adverse reaction and an age category, and each such combination has 2 measurements (prr and EBGM), each with its corresponding confidence interval. This is not reflected in the data simulation.

1 Answer

The problem is that each of your bars is really multiple bars plotted over each other, because you have more than one row of data for each combination of reaction, strata, and measurement. (You're getting multiple error bars for the same reason.)
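
One quick way to confirm this is to count the rows per combination. The sketch below rebuilds a cut-down version of the simulated data (the `set.seed()` call and the names `sim`, `strata_levels` and `cell_counts` are my own additions, not part of the question) and tabulates it with `dplyr::count()`:

```r
library(dplyr)

set.seed(1)  # assumption: seed added only so the sketch is reproducible

strata_levels <- c("Neonates", "Infants", "Children", "Adolescents", "Pediatrics")

# Cut-down version of the simulated data from the question
sim <- data.frame(
  strata      = factor(sample(strata_levels, 200, replace = TRUE),
                       levels = strata_levels),
  reaction    = sample(c("Liver Injury", "Sepsis", "Acute renal failure",
                         "Anaphylaxis"), 200, replace = TRUE),
  measurement = sample(c("prr", "EBGM"), 200, replace = TRUE)
)

# Number of rows in each reaction/strata/measurement combination
cell_counts <- sim %>% count(reaction, strata, measurement)

# Combinations with more than one row are the ones drawn as overplotted bars
filter(cell_counts, n > 1)
```

With 200 rows spread over at most 40 combinations, at least some combinations must contain several rows, and each of those rows becomes its own bar and its own error bar at the same x position.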

You can see this in the code below, where I've changed geom_histogram to geom_bar and added alpha=0.3 and colour="grey40" to show the multiple overlapping bars. I've also commented out the error bars.

ggplot(Data, aes(x=strata, y=value_measurement, fill=measurement, group=measurement)) +
  geom_bar(stat="identity", position="dodge", alpha=0.3, colour="grey40") +
  facet_wrap(~reaction) #+
#   geom_errorbar(aes(x=strata, ymax=upper_CI, ymin=lower_CI), 
#                 position="dodge", stat="identity")

You can fix this by adding another column to your data that adds a grouping category by which you can separate these bars. For example, in the code below we add a new column called count that just assigns numbers 1 through n for each row of data within each combination of reaction and strata. We sort by measurement so that each measurement type will be kept together in the count sequence.

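library(dplyr)

# Number the rows within each reaction/strata combination, sorted by
# measurement so each measurement type stays together in the sequence
Data <- Data %>%
  arrange(measurement) %>%
  group_by(reaction, strata) %>%
  mutate(count = row_number()) %>%
  ungroup()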
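
To make the numbering concrete, here is the same idea on a tiny hypothetical data frame (`toy` and `toy_numbered` are illustrative names, not from the question). It uses `row_number()`, which gives the same within-group sequence as `1:n()`:

```r
library(dplyr)

# Hypothetical toy data: three rows share one reaction/strata combination
toy <- data.frame(
  reaction    = c("Sepsis", "Sepsis", "Sepsis", "Anaphylaxis"),
  strata      = c("Infants", "Infants", "Infants", "Infants"),
  measurement = c("prr", "EBGM", "prr", "prr")
)

toy_numbered <- toy %>%
  arrange(measurement) %>%          # keep each measurement type together
  group_by(reaction, strata) %>%    # restart the numbering in every combination
  mutate(count = row_number()) %>%
  ungroup()

toy_numbered
```

Each row in the Sepsis/Infants combination gets its own `count` value, so `position_dodge()` can place them side by side instead of on top of each other.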

Now plot the data:

ggplot(Data, aes(x=strata, y=value_measurement, 
                 fill=measurement, group=count)) +
  geom_bar(stat="identity", position=position_dodge(0.7), width=0.6) +
  facet_wrap(~reaction, ncol=1) +
  geom_errorbar(aes(x=strata, ymax=upper_CI, ymin=lower_CI, group=count), 
                position=position_dodge(0.7), stat="identity", width=0.3)

Now you can see the separate bars, along with their error bars (which are weird, but only because they're fake data).
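
If the real data has only one row per drug, reaction, age category and measurement, as the later edit to the question seems to say, the extra `count` column may not be needed at all. The following is a rough sketch under that assumption, not a tested solution for the real data: it fakes that structure by keeping the first simulated row per combination, facets by drug as well as reaction, and gives the bars and error bars the same dodge width so they line up. The seed, the names `sim`, `one_per_cell` and `dodge`, and the way the fake confidence limits are built around the value are all my own choices for illustration.

```r
library(ggplot2)
library(dplyr)

set.seed(42)  # assumption: seed added only so the sketch is reproducible

strata_levels <- c("Neonates", "Infants", "Children", "Adolescents", "Pediatrics")

# Simulated stand-in with the same columns as in the question
sim <- data.frame(
  strata      = factor(sample(strata_levels, 200, replace = TRUE),
                       levels = strata_levels, ordered = TRUE),
  drug        = sample(c("ibuprofen", "clarithromycin", "fluticasone"),
                       200, replace = TRUE),
  reaction    = sample(c("Liver Injury", "Sepsis", "Acute renal failure",
                         "Anaphylaxis"), 200, replace = TRUE),
  measurement = sample(c("prr", "EBGM"), 200, replace = TRUE),
  value_measurement = runif(200)
)
# Assumption: fake limits placed around the value so the intervals look plausible
sim$lower_CI <- sim$value_measurement * 0.8
sim$upper_CI <- sim$value_measurement * 1.2

# Keep one row per drug/reaction/strata/measurement to mimic the real data
one_per_cell <- sim %>%
  distinct(drug, reaction, strata, measurement, .keep_all = TRUE)

# Use the same dodge for both layers so each error bar sits on its own bar
dodge <- position_dodge(width = 0.9)

p <- ggplot(one_per_cell,
            aes(x = strata, y = value_measurement, fill = measurement)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = lower_CI, ymax = upper_CI),
                position = dodge, width = 0.3) +
  facet_grid(drug ~ reaction)

p
```

Because `fill = measurement` is set in the top-level `aes()`, both layers are grouped by measurement, which is what lets the dodging match. With 20 drugs a `facet_grid()` like this would get crowded, so plotting one drug at a time may be more readable.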