Bar plot of grouped multi-column data with confidence intervals in ggplot2

Question

I'm trying to create a bar plot of grouped multi-column data and add confidence intervals to each bar. So far I have managed most of it with the help of posts on various blogs and platforms such as Stack Overflow.

My data sgr_sum_v3 looks like this:

      treatment mean_C16_0 sd_C16_0 mean_C18_0 sd_C18_0 mean_LIN sd_LIN mean_ALA sd_ALA
      ALA    92500.0   1492.0    14406.7   1291.5    740.2   77.7   3399.2  436.4
      ALA+ARA    71538.3   3159.0    14088.7   1101.0    582.3   91.5   2089.3  439.6
      ALA+EPA    82324.6   2653.3    10745.2   1244.2    658.3   19.2   2629.3  134.7
      ALA+EPA+LIN+ARA    68422.9   2097.2    10818.2    721.8    969.9   24.0   2154.0  124.5
      ALA+LIN    87489.0   3150.6    15951.9    888.2   1173.0  279.1   2010.6  519.4
      ARA    65571.7   2635.6    11174.7   1851.9    589.0    7.0   1640.9  163.7
      control   107313.4  10828.0    22087.0   6217.7    783.8   38.6   2417.5   59.2
      EPA    76621.3   1863.7     9947.7    156.4    654.6   31.0   1946.8   56.6
      EPA+ARA    70312.3   2187.3    10896.8    148.6    716.8   24.4   2144.0  251.4
      EPA+LIN    79388.5   4866.9    10080.4    613.3   1449.9   41.7   1862.9  235.4
      LIN    87398.4   2213.9    11961.6    798.8   1909.3  100.2   1939.1   82.5
      LIN+ARA    71437.1   1220.1    12612.0   1190.8   1134.6  333.6   1628.6  508.1
      Scen   138102.2  22228.4    24893.0   1259.9   4259.4  612.0  23417.2 3946.5

It contains different treatments with the mean values and standard deviations of several measured variables.

To get the plot running I adapted the code from two posts: "Creating grouped bar-plot of multi-column data in R" by joran for the multi-column problem, and "Grouped barplot in R with error bars" by Colonel Beauvel for the confidence intervals.

Here is my code:

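library(reshape2)
library(ggplot2)

# Reshape the two mean columns into long format (one row per treatment and variable)
dfm <- melt(sgr_sum_v3[, c('treatment', 'mean_ALA', 'mean_LIN')], id.vars = 1)

ggplot(data = dfm, aes(x = treatment, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  # Placeholder error bars of +/- 1000 until the real standard deviations are available
  geom_errorbar(aes(ymin = value - 1000, ymax = value + 1000), width = .2, position = position_dodge(.9))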

My problem is that melt() solves the multi-column issue, but the melted data frame no longer contains my standard deviations, so I cannot draw real error bars (so far I just inserted 1000 to see whether it works).

Do you have suggestions for how to solve this, or for how to get the multi-column plot running with the original data (without melting), which would make the confidence interval problem pretty straightforward?

Thanks in advance for your help :)

It sounds like you need to reshape your data so you end up with a column of means and a column of SD for each variable. See here for some options — aosmith
Yes, indeed. This would be a solution. Reading your link.... — Maki
Yep. Thanks for that link. It didn't help me directly, but it gave me an idea xD I just melted the data twice: once with the means and once with the sds. Then I added the sd column to the other data frame. It is so easy that I should have come up with it earlier. Sometimes one should take a break ;) — Maki
@Maki - how do you call the standard deviations after melting the two DFs? Could you paste your final code? — dende85
@dende85 - I added a comprehensive answer that hopefully answers your question. — Maki
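
The approach described in the comments (melt the means and the standard deviations separately, then attach the sd column) can be sketched on the question's own data. This is only a minimal sketch: it uses the first three rows of sgr_sum_v3 and the ALA and LIN columns, and the helper name sds is an assumption introduced for illustration.

```r
library(reshape2)
library(ggplot2)

# First three rows of sgr_sum_v3 from the question, restricted to LIN and ALA
sgr_sum_v3 <- data.frame(
  treatment = c("ALA", "ALA+ARA", "ALA+EPA"),
  mean_LIN  = c(740.2, 582.3, 658.3),
  sd_LIN    = c(77.7, 91.5, 19.2),
  mean_ALA  = c(3399.2, 2089.3, 2629.3),
  sd_ALA    = c(436.4, 439.6, 134.7)
)

# Melt means and standard deviations separately, listing the
# measured variables in the same order in both calls
dfm <- melt(sgr_sum_v3[, c("treatment", "mean_ALA", "mean_LIN")],
            id.vars = "treatment")
sds <- melt(sgr_sum_v3[, c("treatment", "sd_ALA", "sd_LIN")],
            id.vars = "treatment")

# Both results share the same row order, so the sd values line up with the means
stopifnot(identical(dfm$treatment, sds$treatment))
dfm$sd <- sds$value

p <- ggplot(dfm, aes(x = treatment, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymin = value - sd, ymax = value + sd),
                width = 0.2, position = position_dodge(0.9))
print(p)
```

The positional assignment only works because both melt() calls list the variables in the same order (ALA first, then LIN).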

Maki Maki · Accepted Answer · 2020-10-27T10:42:55

Even though my question is fairly old and has been solved in the meantime, I am answering it more comprehensively because @dende85 recently asked for the complete code. The following code does not use the data above; I wrote it for a small R lecture for my students. I am fairly sure there is an easier way to do this. So here is the answer:

First, I create two data frames: one for the mean values and one for the standard deviations. In this case I only use a subset of the treatments, selected with [1:4]:

my_bar_data_mean <- data.frame(treatment = levels(my_data$treatment)[1:4])
my_bar_data_sd <- data.frame(treatment = levels(my_data$treatment)[1:4])

Then I used aggregate() to calculate mean and sd for all groups for all (in this case 3) parameters of interest:

# Note: aggregate() puts the grouping column (Group.1) first, so column 8 of
# the result corresponds to column 7 of my_data, and so on.
# This assumes aggregate() returns exactly one row per treatment listed in
# my_bar_data_mean / my_bar_data_sd, in the same order.

# BL
my_bar_data_mean$BL_mean <- aggregate(my_data, 
                              by = list(my_data$treatment), 
                              FUN = mean, 
                              na.rm = TRUE)[, 8]
my_bar_data_sd$BL_sd <- aggregate(my_data, 
                            by = list(my_data$treatment), 
                            FUN = sd, 
                            na.rm = TRUE)[, 8]
# BW
my_bar_data_mean$BW_mean <- aggregate(my_data, 
                                 by = list(my_data$treatment), 
                                 FUN = mean, 
                                 na.rm = TRUE)[, 9]
my_bar_data_sd$BW_sd <- aggregate(my_data, 
                               by = list(my_data$treatment), 
                               FUN = sd, 
                               na.rm = TRUE)[, 9]
# SL
my_bar_data_mean$SL_mean <- aggregate(my_data, 
                                 by = list(my_data$treatment), 
                                 FUN = mean, 
                                 na.rm = TRUE)[, 10]
my_bar_data_sd$SL_sd <- aggregate(my_data, 
                               by = list(my_data$treatment), 
                               FUN = sd, 
                               na.rm = TRUE)[, 10]
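
As a possibly simpler variant of the aggregation step, the columns can be selected by name so that aggregate() runs once per statistic instead of once per column. This is only a sketch: the toy my_data below and the column names BL, BW and SL are assumptions, since the original data set is not shown.

```r
# Hypothetical stand-in for my_data; the column names BL, BW and SL are assumed
set.seed(1)
my_data <- data.frame(
  treatment = factor(rep(c("A", "B", "C", "D"), each = 5)),
  BL = rnorm(20, mean = 10),
  BW = rnorm(20, mean = 5),
  SL = rnorm(20, mean = 8)
)

# Aggregate only the named numeric columns; naming the list element
# gives the grouping column the name "treatment" instead of "Group.1"
my_bar_data_mean <- aggregate(my_data[, c("BL", "BW", "SL")],
                              by = list(treatment = my_data$treatment),
                              FUN = mean, na.rm = TRUE)
my_bar_data_sd <- aggregate(my_data[, c("BL", "BW", "SL")],
                            by = list(treatment = my_data$treatment),
                            FUN = sd, na.rm = TRUE)

print(my_bar_data_mean)
print(my_bar_data_sd)
```

Selecting columns by name avoids the hard-coded indices and keeps non-numeric columns out of the call to mean() and sd().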

Next, we need to reshape the data frames into long format. For that we need reshape2, which provides melt(), and ggplot2 for the plot itself:

library(reshape2)
library(ggplot2)

We reshape both data frames into long format with melt(), using treatment as the id variable. Note that we still have two data frames: one for the means and one for the standard deviations:

dfm <- melt(my_bar_data_mean, id.vars = "treatment")
temp <- melt(my_bar_data_sd, id.vars = "treatment")

Now the variables are stacked vertically in long format. Because both data frames were melted in the same row order, we can simply add the value column of temp to the first data frame as a new column called sd:

dfm$sd <- temp$value
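
Adding the sd column by position relies on both melted data frames having exactly the same row order. A more defensive alternative is to join on the key columns with merge(). The sketch below uses made-up numbers and assumes the measured columns carry the same names (BL, BW) in both data frames, which differs from the BL_mean / BL_sd naming used in this answer.

```r
library(reshape2)

# Hypothetical summary tables with identical measure column names
means <- data.frame(treatment = c("A", "B"), BL = c(10.1, 11.3), BW = c(5.2, 4.8))
sds   <- data.frame(treatment = c("A", "B"), BL = c(0.9, 1.2),  BW = c(0.4, 0.6))

# value.name sets the name of the melted value column
dfm  <- melt(means, id.vars = "treatment", value.name = "value")
temp <- melt(sds,   id.vars = "treatment", value.name = "sd")

# Join on the key columns instead of relying on identical row order
dfm <- merge(dfm, temp, by = c("treatment", "variable"))
print(dfm)
```

With merge(), each sd is matched to its mean by treatment and variable, so a different row order in one of the data frames does not silently misalign the error bars.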

Now, we just have to plot everything:

ggplot(dfm, aes(variable, value, fill=treatment))+
  geom_bar(stat="identity", position = "dodge")+ 
  theme_classic() +
  geom_errorbar(aes(ymin = value - sd, ymax = value + sd), width=0.4, position = position_dodge(.9))

The error bars are added with geom_errorbar(), using value - sd and value + sd as the lower and upper ends of the whiskers. Don't forget to set position = position_dodge(.9) for the error bars as well, so that they line up with the dodged bars.
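
Note that these whiskers show mean ± sd, not a confidence interval. If a 95% confidence interval of the mean is wanted, the half-width can be derived from the standard deviation and the group size. The sketch below uses made-up numbers; the column n (replicates per group) is an assumption, since group sizes are not given here, and the formula assumes roughly normally distributed data.

```r
library(ggplot2)

# Hypothetical long-format summary; n (replicates per group) is assumed
dfm <- data.frame(
  treatment = c("A", "B"),
  variable  = c("BL", "BL"),
  value     = c(10.1, 11.3),
  sd        = c(0.9, 1.2),
  n         = c(5, 5)
)

# Half-width of the 95% CI of the mean: t quantile times the standard error
dfm$ci <- qt(0.975, df = dfm$n - 1) * dfm$sd / sqrt(dfm$n)

p <- ggplot(dfm, aes(variable, value, fill = treatment)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymin = value - ci, ymax = value + ci),
                width = 0.4, position = position_dodge(0.9))
print(p)
```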

You can also switch between showing the response variables on the x-axis with the bars split by treatment, or the reverse, by exchanging variable and treatment in the first line (ggplot(aes())).
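
A sketch of the two layouts, using a small made-up data frame in the same long format (all values here are hypothetical): the x aesthetic and the fill aesthetic trade places, while value stays on the y-axis.

```r
library(ggplot2)

# Hypothetical long-format data: one row per treatment and variable
dfm <- data.frame(
  treatment = rep(c("A", "B"), times = 2),
  variable  = rep(c("BL", "BW"), each = 2),
  value     = c(10.1, 11.3, 5.2, 4.8),
  sd        = c(0.9, 1.2, 0.4, 0.6)
)

# Shared layers: dodged bars plus matching dodged error bars
layers <- list(
  geom_bar(stat = "identity", position = "dodge"),
  geom_errorbar(aes(ymin = value - sd, ymax = value + sd),
                width = 0.4, position = position_dodge(0.9)),
  theme_classic()
)

# Variables on the x-axis, bars coloured by treatment
p_by_variable <- ggplot(dfm, aes(variable, value, fill = treatment)) + layers

# Treatments on the x-axis, bars coloured by variable
p_by_treatment <- ggplot(dfm, aes(treatment, value, fill = variable)) + layers

print(p_by_variable)
print(p_by_treatment)
```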

I hope this helps.
