
I have some data recorded over a period of months from 5 treatments (incl. control). I'm using ggplot to plot the data as a time series and have generated a data frame of the means of the raw data and standard error for each date.

I'm trying to plot all five treatments on the same graph with error bars. So far I can (a) plot one treatment group with its error bars and (b) plot all five treatments without error bars.

Here's my data (I've only included two treatments to keep things tidy here)

       dates   c_mean_am  c_se_am    T1_mean_am  T1_se_am  
1 2017-01-31   284.135   27.43111     228.935     23.39037    
2 2017-02-09   226.944   13.08237     173.241     13.42946    
3 2017-02-23   281.135   15.89709     252.665     20.73417   
4 2017-03-14   265.655   15.29930     238.225     17.47501 
5 2017-04-06   312.785   13.08237     237.485     13.42946 
  • c_mean_am = control means
  • c_se_am = standard error for controls
  • T1_mean_am = Treatment 1 means
  • T1_se_am = standard error for Treatment 1

Here's my code to achieve option a) above

ggplot(summary, aes(x=dates, y=c_mean_am)) + 
    geom_point(shape = 19, size = 2,color="blue") + 
    geom_line(color="blue") + 
    geom_errorbar(aes(ymin=c_mean_am-c_se_am, ymax=c_mean_am+c_se_am), color="blue", width=0.25) + 
    xlab("Date")  # xlab is a separate function, not an argument to ggplot()

here's the plot

Here's the code for option b) above

sp <- ggplot(summary,aes(dates,y = Cond,color=Treatment)) + 
    geom_line(aes(y = c_mean_am, color = "Control")) + 
    geom_line(aes(y = T1_mean_am, color = "T1")) + 
    geom_point(aes(y = c_mean_am, color = "Control")) + 
    geom_point(aes(y = T1_mean_am, color = "T1"))

sp2<- sp + 
    scale_color_manual(breaks = c("Control", "T1","T2"), values=c("blue", "yellow"))

sp2

here's the plot

How can I get the error bars on the second plot using the same colours as the points and lines?

Thanks

AB


2 Answers

1
votes

Transform your data into long-form first:

df <- df %>% 
 gather(mean_type, mean_val, c_mean_am, T1_mean_am) %>% 
 gather(se_type, se_val, c_se_am, T1_se_am)


ggplot(df, aes(dates, mean_val, colour=mean_type)) + 
    geom_line() + 
    geom_point() + 
    geom_errorbar(aes(ymin=mean_val-se_val, ymax=mean_val+se_val))


Edit: explanation for tidyr manipulation

library(dplyr)
library(tidyr)

new.dat <- mtcars %>%  # taking mtcars as the starting data.frame
        select(gear, cyl, mpg, qsec) %>% 
          # equivalent to mtcars[, c("gear", "cyl", "mpg", "qsec")]; to simplify the example
        gather(key=type, value=val, gear, cyl)
          # convert the data into a long form with 64 rows, with new character column "type" and numeric column "val". "gear" and "cyl" are removed while "mpg" and "qsec" remain

new.dat[c(1:3, 33:35),]

#     mpg  qsec type val
# 1  21.0 16.46 gear   4
# 2  21.0 17.02 gear   4
# 3  22.8 18.61 gear   4
# 33 21.0 16.46  cyl   6
# 34 21.0 17.02  cyl   6
# 35 22.8 18.61  cyl   4

With the data in long form, you can use the new identifier column ("type") for plotting purposes, e.g.

ggplot(new.dat, aes(val, mpg, fill=type)) + 
   geom_col(position="dodge")


The long format is also useful for plotting on separate facets, e.g.

ggplot(new.dat, aes(val, mpg, colour=type)) + 
    geom_point() + 
    facet_wrap(~type) 

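For readers on tidyr 1.0.0 or later, the gather() call in the mtcars example can probably be written with pivot_longer() instead. This is only a sketch of the equivalent reshaping; the name new.dat2 is made up for illustration, and the rows come out in a different order (the gear and cyl rows alternate for each car rather than being stacked in two blocks).

```r
library(dplyr)
library(tidyr)

# Same reshaping as the gather() example, written with pivot_longer()
new.dat2 <- mtcars %>%
  select(gear, cyl, mpg, qsec) %>%
  pivot_longer(cols = c(gear, cyl), names_to = "type", values_to = "val")

# Still 32 cars x 2 reshaped columns
nrow(new.dat2)
```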

1
votes

The accepted answer seems to contain an error in the way the data were gathered (gather() is known as pivot_longer() in packageVersion("tidyr") >= 1.0.0): the two successive gather() calls duplicate each point and error bar. The duplicated error bars are visible in its plot, and if you replace geom_point() with geom_jitter() you'll also see the two points that correspond to the two error bars. This has caused some confusion for others, so I wanted to offer a corrected solution for posterity.
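One way to check the duplication is to count rows after the double gather. The sketch below rebuilds the question's data and reuses the gather() calls from the accepted answer; the name df_double is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Data from the question (control and Treatment 1 only)
df <- data.frame(
  dates = c("2017-01-31", "2017-02-09", "2017-02-23", "2017-03-14", "2017-04-06"),
  c_mean_am  = c(284.135, 226.944, 281.135, 265.655, 312.785),
  c_se_am    = c(27.43111, 13.08237, 15.89709, 15.29930, 13.08237),
  T1_mean_am = c(228.935, 173.241, 252.665, 238.225, 237.485),
  T1_se_am   = c(23.39037, 13.42946, 20.73417, 17.47501, 13.42946)
)

# The double gather from the accepted answer
df_double <- df %>%
  gather(mean_type, mean_val, c_mean_am, T1_mean_am) %>%
  gather(se_type, se_val, c_se_am, T1_se_am)

# 5 dates x 2 treatments should give 10 rows; this gives 20
nrow(df_double)

# Every date/treatment pair appears twice, once per standard-error column
count(df_double, dates, mean_type)
```

Each mean ends up paired with both the control and the Treatment 1 standard error, which is why two error bars are drawn at every point.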

Here's another approach to that pivot that avoids this duplication:

# load necessary packages
library(tidyverse)

# create data from question
df <-
  structure(
    list(
      dates = c(
        "2017-01-31",
        "2017-02-09",
        "2017-02-23",
        "2017-03-14",
        "2017-04-06"
      ),
      c_mean_am = c(284.135, 226.944,
                    281.135, 265.655, 312.785),
      c_se_am = c(27.43111, 13.08237, 15.89709,
                  15.2993, 13.08237),
      T1_mean_am = c(228.935, 173.241, 252.665,
                     238.225, 237.485),
      T1_se_am = c(23.39037, 13.42946, 20.73417,
                   17.47501, 13.42946)
    ),
    class = "data.frame",
    row.names = c("1",
                  "2", "3", "4", "5")
  )

# pivot df long and confirm that there's only one value per group per timepoint
df_long <- df %>%
  pivot_longer(
    cols = -dates,
    names_to = c("treatment_group", ".value"),
    names_pattern = "(.*)_(.*_am)"
  ) 

df_long

# # A tibble: 10 x 4
#    dates      treatment_group mean_am se_am
#    <chr>      <chr>             <dbl> <dbl>
#  1 2017-01-31 c                  284.  27.4
#  2 2017-01-31 T1                 229.  23.4
#  3 2017-02-09 c                  227.  13.1
#  4 2017-02-09 T1                 173.  13.4
#  5 2017-02-23 c                  281.  15.9
#  6 2017-02-23 T1                 253.  20.7
#  7 2017-03-14 c                  266.  15.3
#  8 2017-03-14 T1                 238.  17.5
#  9 2017-04-06 c                  313.  13.1
# 10 2017-04-06 T1                 237.  13.4
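To see how the names_pattern regular expression splits the column names, here is a small illustration using base R's sub() as a stand-in. It is not what pivot_longer() runs internally, but for these four names it gives the same split as the output above. The names cols, pattern and name_parts are made up for the example.

```r
# Column names from the question's data frame
cols <- c("c_mean_am", "c_se_am", "T1_mean_am", "T1_se_am")
pattern <- "(.*)_(.*_am)"

# Capture group 1 becomes treatment_group; capture group 2 is matched to
# ".value", so it supplies the name of the output column (mean_am or se_am)
name_parts <- data.frame(
  column          = cols,
  treatment_group = sub(pattern, "\\1", cols),
  value_column    = sub(pattern, "\\2", cols)
)

name_parts
```

Because the second group is mapped to ".value", the mean and its standard error stay on the same row for each treatment and date.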

Now you can plot and get the expected graph with only a single error bar and single point for each group at each timepoint.

df_long %>%
  ggplot(aes(x = dates, y = mean_am, colour = treatment_group)) + 
  geom_line(aes(group = treatment_group)) + 
  geom_point() + 
  geom_errorbar(aes(ymin = mean_am - se_am, ymax = mean_am + se_am))

Which produces this plot:

corrected plot
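A possible refinement, offered as a sketch rather than part of the original answer: converting dates to Date class spaces the points by actual time and removes the need for an explicit group aesthetic, and scale_colour_manual() can assign the colours used in the question. The long-format data are typed in directly so the block runs on its own; the name df_plot, the legend labels and the error bar width of 3 are assumptions made for illustration.

```r
library(ggplot2)

# Long-format data from the question, one row per treatment per date
df_plot <- data.frame(
  dates = as.Date(rep(c("2017-01-31", "2017-02-09", "2017-02-23",
                        "2017-03-14", "2017-04-06"), times = 2)),
  treatment_group = rep(c("c", "T1"), each = 5),
  mean_am = c(284.135, 226.944, 281.135, 265.655, 312.785,
              228.935, 173.241, 252.665, 238.225, 237.485),
  se_am = c(27.43111, 13.08237, 15.89709, 15.29930, 13.08237,
            23.39037, 13.42946, 20.73417, 17.47501, 13.42946)
)

p <- ggplot(df_plot, aes(x = dates, y = mean_am, colour = treatment_group)) +
  geom_line() +
  geom_point() +
  # width is in x-axis units, i.e. days on a Date axis (3 is an arbitrary choice)
  geom_errorbar(aes(ymin = mean_am - se_am, ymax = mean_am + se_am), width = 3) +
  # lines, points and error bars all inherit the same colour mapping
  scale_colour_manual(breaks = c("c", "T1"),
                      labels = c("Control", "T1"),
                      values = c("blue", "yellow")) +
  labs(x = "Date", colour = "Treatment")

p
```

With all five treatments in the data, the breaks, labels and values vectors would each need five entries.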