R ggplot2: add mean and standard deviation in same plot for multiple variables

Question

I have 4 vectors of 16 values each. Each value is the mean of an item, and the same 16 items appear in all 4 datasets.

I use ggplot2 to plot these means. Here is a reproducible example:

library("ggplot2")
library("dplyr")
library("reshape2")  # provides melt(); dplyr does not

means <- as.data.frame(cbind(rnorm(16), rnorm(16), rnorm(16), rnorm(16)))
means <- mutate(means, id = rownames(means))
colnames(means) <- c("1", "2", "3", "4", "Symptoms")
# Long format: one row per (symptom, dataset) pair
means_long <- melt(means, id.vars = "Symptoms")
means_long$Symptoms <- as.numeric(means_long$Symptoms)
names(means_long)[2] <- "Datasets"

ggplot(data=means_long, aes(x=Symptoms, y=value, colour=Datasets)) +
      geom_line() +
      geom_point(shape = 21, fill = "white", size = 1.5, stroke = 1) +
      xlab("Symptoms") + ylab("Means") +
      scale_y_continuous() + 
      scale_x_continuous(breaks=c(1:16)) +
      theme_bw() +
      theme(panel.grid.minor=element_blank()) +
      coord_flip()

Now I have 4 further vectors containing the standard deviations of the 16 items for the 4 datasets, and I want to add them to the same plot. The data are in the same format as above, so the code is virtually the same.

I want the standard deviations in the same plot as the means, using the same colours but a different line type (so the dataset 1 mean is a solid red line and the dataset 1 standard deviation is a dashed red line). Ideally, the legend would distinguish the lines both by dataset (as it does now) and by mean versus standard deviation (solid versus dashed lines).

Thank you for your help!

jalapic · Accepted Answer · 2017-03-06T18:20:24

Does this help?

To keep the plot readable, I made all the random mean values positive and then made the example standard deviations negative (they are simply the means with the sign flipped), so the two sets of lines don't overlap. To plot both sets of values on the same graph, pass each data frame separately to its own geoms through the data argument, rather than defining the data in the initial ggplot() call.

Let me know if this isn't what you were thinking:

library("ggplot2")
library("dplyr")

# Example means; abs() keeps them positive so they stay apart from the fake SDs
means <- as.data.frame(abs(cbind(rnorm(16),rnorm(16), rnorm(16), rnorm(16))))
means <- mutate(means, id = rownames(means))
colnames(means)<-c("1", "2", "3", "4", "Symptoms")
# Long format: one row per (symptom, dataset) pair
means_long <- reshape2::melt(means, id.vars="Symptoms")
means_long$Symptoms <- as.numeric(means_long$Symptoms)
names(means_long)[2] <- "Datasets"

# Example SDs for illustration only: the means with the sign flipped
sds_long <- means_long
sds_long$value <- -sds_long$value

# No data in ggplot(); each geom gets its own data frame.
# Means: solid lines (lty=1); SDs: dashed lines (lty=2); colour = dataset.
ggplot() +
  geom_line(aes(x=Symptoms, y=value, colour=Datasets), lty=1, data=means_long) +
  geom_point(aes(x=Symptoms, y=value, colour=Datasets), data=means_long, shape = 21, fill = "white", size = 1.5, stroke = 1) +
  geom_line(aes(x=Symptoms, y=value, colour=Datasets), lty=2, data=sds_long) +
  geom_point(aes(x=Symptoms, y=value, colour=Datasets), data=sds_long, shape = 21, fill = "white", size = 1.5, stroke = 1) +
  xlab("Symptoms") + ylab("Means") +
  scale_y_continuous() + 
  scale_x_continuous(breaks=c(1:16)) +
  theme_bw() +
  theme(panel.grid.minor=element_blank()) +
  coord_flip()
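The example above fakes the standard deviations by negating the means. With real SD vectors, the reshaping steps from the question can be wrapped in a small function and applied to both sets. A minimal sketch, assuming the four mean vectors and the four SD vectors are each stored as a 16 x 4 matrix (items in rows, datasets in columns); to_long(), mean_mat and sd_mat are hypothetical names, and the data are random placeholders:

```r
library(reshape2)  # melt()

# Hypothetical helper: turn a 16 x 4 matrix (items x datasets) into the
# long format used above, with columns Symptoms, Datasets and value.
# Assumes exactly 4 columns, one per dataset.
to_long <- function(stats) {
  df <- as.data.frame(stats)
  colnames(df) <- c("1", "2", "3", "4")
  df$Symptoms <- seq_len(nrow(df))  # item index, 1..16
  melt(df, id.vars = "Symptoms",
       variable.name = "Datasets", value.name = "value")
}

# Placeholder data; replace with the real means and standard deviations
set.seed(1)
mean_mat <- matrix(abs(rnorm(64)), nrow = 16)
sd_mat   <- matrix(abs(rnorm(64)), nrow = 16)

means_long <- to_long(mean_mat)
sds_long   <- to_long(sd_mat)
head(sds_long)
```

The resulting sds_long has the same columns as means_long, so it can be passed to the dashed geom_line() and geom_point() layers unchanged.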

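As for your legend question: in short, I think this is hard with the setup above. Colour is the only aesthetic mapped to a variable (the solid and dashed line types are fixed parameters), and it is mapped to Datasets in both data frames, so ggplot2 draws a single colour legend.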
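An alternative that avoids extracting legends, offered as a sketch rather than part of the original answer: stack the means and SDs into one data frame with a column naming the statistic, and map linetype to that column instead of setting lty as a fixed parameter. Because colour and linetype are then both mapped, ggplot2 should build a colour legend for Datasets and a separate linetype legend for the statistic. The Statistic column and the placeholder data below are illustrative assumptions:

```r
library(ggplot2)
library(reshape2)  # melt()

# Placeholder data shaped like means_long / sds_long in the answer
set.seed(1)
means <- as.data.frame(matrix(abs(rnorm(64)), nrow = 16))
colnames(means) <- c("1", "2", "3", "4")
means$Symptoms <- 1:16
means_long <- melt(means, id.vars = "Symptoms",
                   variable.name = "Datasets", value.name = "value")
sds_long <- means_long
sds_long$value <- -sds_long$value  # fake SDs, as in the answer

# One data frame; the hypothetical Statistic column marks means vs SDs
both <- rbind(data.frame(means_long, Statistic = "Mean"),
              data.frame(sds_long,   Statistic = "SD"))

# colour and linetype are both mapped, so ggplot2 draws two legends and
# groups the lines by each Datasets x Statistic combination
p <- ggplot(both, aes(x = Symptoms, y = value,
                      colour = Datasets, linetype = Statistic)) +
  geom_line() +
  geom_point(shape = 21, fill = "white", size = 1.5, stroke = 1) +
  scale_x_continuous(breaks = 1:16) +
  labs(x = "Symptoms", y = "Value") +
  theme_bw() +
  theme(panel.grid.minor = element_blank()) +
  coord_flip()
print(p)
```

The trade-off is that both statistics now share one data frame; in return, the two legends come from ordinary scales and can be customised with scale_colour_manual() and scale_linetype_manual().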

However, adapting the code from this answer, I did the following. The idea is to extract the legends from two plots, one showing only the means and one showing only the SDs, and then add those legends to a version of the combined plot that has no legend. The code could be adapted so that you position the legends manually.

### Step 1
# Draw a plot with the colour legend
p1 <- ggplot() +
  geom_line(aes(x=Symptoms, y=value, colour=Datasets), lty=1, data=means_long) +
  geom_point(aes(x=Symptoms, y=value, colour=Datasets), data=means_long, shape = 21, fill = "white", size = 1.5, stroke = 1) +
  scale_color_manual(name = "Means",values=c("red","blue", "green","pink")) +
  coord_flip()+
  theme_bw() +
  theme(panel.grid.minor=element_blank()) +
  theme(legend.position = "top")

# Extract the colour legend - leg1
library(gtable)
leg1 <- gtable_filter(ggplot_gtable(ggplot_build(p1)), "guide-box") 

### Step 2
# Draw a plot of the SDs only, with the SD colour legend
p2 <- ggplot() +
  geom_line(aes(x=Symptoms, y=value, color=Datasets), lty=2, data=sds_long) +
  geom_point(aes(x=Symptoms, y=value, color=Datasets), data=sds_long, shape = 21, fill = "white", size = 1.5, stroke = 1) +
  coord_flip()+
  theme_bw() +
  theme(panel.grid.minor=element_blank()) +
  scale_color_manual(name = "SDs",values=c("red","blue", "green","pink")) 

# Extract the SD legend - leg2
leg2 <- gtable_filter(ggplot_gtable(ggplot_build(p2)), "guide-box") 

### Step 3
# Draw the combined plot with no legends - p3
p3<-ggplot() +
  geom_line(aes(x=Symptoms, y=value, colour=Datasets), lty=1, data=means_long) +
  geom_point(aes(x=Symptoms, y=value, colour=Datasets), data=means_long, shape = 21, fill = "white", size = 1.5, stroke = 1) +
  geom_line(aes(x=Symptoms, y=value, color=Datasets), lty=2, data=sds_long) +
  geom_point(  aes(x=Symptoms, y=value, color=Datasets), data=sds_long, shape = 21, fill = "white", size = 1.5, stroke = 1) +
  xlab("Symptoms") + ylab("Means") +
  scale_y_continuous() + 
  scale_x_continuous(breaks=c(1:16)) +
  theme_bw() +
  theme(panel.grid.minor=element_blank()) +
  coord_flip()+
  scale_color_manual(values=c("red","blue", "green","pink")) +
  theme(legend.position = "none")

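### Step 4
# Arrange the three components (p3, leg1, leg2).
# The two legends are positioned outside the plot:
# one at the top and the other to the side.
library(grid)       # unit(), unit.c(), grid.newpage(), grid.draw()
library(gridExtra)  # arrangeGrob() comes from gridExtra, not grid
# Stack the means legend above the plot; its row gets the legend's height
plotNew <- arrangeGrob(leg1, p3, 
                       heights = unit.c(sum(leg1$heights), unit(1, "npc") - sum(leg1$heights)), ncol = 1)

# Put the SD legend to the right of that stack
plotNew <- arrangeGrob(plotNew, leg2,
                       widths = unit.c(unit(1, "npc") - sum(leg2$widths), sum(leg2$widths)), nrow = 1)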

grid.newpage()
grid.draw(plotNew)
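On recent ggplot2 versions the gtable_filter(..., "guide-box") step may not behave as it did when this answer was written: as far as I know, since ggplot2 3.5.0 legends are stored in separate grobs per position (for example "guide-box-top" and "guide-box-right"), so the pattern can match several grobs, most of them empty placeholders. Below is a hedged sketch of a more defensive extractor, not tested across every version; get_legend_grob() is a hypothetical helper (not a ggplot2 function), and the plot uses placeholder data:

```r
library(ggplot2)
library(grid)       # unit(), unit.c(), grobHeight(), grid.newpage(), grid.draw()
library(gridExtra)  # arrangeGrob()

# Hypothetical helper: return the first non-empty legend grob of a plot.
# "^guide-box" matches the single "guide-box" name of older ggplot2
# versions and the per-position names ("guide-box-top", ...) of newer ones.
get_legend_grob <- function(p) {
  g <- ggplot_gtable(ggplot_build(p))
  grobs <- g$grobs[grep("^guide-box", g$layout$name)]
  # empty legend positions are filled with zeroGrob placeholders
  grobs <- grobs[!vapply(grobs, inherits, logical(1), what = "zeroGrob")]
  if (length(grobs) == 0) stop("the plot has no legend")
  grobs[[1]]
}

# Placeholder plot with its legend at the top
set.seed(1)
df <- data.frame(x = rep(1:16, 2), y = rnorm(32),
                 g = rep(c("a", "b"), each = 16))
p <- ggplot(df, aes(x, y, colour = g)) +
  geom_line() +
  theme(legend.position = "top")

leg <- get_legend_grob(p)
h <- grobHeight(leg)  # resolved to the legend's height at draw time
combined <- arrangeGrob(leg, p + theme(legend.position = "none"),
                        heights = unit.c(h, unit(1, "npc") - h), ncol = 1)
grid.newpage()
grid.draw(combined)
```

Using grobHeight() instead of summing the legend's row heights lets grid work out the size when the layout is drawn, so the code does not depend on how many rows the extracted gtable happens to contain.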