I'm having trouble combining color and linetype guides into a single legend in a plot produced with ggplot2. Either the linetype shows up with all of the linetypes keyed the same way, or it does not show up at all.

My plot includes a ribbon showing the bulk of the observations, along with lines showing the minimum, median, maximum, and sometimes the observations from a single year.

Example code using the built-in CO2 data set:

library(tidyverse)

myExample <- CO2 %>%  
      group_by(conc) %>%
      summarise(d.min = min(uptake, na.rm= TRUE),
                d.ten = quantile(uptake,probs = .1, na.rm = TRUE),
                d.median = median(uptake, na.rm = TRUE),
                d.ninty = quantile(uptake, probs = .9, na.rm= TRUE),
                d.max = max(uptake, na.rm = TRUE))
myExample <- cbind(myExample, "Qn1"= filter(CO2, Plant == "Qn1")[,5])

plot_plant <- TRUE  # Switch to plot single observation series

myExample %>%
      ggplot(aes(x=conc))+
      geom_ribbon(aes(ymin=d.ten, ymax= d.ninty, fill = "80% of observations"), alpha = .2)+
      geom_line(aes(y=d.min, colour = "c"), linetype = 3, size = .5)+
      geom_line(aes(y=d.median, colour = "e"),linetype = 2, size = .5)+
      geom_line(aes(y=d.max, colour = "a"),linetype = 3, size = .5)+
      {if(plot_plant)geom_line(aes(y=Qn1, color = "f"), linetype = 1,size =.5)}+
      scale_fill_manual("Statistic", values = "blue")+
      scale_color_brewer(palette = "Dark2",name = "",
                         labels = c(
                               a= "Maximum",
                               e= "Median",
                               c= "Minimum",
                               f = current_year
                         ), breaks = c("a","e","c","f"))+
      scale_linetype_manual(name = "")+
      guides(fill= guide_legend(order = 1), color = guide_legend(order = 2), linetype = guide_legend(order = 2))

With plot_plant set to TRUE, the code plots a single observation series, but linetype does not show up at all in the legend.

With plot_plant set to FALSE, linetype shows up in the legend, but I cannot see the distinction between the dotted and dashed legend entries.

The plot is working as desired, but I would like the linetype distinctions to show up in the legend. Visually, it is more important when I'm plotting the single observation series because the distinction between solid and dashed or dotted is stronger.

Searching for answers, I've seen suggestions to combine the different stats (min, median, max, and the single series) into a single variable and let ggplot determine the linetypes (e.g. the post "ggplot2 manually specifying color & linetype - duplicate legend"), or to make a hash that describes the linetype (e.g. "How to rename a (combined) legend in ggplot2?"), but neither of these approaches seems to play well in combination with the ribbon plot.

I tried formatting my data into a long format, which usually works well for ggplot. This worked if I plotted all of the statistics as line geometry, but I couldn't get the ribbon to work the way I wanted, and overlaying a single observation series seemed to require storing it in a different data table.

1 Answer

As you noted, ggplot loves long format data. So I recommend sticking with that.

Here I generate some made-up data:

library(tibble)
library(dplyr)
library(ggplot2)
library(tidyr)

set.seed(42)

tibble(x = rep(1:10, each = 10), 
       y = unlist(lapply(1:10, function(x) rnorm(10, x)))) -> tbl_long

which looks like this:

# A tibble: 100 x 2
       x     y
   <int> <dbl>
 1     1 2.37 
 2     1 0.435
 3     1 1.36 
 4     1 1.63 
 5     1 1.40 
 6     1 0.894
 7     1 2.51 
 8     1 0.905
 9     1 3.02 
10     1 0.937
# ... with 90 more rows

Then I group_by(x) and calculate quantiles of interest for y in each group:

tbl_long %>% 
  group_by(x) %>% 
  mutate(q_0.0 = quantile(y, probs = 0.0), 
         q_0.1 = quantile(y, probs = 0.1),
         q_0.5 = quantile(y, probs = 0.5), 
         q_0.9 = quantile(y, probs = 0.9), 
         q_1.0 = quantile(y, probs = 1.0)) -> tbl_long_and_wide

and that looks like:

# A tibble: 100 x 7
# Groups:   x [10]
       x     y q_0.0 q_0.1 q_0.5 q_0.9 q_1.0
   <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1 2.37  0.435 0.848  1.38  2.56  3.02
 2     1 0.435 0.435 0.848  1.38  2.56  3.02
 3     1 1.36  0.435 0.848  1.38  2.56  3.02
 4     1 1.63  0.435 0.848  1.38  2.56  3.02
 5     1 1.40  0.435 0.848  1.38  2.56  3.02
 6     1 0.894 0.435 0.848  1.38  2.56  3.02
 7     1 2.51  0.435 0.848  1.38  2.56  3.02
 8     1 0.905 0.435 0.848  1.38  2.56  3.02
 9     1 3.02  0.435 0.848  1.38  2.56  3.02
10     1 0.937 0.435 0.848  1.38  2.56  3.02
# ... with 90 more rows

Then I gather up all the columns except for x, y, and the 10th- and 90th-percentile variables into two variables: key and value. The new key variable holds the name of the old variable from which each value came. The other variables are just copied down as needed.

tbl_long_and_wide %>% 
  gather(key, value, -x, -y, -q_0.1, -q_0.9) -> tbl_super_long

and that looks like:

# A tibble: 300 x 6
# Groups:   x [10]
       x     y q_0.1 q_0.9 key   value
   <int> <dbl> <dbl> <dbl> <chr> <dbl>
 1     1 2.37  0.848  2.56 q_0.0 0.435
 2     1 0.435 0.848  2.56 q_0.0 0.435
 3     1 1.36  0.848  2.56 q_0.0 0.435
 4     1 1.63  0.848  2.56 q_0.0 0.435
 5     1 1.40  0.848  2.56 q_0.0 0.435
 6     1 0.894 0.848  2.56 q_0.0 0.435
 7     1 2.51  0.848  2.56 q_0.0 0.435
 8     1 0.905 0.848  2.56 q_0.0 0.435
 9     1 3.02  0.848  2.56 q_0.0 0.435
10     1 0.937 0.848  2.56 q_0.0 0.435
# ... with 290 more rows
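In current versions of tidyr, gather() is superseded by pivot_longer(). As a rough sketch (the name tbl_super_long2 is mine, not part of the original code), the same reshaping step could be written as below. The rows come out in a different order than with gather(), which should not matter for plotting.

```r
library(tibble)
library(dplyr)
library(tidyr)

set.seed(42)

# Same made-up data as above
tbl_long <- tibble(x = rep(1:10, each = 10),
                   y = unlist(lapply(1:10, function(m) rnorm(10, m))))

tbl_super_long2 <- tbl_long %>%
  group_by(x) %>%
  mutate(q_0.0 = quantile(y, probs = 0.0),
         q_0.1 = quantile(y, probs = 0.1),
         q_0.5 = quantile(y, probs = 0.5),
         q_0.9 = quantile(y, probs = 0.9),
         q_1.0 = quantile(y, probs = 1.0)) %>%
  ungroup() %>%
  # Only the line statistics go into key/value; the ribbon bounds
  # (q_0.1, q_0.9) stay behind as ordinary columns.
  pivot_longer(cols = c(q_0.0, q_0.5, q_1.0),
               names_to = "key",
               values_to = "value")
```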

This format lets you use both geom_ribbon() and geom_line() the way you want. The values for the lines are contained in value and grouped by key, whereas the variables mapped to ymin and ymax are kept separate from value and are identical within each x group.

tbl_super_long %>% 
  ggplot() + 
  geom_ribbon(aes(x = x, 
                  ymin = q_0.1, 
                  ymax = q_0.9, 
                  fill = "80% of observations"), 
              alpha = 0.2) + 
  geom_line(aes(x = x, 
                y = value, 
                color = key, 
                linetype = key)) + 
  # Scale names must be strings (or NULL for no title), not theme elements.
  scale_fill_manual(name = "Statistic", 
                    guide = guide_legend(order = 1), 
                    values = viridisLite::viridis(1)) + 
  # The color and linetype scales share the same name, labels, and guide
  # settings so that ggplot2 merges them into a single legend.
  scale_color_manual(name = NULL, 
                     labels = c("Minimum", "Median", "Maximum"), 
                     guide = guide_legend(reverse = TRUE, order = 2), 
                     values = viridisLite::viridis(3)) + 
  scale_linetype_manual(name = NULL, 
                        labels = c("Minimum", "Median", "Maximum"), 
                        guide = guide_legend(reverse = TRUE, order = 2), 
                        values = c("dotted", "dashed", "solid")) + 
  labs(x = "x", y = "y")


This data format, with the long but grouped x and value variables plus the independent but repeated ymin and ymax variables, lets you use both geom_ribbon() and geom_line() and allows the linetypes to show up properly in the legend.
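For the CO2 data in the question, the same idea might look roughly like the sketch below. The object names (co2, stats, qn1, line_levels, stats_long, p) are my own, and I am assuming the single series is the Qn1 plant and that the minimum and maximum should both stay dotted, as in the question. The single series is joined on as one more column and reshaped together with the other line statistics, so it gets its own legend entry. Here the ribbon is drawn from the unreshaped summary table, which is an alternative to repeating the bounds on every row.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

co2 <- as.data.frame(CO2)  # drop the extra groupedData classes

# One row per concentration: ribbon bounds plus the line statistics
stats <- co2 %>%
  group_by(conc) %>%
  summarise(d.ten    = quantile(uptake, probs = 0.1),
            d.ninety = quantile(uptake, probs = 0.9),
            Minimum  = min(uptake),
            Median   = median(uptake),
            Maximum  = max(uptake),
            .groups  = "drop")

# The single observation series (assumed here to be plant Qn1)
qn1 <- co2 %>%
  filter(Plant == "Qn1") %>%
  select(conc, Qn1 = uptake)

# Desired legend order for the lines
line_levels <- c("Maximum", "Median", "Minimum", "Qn1")

stats_long <- stats %>%
  left_join(qn1, by = "conc") %>%
  pivot_longer(all_of(line_levels), names_to = "key", values_to = "value") %>%
  mutate(key = factor(key, levels = line_levels))

p <- ggplot(stats_long, aes(x = conc)) +
  # Ribbon drawn from the wide summary table, one row per conc
  geom_ribbon(data = stats,
              aes(ymin = d.ten, ymax = d.ninety, fill = "80% of observations"),
              alpha = 0.2) +
  geom_line(aes(y = value, colour = key, linetype = key)) +
  scale_fill_manual(name = "Statistic", values = "blue",
                    guide = guide_legend(order = 1)) +
  # Same name and same order on both scales so the two legends merge
  scale_colour_brewer(name = NULL, palette = "Dark2",
                      guide = guide_legend(order = 2)) +
  scale_linetype_manual(name = NULL,
                        values = c(Maximum = "dotted", Median = "dashed",
                                   Minimum = "dotted", Qn1 = "solid"),
                        guide = guide_legend(order = 2))

print(p)
```

To leave out the single series, drop "Qn1" from line_levels; the rest of the code should not need to change.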