I have a plot with boxplots (geom_boxplot) overlaid with some marker points (geom_point). By default, the legend keys are drawn on top of one another, but I would like to split them out so that each geom_point layer is listed separately in the legend.

library(tidyverse)  # data manipulation etc
library(scales)     # for log scales
library(viridis)    # for colour-blind friendly palettes

PlotData_HIL %>%
  ggplot(aes(Analyte, Concentration, fill = Analyte)) +                                  # Plot analyte vs Concentration, with a different colour per analyte
  geom_boxplot(outlier.shape = NA, varwidth = TRUE, alpha = 0.7, colour = "grey40")+     # Boxplot with outlier markers hidden (the jittered points show them) and width proportional to sqrt(count)
  scale_y_log10(breaks = major_spacing, minor_breaks = minor_spacing, labels = number) + # Log scale for Y axis
  geom_jitter(aes(fill = Analyte), shape = 21, size = 2.5, alpha = 0.3, width = 0.1)+                          # overlay data points to show actual distribution and clustering
  geom_point(aes(Analyte,GIL_fresh), colour="red", shape=6, size = 3)+                                 # Choose the HIL set to apply
  geom_point(aes(Analyte,ADWG), colour="red", shape=4, size = 3)+
  geom_point(aes(Analyte,HSLAB_sand_2_4), colour="red", shape=3, size = 3)+
  labs(title = "Box Plots", subtitle = "Box width is proportional to the square root of the number of samples.  Individual data points overlaid as circles.\nGILs shown as red triangles. ADWG values shown as red Xs. HSLs shown as red +s.") +
  ylab("Concentration (\u03BCg/L)") +                                                        # Label for Y axis
  xlab("") +                                                                             # X axis already done
  scale_color_viridis(discrete = TRUE, option = "viridis")+                              # Colour-blind friendly outlines
  scale_fill_viridis(discrete = TRUE, option ="viridis") +                               # Colour-blind friendly fill
  theme_bw()+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5), panel.grid.major.y = element_line(size = 0.5))+
  theme(strip.background = element_rect(colour = "black", fill = "white"),               # White label strips, black text and border
        strip.text.x = element_text(colour = "black", face = "bold"),
        panel.border = element_rect(colour = "black", fill = NA),
        axis.title = element_text(colour = "black", face = "bold"),
        axis.text = element_text(colour = "black", face = "bold")
  )

For each analyte, the legend shows an entry for each of the geom_* functions in the ggplot call, superimposed on one another. I would like to separate these so that the legend entry for geom_boxplot is distinct from the entry for each geom_point layer, and I can label what the triangle stands for and what the X stands for.

I'm reading the data in from a spreadsheet and am not sure how to set up dummy data in code but sample data is here:

Analyte Concentration GIL_fresh GIL_marine  ADWG HSLAB_sand_2_4 HSLAB_sand_4_8 HSLAB_sand_8 HSLC_sand_2_4 HSLC_sand_4_8 HSLC_sand_8 HSLD_sand_2_4 HSLD_sand_4_8 HSLD_sand_8 HSLAB_silt_2_4 HSLAB_silt_4_8
   <fct>           <dbl>     <dbl>      <dbl> <dbl>          <dbl>          <dbl>        <dbl> <lgl>         <lgl>         <lgl>               <dbl>         <dbl>       <dbl>          <dbl>          <dbl>
 1 Arsenic          12       13          NA      10             NA             NA           NA NA            NA            NA                     NA            NA          NA             NA             NA
 2 Cadmium           1        0.2         0.7     2             NA             NA           NA NA            NA            NA                     NA            NA          NA             NA             NA
 3 Chromi…          24        1           4.4    50             NA             NA           NA NA            NA            NA                     NA            NA          NA             NA             NA
 4 Copper           42        1.4         1.3  2000             NA             NA           NA NA            NA            NA                     NA            NA          NA             NA             NA
 5 Lead             24        3.4         4.4    10             NA             NA           NA NA            NA            NA                     NA            NA          NA             NA             NA
 6 Mercury           0.1      0.06        0.1     1             NA             NA           NA NA            NA            NA                     NA            NA          NA             NA             NA
 7 Nickel            8       11           7      20             NA             NA           NA NA            NA            NA                     NA            NA          NA             NA             NA
 8 Zinc            100        8          15      NA             NA             NA           NA NA            NA            NA                     NA            NA          NA             NA             NA
 9 Ammonia        2252       NA          NA      NA             NA             NA           NA NA            NA            NA                     NA            NA          NA             NA             NA
10 Arsenic          10       13          NA      10             NA             NA           NA NA            NA            NA                     NA            NA          NA             NA             NA
It's easier to help you if you include a simple reproducible example with sample input that can be used to test and verify possible solutions. - MrFlick
I'll add something. - tim
I've updated the question. Hopefully it's more helpful. - tim
I think you will need to pivot the data so you can use an aes(shape = ) mapping and then add a scale_shape_manual. It will be hard for us to actually show you because you provided so little data and no copy of major_spacing and minor_spacing. - Ian Campbell

1 Answer

I tried to create some data like yours. I suspect the columns that hold the marker values, such as GIL_fresh, were most likely obtained from a merge. It might be better to plot them from a separate data.frame:

set.seed(1)  # make the random dummy concentrations reproducible
analytes = c("Ammonia", "Arsenic", "Cadmium", "Chromium", "Copper", "Lead", "Mercury", "Nickel", "Zinc")
PlotData_HIL = data.frame(
                Analyte = rep(analytes,each=5),
                Concentration = runif(45,0,100),           # dummy measurements
                GIL_fresh  = rep(c(10,rep(NA,8)),5),       # dummy guideline values, mostly NA
                ADWG = rep(c(15,rep(NA,8)),5))

For the red markers, you need to specify shape = inside aes(...) so that a separate shape legend appears. Lastly, I removed the points from the fill legend because they look redundant there:

PlotData_HIL %>%
  ggplot(aes(Analyte, Concentration, fill = Analyte)) +
  geom_boxplot(outlier.shape = NA, varwidth = TRUE, alpha = 0.7, colour = "grey40") +
  scale_y_log10() +
  geom_jitter(aes(fill = Analyte), shape = 21, size = 2.5, alpha = 0.3, width = 0.1) +
  geom_point(aes(Analyte, GIL_fresh, shape = "GIL_fresh"), colour = "red", size = 3) +   # shape mapped inside aes() so it gets a legend entry
  geom_point(aes(Analyte, ADWG, shape = "ADWG"), colour = "red", size = 3) +
  scale_shape_manual(values = c(ADWG = 4, GIL_fresh = 6)) +                             # named so each label keeps its intended shape
  scale_color_viridis(discrete = TRUE, option = "viridis") +
  scale_fill_viridis(discrete = TRUE, option = "viridis") +
  guides(fill = guide_legend(override.aes = list(shape = NA)))                          # drop the point glyphs from the fill legend
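
The separate data.frame idea mentioned above (and the pivoting suggested in the comments) could look roughly like the sketch below. It is only an illustration on the same dummy data: the names guidelines, Guideline and Value are made up for this example, the legend labels are placeholders, and the viridis scales are left out to keep it short.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Same dummy data as above (values are random, not real measurements)
set.seed(1)
analytes <- c("Ammonia", "Arsenic", "Cadmium", "Chromium", "Copper",
              "Lead", "Mercury", "Nickel", "Zinc")
PlotData_HIL <- data.frame(
  Analyte       = rep(analytes, each = 5),
  Concentration = runif(45, 0, 100),
  GIL_fresh     = rep(c(10, rep(NA, 8)), 5),
  ADWG          = rep(c(15, rep(NA, 8)), 5)
)

# Long-format table with one row per analyte and guideline
# ("guidelines", "Guideline" and "Value" are hypothetical names)
guidelines <- PlotData_HIL %>%
  select(Analyte, GIL_fresh, ADWG) %>%
  pivot_longer(c(GIL_fresh, ADWG), names_to = "Guideline", values_to = "Value") %>%
  filter(!is.na(Value)) %>%
  distinct()

# fill is mapped only in the boxplot and jitter layers, so the red markers
# never enter the fill legend and get a shape legend of their own
p <- ggplot(PlotData_HIL, aes(Analyte, Concentration)) +
  geom_boxplot(aes(fill = Analyte), outlier.shape = NA, varwidth = TRUE,
               alpha = 0.7, colour = "grey40") +
  geom_jitter(aes(fill = Analyte), shape = 21, size = 2.5, alpha = 0.3,
              width = 0.1, show.legend = FALSE) +
  geom_point(data = guidelines, aes(Analyte, Value, shape = Guideline),
             colour = "red", size = 3) +
  scale_shape_manual(name = "Guideline",
                     values = c(GIL_fresh = 6, ADWG = 4),
                     breaks = c("GIL_fresh", "ADWG"),
                     labels = c("GIL (fresh)", "ADWG")) +  # placeholder labels
  scale_y_log10()

print(p)
```

With the guideline values in their own table, adding another set such as HSLAB_sand_2_4 should only need one more column in pivot_longer() and one more entry in values, breaks and labels, rather than another geom_point() layer.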
