Question: Is it possible to add a legend to a plot that has nothing to do with the plot itself and - crucially - will not interfere with the colors in the plot?
Explanation
I have all the information I should need for the legend. In particular, I have the hex codes of the colors and I have the labels. I do not care what shapes are shown (lines, points, whichever is easiest).
I was hoping this would do the trick (this is a very simplified minimal working example):
library(ggplot2)

the_colors <- c("#e6194b", "#3cb44b", "#ffe119", "#0082c8", "#f58231", "#911eb4", "#46f0f0", "#f032e6",
                "#d2f53c", "#fabebe", "#008080", "#e6beff", "#aa6e28", "#fffac8", "#800000", "#aaffc3",
                "#808000", "#ffd8b1", "#000080", "#808080", "#ffffff", "#000000")
the_labels <- c("01", "02", "03", "04", "05", "06", "07", "08", "09", "10")
the_df <- data.frame("col1"=c(1, 2, 2, 1), "col2"=c(2, 2, 1, 1), "col3"=c(1, 2, 3, 4))
# The color is set as a constant outside aes(), so the points keep exactly this color
the_plot <- ggplot() + geom_point(data=the_df, aes(x=col1, y=col2), color=the_colors[[4]])
# Attempt to attach a legend for the first length(the_labels) colors
the_plot <- the_plot +
  scale_color_manual("Line.Color", values=the_colors[1:length(the_labels)],
                     labels=the_labels)
Unfortunately, it will not even show the legend.
Playing by the rules, and including the color argument inside the aesthetics element, I can get it to show a legend.
the_plot <- ggplot() + geom_point(data=the_df, aes(x=col1, y=col2, color=the_colors[[4]]))
But then, of course, it no longer takes the value passed as the color argument seriously; instead, it interprets it as some kind of label and changes the color of these data points to the first color in the the_colors list. At the same time, it only includes this one entry in the legend, and there does not seem to be a way in hell to convince it to also include the others.
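A minimal sketch of this behavior, assuming ggplot2 is attached and using a shortened palette. The names p and drawn are only for illustration; layer_data() is used to inspect which color the scale actually assigns to the points.

```r
library(ggplot2)

the_colors <- c("#e6194b", "#3cb44b", "#ffe119", "#0082c8")
the_df <- data.frame(col1 = c(1, 2, 2, 1), col2 = c(2, 2, 1, 1))

# Inside aes(), the hex string is treated as a data value
# (a category with a single level), not as a color.
p <- ggplot(the_df, aes(x = col1, y = col2, color = the_colors[[4]])) +
  geom_point() +
  scale_color_manual("Line.Color", values = the_colors)

# Colors actually assigned to the points after the scale is applied.
drawn <- unique(layer_data(p, 1)$colour)
print(drawn)
```

Comparing drawn with the_colors[[4]] and the_colors[[1]] shows which of the two the scale picked for the points.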
In other languages, this is unbelievably easy. In R/ggplot2, this seems unbelievably hard.
Reason why I want to do this: I want a legend that does not interfere with the colors in my plot. This is sometimes very inconvenient. There is also no deeper reason that the legend must mess with the colors in the plot, just that this is how it is implemented in R/ggplot2.
Approach: I was hoping that there is a way to easily do this by still treating this as a legend. Failing that, it might be possible to add a box with some colored points and some text, thereby constructing a legend from scratch.
Other questions: There have been various other questions asking the same thing. The answers instead suggested workarounds to solve the concrete problem of the OP (usually by applying melt() or so) without providing a solution to the question that was asked (how to add a legend manually without messing with the plot). E.g. here and here. This is not what I am interested in. I would like to know if I can add an arbitrary legend to an arbitrary plot, and, if yes, how.
Software: R 3.6.3, ggplot2 3.2.1
Edit (March 30 2020):
Solution: As described in @Tjebo's answer below, a legend that is reasonably independent from the plot and defines additional data series not shown in the plot can be created with scale_color_identity. With option #1 in the answer by @Tjebo, I could solve my immediate problem:
the_colors <- sort(c("#e6194b", "#3cb44b", "#ffe119", "#0082c8", "#f58231", "#911eb4", "#46f0f0", "#f032e6",
                     "#d2f53c", "#fabebe", "#008080", "#e6beff", "#aa6e28", "#fffac8", "#800000", "#aaffc3",
                     "#808000", "#ffd8b1", "#000080", "#808080"))
color_df <- data.frame(the_colors=the_colors[1:length(the_labels)], the_labels=the_labels)
the_df <- data.frame("col1"=c(1, 2, 2, 1), "col2"=c(2, 2, 1, 1), "col3"=c(1, 2, 3, 4))
the_plot <- ggplot() +
  geom_point(data = color_df, aes(x = the_df$col1[[1]], y = the_df$col2[[1]], color = the_colors)) +
  scale_color_identity(guide = 'legend', labels = color_df$the_labels)
the_plot <- the_plot +
  geom_point(data=the_df, aes(x=col1, y=col2), color=the_colors[[4]])
print(the_plot)
Explanation of the solution: More generally, as Tjebo explains, it separates the plot from the legend. The legend still needs a plot. This is built first with:
the_plot <- ggplot() +
  geom_point(data = color_df, aes(x = the_df$col1[[1]], y = the_df$col2[[1]], color = the_colors)) +
  scale_color_identity(guide = 'legend', labels = color_df$the_labels)
The plot this creates still has the wrong color, but the points are chosen so that they are hidden by adding the plot I actually want to show in its appropriate color:
the_plot <- the_plot +
  geom_point(data=the_df, aes(x=col1, y=col2), color=the_colors[[4]])
It is also flexible in that further data series can be added in any of the colors that are predefined in the the_colors variable:
the_plot <- the_plot +
  geom_point(data=the_df, aes(x=col1, y=col3), color=the_colors[[6]])
(Note: The data series can also be plotted at once if the color is defined as a third column in the data frame. I just wanted to point out that the solution is flexible and the plot can be modified at a later time without interfering with the legend or the colors of the data points that are already arranged in the plot.)
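One way the single-call variant from the note might look, as a sketch under assumptions: long_df, point_color, and series_colors are hypothetical names, and the color column is passed outside aes() so that no scale (and therefore no legend) is touched.

```r
library(ggplot2)

the_df <- data.frame(col1 = c(1, 2, 2, 1), col2 = c(2, 2, 1, 1), col3 = c(1, 2, 3, 4))
series_colors <- c("#0082c8", "#911eb4")  # hypothetical picks from the palette

# Long format: one row per point, with the point's color stored as a plain column.
long_df <- rbind(
  data.frame(x = the_df$col1, y = the_df$col2, point_color = series_colors[[1]],
             stringsAsFactors = FALSE),
  data.frame(x = the_df$col1, y = the_df$col3, point_color = series_colors[[2]],
             stringsAsFactors = FALSE)
)

# Outside aes(), the color vector is used as-is, one entry per row of long_df.
series_plot <- ggplot() +
  geom_point(data = long_df, aes(x = x, y = y), color = long_df$point_color)
```

The same geom_point() layer could be added to a plot that already carries the legend layer, since a color given outside aes() is not routed through the color scale.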
Edit 2 (March 30 2020), Additional Note: With this solution, the legend will sort the colors by their hex codes. I cannot begin to fathom why it would do that, but it does. So, in order for the colors in the legend to match the intended colors, the vector of hex codes should be sorted beforehand (as is done in the code above).
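A possible explanation for the sorting, offered as an assumption rather than something confirmed in the answer: a discrete scale takes its breaks from the sorted unique values, and the labels are then matched to those breaks by position. Under that assumption, passing breaks explicitly alongside labels should pin the legend order without sorting the hex codes first. A minimal sketch with a shortened, deliberately unsorted palette:

```r
library(ggplot2)

# Deliberately unsorted; shortened for brevity.
the_colors <- c("#e6194b", "#3cb44b", "#ffe119", "#0082c8")
the_labels <- c("01", "02", "03", "04")
color_df <- data.frame(the_colors = the_colors, the_labels = the_labels,
                       stringsAsFactors = FALSE)

# breaks fixes the order of the legend entries; labels are matched
# to breaks by position, so "01" goes with the first hex code, and so on.
legend_plot <- ggplot() +
  geom_point(data = color_df, aes(x = 1, y = 2, color = the_colors)) +
  scale_color_identity(guide = "legend",
                       breaks = color_df$the_colors,
                       labels = color_df$the_labels)
```

The actual data layers can then be added on top of legend_plot with a constant color outside aes(), as in the code above.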
Unexpected behaviors like this would not be a concern in normal use of R and ggplot2 (where you let ggplot2 build the legend for you and restrict yourself strictly to the designs it is intended for). This solution is basically a hack around how the legend is expected to be used in ggplot2 (which is unfortunately quite restrictive). As such, it is possible that this hack will break in future versions of ggplot2 or R.