I want to draw kernel density curves for three categories in a single plot, with vertical lines marking the mean and median of the whole distribution. The plot itself is easy with ggplot2, but I've been struggling to get the legend right.

set.seed(1234)
data <- data.frame(value = rgamma(n = 10000, shape = 3, scale = 1),
               type = sample(letters[1:3], size = 10000, replace = TRUE))
data$value[data$type == "b"] <- data$value[data$type == "b"] +
                          rnorm(sum(data$type == "b"), mean = 2)
data$value[data$type == "c"] <- data$value[data$type == "c"] +
                          rnorm(sum(data$type == "c"), mean = 4)

# Let's produce a 'coloured' AND 'filled' density plot
# ('cause I want both the area under the curve and the line to be coloured)
library(ggplot2)
gp <- ggplot(data=data, aes_string(x="value"))
gp <- gp + geom_density(aes_string(fill="type", colour="type"), alpha=0.3)
gp

This gives me the plot and the legend without problem.

plot1

Now, I add the vertical lines and the plot is fine, but the legend is really ugly.

# Now let's add vertical lines to the plot, indicating the mean 
# and median for the whole distribution
vlines <- data.frame(mean_median = c(mean(data$value), median(data$value)),
                     labels = c("Mean", "Median"))

gp <- gp + geom_vline(data=vlines,
                      aes(xintercept=mean_median, colour=labels),
                      size=1.05, linetype="dashed", show_guide=TRUE)
gp

plot2

The legend mixes the coloured vlines with the coloured and filled densities. What I want is two legends: one for the vlines (two entries) and one for the colours of the density plot.

One workaround is to have geom_density either coloured or filled, but not both. That works better, but it is not what I want, because I want both the area and the line of the density plot to carry the colour. Instead of doing this:

gp <- gp + geom_density(aes_string(fill="type", colour="type"), alpha=0.3)

I do this:

gp <- gp + geom_density(aes_string(fill="type"), alpha=0.3)

plot3

And the result is ALMOST what I want, but I really want both the area under the curve and the line of the density plot to be coloured.

I've searched all over the web and nothing seems to work. This kind of problem is sometimes tackled with the show_guide argument of the geoms, which lets you control whether an individual geom gets a legend. I've played with it and it does not work in my case: I need a legend from both geoms (density and vlines), and the problem comes from having both fill and colour in one geom plus colour in the other.

There was a very similar problem posted here on Stack Overflow (bar and line plot in one chart with a legend under ggplot2), but the solution used there (using subset) does not apply in my case.

I'd really appreciate any ideas. I've been struggling with this for quite some time now and I cannot find a solution.

(I think) the difficulty is that you are trying to map different values in different geoms to the same aesthetic, colour. That is why the third plot is OK(ish): you removed colour from the geom_density aesthetic. To tidy up the legend of this third plot and remove the black border on the densities, you can try:

ggplot(data=data, aes(x=value)) +
    geom_density(aes(fill=type), alpha=0.3, colour=NA) +
    geom_vline(data=vlines, aes(xintercept=mean_median, colour=labels),
               linetype="dashed", size=1.5, show_guide=TRUE) +
    guides(fill = guide_legend(override.aes = list(linetype = 0)))

– user20650
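
Following the diagnosis in the comment above, one possible alternative is to keep colour for the densities and map the vertical lines to a different aesthetic. The sketch below uses linetype for the vlines. That choice is an assumption, not something proposed in the thread, and it means both lines share one colour and differ only in dash pattern. The legend title "statistic" and the smaller sample size are likewise illustrative.

```r
library(ggplot2)

set.seed(1234)
# Smaller stand-in for the question's data (sample size is an assumption)
data <- data.frame(value = rgamma(n = 1000, shape = 3, scale = 1),
                   type  = sample(letters[1:3], size = 1000, replace = TRUE))
vlines <- data.frame(mean_median = c(mean(data$value), median(data$value)),
                     labels = c("Mean", "Median"))

p <- ggplot(data, aes(x = value)) +
  # fill and colour both map to `type`, so they share a single legend
  geom_density(aes(fill = type, colour = type), alpha = 0.3) +
  # the vlines use linetype instead of colour, so they get their own legend
  geom_vline(data = vlines,
             aes(xintercept = mean_median, linetype = labels),
             colour = "black") +
  scale_linetype_manual(name = "statistic",
                        values = c(Mean = "dashed", Median = "dotted"))
p
```

Because the two layers no longer compete for the colour scale, ggplot2 should build one legend for type and a separate one for the lines. If the lines must differ in colour rather than dash pattern, this sketch does not help.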

1 Answer

There is more than likely an easier way to do this, but as an alternative to the comment above you could hack the plot object. Make two plots, one with the panel as you like it and one with the legend as you like it, then swap the legends over.

# Plot with density area and line coloured but legend not right
p1 <- ggplot(data=data, aes(x=value)) + 
            geom_density(aes(fill=type, colour=type), alpha=0.3 ) + 
            geom_vline(data=vlines, aes(xintercept=mean_median, colour=labels), 
                         linetype="dashed", size=1.5, show_guide=TRUE ) 

g1 <- ggplotGrob(p1)

# Plot with density line not coloured but legend is ok
p2 <- ggplot(data=data, aes(x=value)) + 
            geom_density(aes(fill=type), alpha=0.3 ) + 
            geom_vline(data=vlines, aes(xintercept=mean_median, colour=labels), 
                         linetype="dashed", size=1.5, show_guide=TRUE )  +
            guides(fill = guide_legend(override.aes = list(linetype = 0 ))) 

g2 <- ggplotGrob(p2)


# Add legend of second plot to first plot    
g1$grobs[which(g1$layout$name=="guide-box")] <- 
                              g2$grobs[which(g2$layout$name=="guide-box")]  

grid::grid.newpage()    
grid::grid.draw(g1)
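
The swap above looks up a layout entry named exactly "guide-box". As far as I know, newer ggplot2 releases (3.5.0 and later) name these entries by position, for example "guide-box-right", in which case the exact match finds nothing. Below is a hedged variant of the same swap that matches on the name prefix instead. It assumes both plots place their legends in the same positions. It also uses show.legend, the current name for show_guide, and a smaller stand-in data set.

```r
library(ggplot2)

set.seed(1234)
# Smaller stand-in for the question's data (sample size is an assumption)
data <- data.frame(value = rgamma(n = 1000, shape = 3, scale = 1),
                   type  = sample(letters[1:3], size = 1000, replace = TRUE))
vlines <- data.frame(mean_median = c(mean(data$value), median(data$value)),
                     labels = c("Mean", "Median"))

# p1 has the panel we want, p2 has the legend we want
p1 <- ggplot(data, aes(x = value)) +
  geom_density(aes(fill = type, colour = type), alpha = 0.3) +
  geom_vline(data = vlines, aes(xintercept = mean_median, colour = labels),
             linetype = "dashed", show.legend = TRUE)
p2 <- ggplot(data, aes(x = value)) +
  geom_density(aes(fill = type), alpha = 0.3) +
  geom_vline(data = vlines, aes(xintercept = mean_median, colour = labels),
             linetype = "dashed", show.legend = TRUE) +
  guides(fill = guide_legend(override.aes = list(linetype = 0)))

g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(p2)

# Match every layout entry whose name starts with "guide-box",
# then copy the grobs across name by name
boxes <- grep("^guide-box", g1$layout$name, value = TRUE)
stopifnot(length(boxes) > 0, all(boxes %in% g2$layout$name))
g1$grobs[match(boxes, g1$layout$name)] <- g2$grobs[match(boxes, g2$layout$name)]

grid::grid.newpage()
grid::grid.draw(g1)
```

The space reserved for the legend still comes from the first plot, so if the two legends differ a lot in size the result may need manual adjustment.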
