I want to draw kernel density plots for three categories in a single plot, with vertical lines indicating the mean and median of the whole distribution. The plot itself is rather easy with ggplot2, but I've been struggling to get the legend right.
set.seed(1234)
data <- data.frame(value = rgamma(n = 10000, shape = 3, scale = 1),
                   type = sample(letters[1:3], size = 10000, replace = TRUE))
data$value[data$type == "b"] <- data$value[data$type == "b"] +
  rnorm(sum(data$type == "b"), mean = 2)
data$value[data$type == "c"] <- data$value[data$type == "c"] +
  rnorm(sum(data$type == "c"), mean = 4)
# Let's produce a 'coloured' AND 'filled' density plot
# ('cause I want both the area under the curve and the line to be coloured)
library(ggplot2)
gp <- ggplot(data=data, aes_string(x="value"))
gp <- gp + geom_density(aes_string(fill="type", colour="type"), alpha=0.3)
gp
This gives me the plot and the legend without problem.
Now, I add the vertical lines and the plot is fine, but the legend is really ugly.
# Now let's add vertical lines to the plot, indicating the mean
# and median for the whole distribution
vlines <- data.frame(mean_median = c(mean(data$value), median(data$value)),
                     labels = c("Mean", "Median"))
gp <- gp + geom_vline(data=vlines,
                      aes(xintercept=mean_median, colour=labels),
                      size=1.05, linetype="dashed", show_guide=TRUE)
gp
The legend is all mixed up: it combines the 'coloured' vlines with the 'coloured' and 'filled' densities. What I want is two legends: one for the vlines (two entries) and another for the colours of the density plot.
One workaround is to make geom_density either coloured or filled, but not both. It works better, but it is not what I want (I want both the area and the line of the density plot to carry the colour). It goes like this. Instead of doing this:
gp <- gp + geom_density(aes_string(fill="type", colour="type"), alpha=0.3)
I do this:
gp <- gp + geom_density(aes_string(fill="type"), alpha=0.3)
And the result is ALMOST what I want, but I really want both the area under the curve and the line of the density plot to be coloured.
I've been trying to find a solution all over the web and nothing seems to work. This kind of problem is sometimes tackled with the show_guide argument of the geoms, because it lets you override whether a legend is drawn for an individual geom. I've played with it and it does not work in my case: I need a legend from both geoms (density and vlines), and the problem comes from having both fill and colour in one geom plus colour in the other.
A very similar problem was posted here on Stack Overflow (bar and line plot in one chart with a legend under ggplot2), but the solution used there (using subset) does not apply in my case.
I'd really appreciate any ideas. I've been struggling with this for quite some time now and I cannot find a solution.
ggplot(data=data, aes(x=value)) +
  geom_density(aes(fill=type), alpha=0.3, colour=NA) +
  geom_vline(data=vlines, aes(xintercept=mean_median, colour=labels),
             linetype="dashed", size=1.5, show_guide=TRUE) +
  guides(fill = guide_legend(override.aes = list(linetype = 0)))
– user20650
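For anyone who needs both the outline and the fill coloured by type, one possible approach (a sketch, not something proposed in this thread) is to take the vlines off the colour scale altogether and map them to linetype instead. The densities then own the combined fill/colour legend and the vlines get a separate legend of their own. The legend title "Statistic" and the particular linetype values are illustrative assumptions, and the sketch assumes a ggplot2 version recent enough that geoms draw a legend by default for any mapped aesthetic, so show_guide is not used.

```r
library(ggplot2)

# Same simulated data as in the question (without the per-type shifts,
# which do not matter for the legend layout).
set.seed(1234)
data <- data.frame(value = rgamma(n = 10000, shape = 3, scale = 1),
                   type = sample(letters[1:3], size = 10000, replace = TRUE))
vlines <- data.frame(mean_median = c(mean(data$value), median(data$value)),
                     labels = c("Mean", "Median"))

gp <- ggplot(data, aes(x = value)) +
  # Densities keep both fill and colour, so they share one legend.
  geom_density(aes(fill = type, colour = type), alpha = 0.3) +
  # Assumption: distinguish the vlines by linetype rather than colour,
  # which puts them on a separate scale and hence a separate legend.
  geom_vline(data = vlines,
             aes(xintercept = mean_median, linetype = labels)) +
  # Illustrative legend title and linetypes.
  scale_linetype_manual(name = "Statistic",
                        values = c(Mean = "dashed", Median = "dotted"))
gp
```

The trade-off is that the two vlines are no longer told apart by colour; if that matters, a fixed colour can still be set outside aes() for the whole vline layer.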