I'm trying to produce a variation of a grouped violin plot in R (preferably using ggplot2), similar to the one below, which was produced by the following reproducible example code:
# Load libraries #
library(tidyverse)
# Create dummy data #
set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100),
                 Y = rgamma(n = 200, shape = 2, rate = 2),
                 Z = rep(c("Za", "Zb"), times = 100),
                 stringsAsFactors = FALSE)
# Grouped violin plot #
df %>%
  ggplot(., aes(x = X, y = Y, fill = Z)) +
  geom_violin(draw_quantiles = 0.5) +
  scale_fill_manual(values = c("Za" = "red", "Zb" = "blue"))
The variation I'd like to have is that the density above the median should have a different shade compared to the density below the median, as in the following plot:
I produced the above (single) violin plot for the combination X = X1 and Z = Za in the data, using the following code:
## Shaded violin plot ##
# Calculate limits and median #
df.lim <- df %>%
  filter(X == "X1", Z == "Za") %>%
  summarise(Y_min = min(Y),
            Y_qnt = quantile(Y, 0.5),
            Y_max = max(Y))
# Calculate density, truncate at limits and assign shade category #
df.dens <- df %>%
  filter(X == "X1", Z == "Za") %>%
  do(data.frame(LOC = density(.$Y)$x,
                DENS = density(.$Y)$y)) %>%
  filter(LOC >= df.lim$Y_min, LOC <= df.lim$Y_max) %>%
  mutate(COL = ifelse(LOC > df.lim$Y_qnt, "Empty", "Filled"))
# Find density values at limits #
df.lim.2 <- df.dens %>%
  filter(LOC == min(LOC) | LOC == max(LOC))
# Produce shaded single violin plot #
df.dens %>%
  ggplot(aes(x = LOC)) +
  geom_area(aes(y = DENS, alpha = COL), fill = "red") +
  geom_area(aes(y = -DENS, alpha = COL), fill = "red") +
  geom_path(aes(y = DENS)) +
  geom_path(aes(y = -DENS)) +
  geom_segment(data = df.lim.2, aes(x = LOC, y = DENS, xend = LOC, yend = -DENS)) +
  coord_flip() +
  scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1))
As you will notice in the code, I'm building the violin plot from scratch using the density function horizontally and then flipping the axes. The problem arises when I try to produce a grouped violin plot, mainly because the axis on which the groups X and Z would appear is already used for the "height" of the density. I did try to reach the same result by repeating all the calculations by group, but I'm stuck at the final step:
## Shaded grouped violin plot ##
# Calculate limits and median by group #
df.lim <- df %>%
  group_by(X, Z) %>%
  summarise(Y_min = min(Y),
            Y_qnt = quantile(Y, 0.5),
            Y_max = max(Y))
# Calculate density, truncate at limits and assign shade category by group #
df.dens <- df %>%
  group_by(X, Z) %>%
  do(data.frame(LOC = density(.$Y)$x,
                DENS = density(.$Y)$y)) %>%
  left_join(., df.lim, by = c("X", "Z")) %>%
  filter(LOC >= Y_min, LOC <= Y_max) %>%
  mutate(COL = ifelse(LOC > Y_qnt, "Empty", "Filled"))
# Find density values at limits by group #
df.lim.2 <- df.dens %>%
  group_by(X, Z) %>%
  filter(LOC == min(LOC) | LOC == max(LOC))
# Produce shaded grouped violin plot #
df.dens %>%
  ggplot(aes(x = LOC, group = interaction(X, Z))) +
  # The following two lines don't work when included #
  #geom_area(aes(y = DENS, alpha = COL), fill = "red") +
  #geom_area(aes(y = -DENS, alpha = COL), fill = "red") +
  geom_path(aes(y = DENS)) +
  geom_path(aes(y = -DENS)) +
  geom_segment(data = df.lim.2, aes(x = LOC, y = DENS, xend = LOC, yend = -DENS)) +
  coord_flip() +
  scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1))
Running the code above will produce the outline of the violin plots for each group, each one on top of the other. But once I try to include the geom_area lines, the code fails.
My gut feeling tells me that I would need to somehow produce the "shaded" violin plot as a new geom, which can then be used under the general structure of ggplot2 graphics, but I have no idea how to do that, as my coding skills don't extend that far. Any help or pointers, either along my line of thought or in a different direction, would be much appreciated. Thank you for your time.
I don't think geom_area() is going to solve your problem when the violins are anywhere other than near 0. It would probably be better to replace it with geom_polygon(). The best guide I've found to creating your own geoms and such is here: cran.r-project.org/web/packages/ggplot2/vignettes/…. – teunbrand
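The geom_polygon() suggestion in the comment can be sketched roughly as follows, without writing a new geom. This is only a minimal sketch under stated assumptions, not a definitive solution: the helper functions dens_one() and poly_one(), the data frame df.poly, and the columns CENTER, HALF and POS are hypothetical names introduced here, and the dodge offset (0.2) and maximum half-width (0.18) are arbitrary choices. Each violin is drawn around its own numeric centre instead of around 0, and COL is added to the grouping so that alpha is constant within every polygon. It assumes a dplyr version that provides group_modify().

```r
library(dplyr)
library(ggplot2)

# Same dummy data as in the question
set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100),
                 Y = rgamma(n = 200, shape = 2, rate = 2),
                 Z = rep(c("Za", "Zb"), times = 100),
                 stringsAsFactors = FALSE)

# Hypothetical helper: density of one group, truncated to the observed
# range and split at the median into "Filled" (below) and "Empty" (above)
dens_one <- function(y) {
  d    <- density(y)
  keep <- d$x >= min(y) & d$x <= max(y)
  data.frame(LOC  = d$x[keep],
             DENS = d$y[keep],
             COL  = ifelse(d$x[keep] > median(y), "Empty", "Filled"),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: closed outline of one violin piece, going up one
# side of the centre line and back down the mirrored side
poly_one <- function(d) {
  data.frame(LOC = c(d$LOC, rev(d$LOC)),
             POS = c(d$CENTER + d$HALF, rev(d$CENTER - d$HALF)))
}

df.poly <- df %>%
  group_by(X, Z) %>%
  group_modify(~ dens_one(.x$Y)) %>%
  ungroup() %>%
  # Assumed layout: X1 and X2 sit at 1 and 2, Za/Zb are dodged by 0.2,
  # and the widest violin gets a half-width of 0.18
  mutate(CENTER = as.numeric(factor(X)) + ifelse(Z == "Za", -0.2, 0.2),
         HALF   = 0.18 * DENS / max(DENS)) %>%
  group_by(X, Z, COL) %>%
  group_modify(~ poly_one(.x)) %>%
  ungroup()

# One polygon per X/Z/COL combination, so alpha never varies within a polygon
p <- ggplot(df.poly, aes(x = POS, y = LOC)) +
  geom_polygon(aes(fill = Z, alpha = COL, group = interaction(X, Z, COL)),
               colour = "black") +
  scale_x_continuous(breaks = 1:2, labels = c("X1", "X2")) +
  scale_fill_manual(values = c("Za" = "red", "Zb" = "blue")) +
  scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1)) +
  labs(x = "X", y = "Y")

print(p)
```

Because the polygons are built directly in plot coordinates, coord_flip() is not needed, and the outline of each piece closes along the median, which stands in for draw_quantiles = 0.5. The two pieces of a violin are cut at neighbouring density grid points, so a hairline gap at the median may be visible at high zoom.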