5
votes

I'm trying to produce a variation of a grouped violin plot in R (preferably using ggplot2), similar to the one below:

Grouped Violin Plot

which was produced by the following reproducible example code:

# Load libraries #
library(tidyverse)

# Create dummy data #
set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100), 
                 Y = rgamma(n = 200, shape = 2, rate = 2),
                 Z = rep(c("Za", "Zb"), rep = 100),
                 stringsAsFactors = FALSE)

# Grouped violin plot #
df %>% 
  ggplot(., aes(x = X, y = Y, fill = Z)) + 
    geom_violin(draw_quantiles = 0.5) + 
    scale_fill_manual(values = c("Za" = "red", "Zb" = "blue"))

The variation I'd like to have is that the density above the median should have a different shade compared to the density below the median, as in the following plot:

Shaded Violin Plot

I produced the above (single) violin plot for the combination X = X1 and Z = Za in the data, using the following code:

## Shaded violin plot ##
# Calculate limits and median #
df.lim <- df %>% 
            filter(X == "X1", Z == "Za") %>% 
            summarise(Y_min = min(Y),
                      Y_qnt = quantile(Y, 0.5),
                      Y_max = max(Y))

# Calculate density, truncate at limits and assign shade category #
df.dens <- df %>% 
            filter(X == "X1", Z == "Za") %>% 
            do(data.frame(LOC  = density(.$Y)$x,
                          DENS = density(.$Y)$y)) %>%
            filter(LOC >= df.lim$Y_min, LOC <= df.lim$Y_max) %>% 
            mutate(COL = ifelse(LOC > df.lim$Y_qnt, "Empty", "Filled"))

# Find density values at limits #
df.lim.2 <- df.dens %>% 
              filter(LOC == min(LOC) | LOC == max(LOC))

# Produce shaded single violin plot #
df.dens %>% 
  ggplot(aes(x = LOC)) + 
    geom_area(aes(y =  DENS, alpha = COL), fill = "red") +
    geom_area(aes(y = -DENS, alpha = COL), fill = "red") +
    geom_path(aes(y =  DENS)) +
    geom_path(aes(y = -DENS)) +
    geom_segment(data = df.lim.2, aes(x = LOC, y = DENS, xend = LOC, yend = -DENS)) +
    coord_flip() + 
    scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1))

As you will notice in the code, I'm building the violin plot from scratch using the density function horizontally and then flipping the axes. The problem arises when I try to produce a grouped violin plot mainly because the axis in which the groups X and Z will appear, is already used for the "height" of the density. I did try to reach the same result by repeating all the calculations by groups but I'm stuck in the final step:

## Shaded grouped violin plot ##
# Calculate limits and median by group #
df.lim <- df %>% 
            group_by(X, Z) %>% 
            summarise(Y_min = min(Y),
                      Y_qnt = quantile(Y, 0.5),
                      Y_max = max(Y))

# Calculate density, truncate at limits and assign shade category by group #
df.dens <- df %>% 
            group_by(X, Z) %>% 
            do(data.frame(LOC  = density(.$Y)$x,
                          DENS = density(.$Y)$y)) %>%
            left_join(., df.lim, by = c("X", "Z")) %>% 
            filter(LOC >= Y_min, LOC <= Y_max) %>% 
            mutate(COL = ifelse(LOC > Y_qnt, "Empty", "Filled"))

# Find density values at limits by group #
df.lim.2 <- df.dens %>%
              group_by(X, Z) %>% 
              filter(LOC == min(LOC) | LOC == max(LOC))

# Produce shaded grouped violin plot #
df.dens %>% 
  ggplot(aes(x = LOC, group = interaction(X, Z))) + 
    # The following two lines don't work when included #
    #geom_area(aes(y =  DENS, alpha = COL), fill = "red") +
    #geom_area(aes(y = -DENS, alpha = COL), fill = "red") +
    geom_path(aes(y =  DENS)) +
    geom_path(aes(y = -DENS)) +
    geom_segment(data = df.lim.2, aes(x = LOC, y = DENS, xend = LOC, yend = -DENS)) +
    coord_flip() + 
    scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1))

Running the code above will produce the outline of the violin plots for each group, each one on top of the other. But once I try to include the geom_area lines, the code fails.

My gut feeling tells me that I would need to somehow produce the "shaded" violin plot as a new geom which can then be used under the general structure of ggplot2 graphics but I have no idea how to do that, as my coding skills don't extend that far. Any help or pointers, either along my line of thought or in a different direction would be much appreciated. Thank you for your time.

1
I don't think geom_area() is going to solve your problem when the violins are anywhere else than near 0. It would probably be better to replace it with geom_polygon(). The best guide I've found to creating your own geoms and such is here: cran.r-project.org/web/packages/ggplot2/vignettes/….teunbrand

1 Answers

2
votes

Idea

For the fun of it, I hacked a quick half-violin geom. It is basically a lot of copy & paste from GeomViolin and in order to make it run I had to access some of the internal ggplot2 function, which are not exported via ::: which means that this solution may not run in the future (if the ggplot team decides to change their internal functions).

However, this solution works and you can specify the alpha level of both the upper and the lower part. The geom assumes that you are providing just one quantile. The code is only superficially tested but it gives you an idea of how this can be done. As said it is in large part a simple copy & paste from GeomViolin where I added some code which finds out which values are below and above the quantile and splits the underlying GeomPolygon in 2 parts, as this function uses just a single alpha value. It works with groups and coord_flip likewise.


Code

library(grid)

GeomHalfViolin <- ggproto("GeomHalfViolin", GeomViolin,
  draw_group = function (self, data, ..., draw_quantiles = NULL, 
                         alpha_upper = .5, alpha_lower = 1) {
    data <- transform(data, xminv = x - violinwidth * (x - xmin), 
        xmaxv = x + violinwidth * (xmax - x))
    newdata <- rbind(transform(data, x = xminv)[order(data$y), 
        ], transform(data, x = xmaxv)[order(data$y, decreasing = TRUE), 
        ])
    newdata <- rbind(newdata, newdata[1, ])
    if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
        stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 
            1))
        stopifnot(length(draw_quantiles) <= 1)
        ## need to add ggplot::: to access ggplot2 internal functions here and there
        quantiles <- ggplot2:::create_quantile_segment_frame(data, draw_quantiles)
        ###------------------------------------------------
        ## find out where the quantile is supposed to be
        quantile_line <- unique(quantiles$y)
        ## which y values are below this quantile?
        ind <- newdata$y <= quantile_line
        ## set the alpha values accordingly
        newdata$alpha[!ind] <- alpha_upper
        newdata$alpha[ind] <- alpha_lower
        ###------------------------------------------------
        aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data), 
            c("x", "y", "group")), drop = FALSE]
        aesthetics$alpha <- rep(1, nrow(quantiles))
        both <- cbind(quantiles, aesthetics)
        both <- both[!is.na(both$group), , drop = FALSE]
        quantile_grob <- if (nrow(both) == 0) {
            zeroGrob()
        }
        else {
            GeomPath$draw_panel(both, ...)
        }
        ###------------------------------------------------
        ## GeomPolygon uses a single alpha value by default
        ## Hence, split the violin in two parts
        ggplot2:::ggname("geom_half_violin",
                         grobTree(GeomPolygon$draw_panel(newdata[ind, ], ...),
                                  GeomPolygon$draw_panel(newdata[!ind, ], ...),
                                  quantile_grob))
        ###------------------------------------------------
    }
    else {
        ggplot2:::ggname("geom_half_violin", GeomPolygon$draw_panel(newdata, 
            ...))
    }
  } 
)

geom_half_violin <- function(mapping = NULL, data = NULL, stat = "ydensity",
                             position = "dodge", ..., draw_quantiles = NULL,
                             alpha_upper = .5, alpha_lower = 1, 
                             trim = TRUE, scale = "area", 
                             na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) {
    layer(data = data, mapping = mapping, stat = stat, geom = GeomHalfViolin, 
        position = position, show.legend = show.legend, inherit.aes = inherit.aes, 
        params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles, 
                      alpha_lower = alpha_lower, alpha_upper = alpha_upper,
                      na.rm = na.rm, ...))

}


library(tidyverse)

# Create dummy data #
set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100), 
                 Y = rgamma(n = 200, shape = 2, rate = 2),
                 Z = rep(c("Za", "Zb"), rep = 100),
                 stringsAsFactors = FALSE)

# Grouped violin plot #
df %>% 
  ggplot(., aes(x = X, y = Y, fill = Z)) + 
    geom_half_violin(draw_quantiles = 0.5, alpha_upper = .1) + 
    scale_fill_manual(values = c("Za" = "red", "Zb" = "blue"))
# no groups
df %>% filter(Z == "Za") %>% 
  ggplot(., aes(x = X, y = Y)) + 
    geom_half_violin(draw_quantiles = 0.5, alpha_upper = .1, fill = "red") + 
    scale_fill_manual(values = c("Za" = "red", "Zb" = "blue")) + 
    coord_flip()

Graphs

Grouped Half-Violin Plot Flipped Half-Violin Plot