
How can I fill a geom_violin plot in ggplot2 with different colors based on a fixed cutoff?

For instance, given the setup:

library(ggplot2)

set.seed(123)
dat <- data.frame(x = rep(1:3,each = 100),
                  y = c(rnorm(100,-1),rnorm(100,0),rnorm(100,1)))
dat$f <- with(dat,ifelse(y >= 0,'Above','Below'))

I'd like to take this basic plot:

ggplot() + 
    geom_violin(data = dat,aes(x = factor(x),y = y))

and simply have each violin colored differently above and below zero. The naive thing to try, mapping the fill aesthetic, splits and dodges the violin plots:

ggplot() + 
    geom_violin(data = dat,aes(x = factor(x),y = y, fill = f))

which is not what I want. I'd like a single violin plot at each x value, but with the interior filled with different colors above and below zero.

Do people do this here? Find something they think is new... ask a question for it, and answer it... all for those sweet sweet internet points? - cory
@cory Um... yes, yes it is. Being rude and condescending in comments, however, is generally frowned upon, just so you know. (We all do it anyway, though.) - joran
Cool, I didn't intend to be rude. I just didn't know. I respect your sweet sweet internet points. I really do. I've upvoted you, so you get even more. - cory

1 Answer


Here's one way to do this.

library(ggplot2)
library(plyr)

#Data setup
set.seed(123)
dat <- data.frame(x = rep(1:3,each = 100),
                  y = c(rnorm(100,-1),rnorm(100,0),rnorm(100,1)))

First we'll use ggplot2::ggplot_build to capture all the computed variables that go into drawing the violin plot:

p <- ggplot() + 
    geom_violin(data = dat,aes(x = factor(x),y = y))
p_build <- ggplot2::ggplot_build(p)$data[[1]]
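Before building polygons by hand, it can help to confirm that the built data actually has the columns the steps below rely on. This is a small sketch, assuming a ggplot2 version whose built violin data includes `violinwidth`, `xmin` and `xmax` (the widening step below uses all three); `needed` is just an illustrative name.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(x = rep(1:3, each = 100),
                  y = c(rnorm(100, -1), rnorm(100, 0), rnorm(100, 1)))

p <- ggplot() +
  geom_violin(data = dat, aes(x = factor(x), y = y))
p_build <- ggplot2::ggplot_build(p)$data[[1]]

# Columns the manual polygon construction depends on
needed <- c("x", "y", "group", "violinwidth", "xmin", "xmax")
print(head(p_build[, needed]))

# One group per violin (one per level of factor(x))
print(unique(p_build$group))
```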

Next, if we look at the source code for geom_violin, we see that it transforms this computed data frame (widening each density curve using violinwidth, xmin and xmax, then tracing the left edge upward and the right edge back down) before handing it off to geom_polygon to draw the actual outlines of the violin regions.

So we'll mimic that process and simply draw the filled polygons manually:

#This comes directly from the source of geom_violin
p_build <- transform(p_build,
                     xminv = x - violinwidth * (x - xmin),
                     xmaxv = x + violinwidth * (xmax - x))

p_build <- rbind(plyr::arrange(transform(p_build, x = xminv), y),
                 plyr::arrange(transform(p_build, x = xmaxv), -y))
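If you'd rather not depend on plyr, base R's `order()` can do the same sorting. A minimal sketch on a made-up three-point outline (the `toy` values are hypothetical, chosen only to make the arithmetic easy to follow): the left edge runs bottom to top, the right edge top to bottom, so together they trace one closed outline.

```r
# Hypothetical single-violin data: three density points at x = 1
toy <- data.frame(x = 1, y = c(0.5, -1, 2),
                  violinwidth = c(0.2, 1, 0.4),
                  xmin = 0.55, xmax = 1.45)

# Same widening step as the geom_violin-derived code above
toy <- transform(toy,
                 xminv = x - violinwidth * (x - xmin),
                 xmaxv = x + violinwidth * (xmax - x))

# Left edge in increasing y, then right edge in decreasing y;
# order() stands in for plyr::arrange()
outline <- rbind(transform(toy, x = xminv)[order(toy$y), ],
                 transform(toy, x = xmaxv)[order(toy$y, decreasing = TRUE), ])
print(outline[, c("x", "y")])
```

As with the plyr version, sorting by y alone interleaves rows from different violins when there are several groups; the group aesthetic is what keeps them apart at drawing time.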

I'm omitting a small detail from the source code about duplicating the first row in order to ensure that the polygon is closed.
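If you do want to reproduce that closing step, one way is to append each polygon's first row to the end of that polygon. `close_polygons` is a hypothetical helper, not a ggplot2 function, shown here on made-up triangles. geom_polygon fills each group as a closed shape regardless, so this is mostly about staying faithful to the source; on the real data you would call it with `group_col = "group1"` once that column exists.

```r
# Hypothetical helper: repeat each group's first row at the end of that group
close_polygons <- function(df, group_col = "group") {
  pieces <- split(df, df[[group_col]])            # row order kept within each group
  closed <- lapply(pieces, function(d) rbind(d, d[1, ]))
  out <- do.call(rbind, closed)
  rownames(out) <- NULL
  out
}

# Two made-up triangles
toy <- data.frame(group = c(1, 1, 1, 2, 2, 2),
                  x = c(0, 1, 0.5, 2, 3, 2.5),
                  y = c(0, 0, 1, 0, 0, 1))
toy_closed <- close_polygons(toy)
print(toy_closed)
```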

Now we do two final modifications:

#Add our fill variable
p_build$fill_group <- ifelse(p_build$y >= 0,'Above','Below')
#This is necessary to ensure that instead of trying to draw
# 3 polygons, we're telling ggplot to draw six polygons
p_build$group1 <- with(p_build,interaction(factor(group),factor(fill_group)))
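The question asks about a fixed cutoff, and zero is just one choice. A minimal sketch, assuming you want the cutoff as a parameter: `add_fill_groups` is a hypothetical helper name, demonstrated on a small made-up data frame with `group` and `y` columns rather than the real `p_build`.

```r
# Hypothetical helper: label points relative to a cutoff and build the
# combined grouping so each violin splits into an above and a below polygon
add_fill_groups <- function(df, cutoff = 0) {
  df$fill_group <- ifelse(df$y >= cutoff, "Above", "Below")
  df$group1 <- interaction(factor(df$group), factor(df$fill_group))
  df
}

toy <- data.frame(group = c(1, 1, 1, 2, 2, 2),
                  y = c(-2, 0.5, 3, -0.5, 1, 2))
toy_split <- add_fill_groups(toy, cutoff = 1)
print(toy_split)
print(table(toy_split$group1))
```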

And finally plot:

#Note the use of the group aesthetic here with our computed version,
# group1
p_fill <- ggplot() + 
    geom_polygon(data = p_build,
                 aes(x = x,y = y,group = group1,fill = fill_group))
p_fill


Note that this approach loses ggplot2's automatic handling of categorical x-axis labels: the polygons are drawn on a continuous x axis, at the numeric positions (1, 2, 3, ...) that ggplot_build assigned to each category. If you need the category labels, you'll have to add them back manually.
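A sketch of adding the labels back: the factor levels sit at positions 1, 2, 3, so scale_x_continuous() with matching breaks and labels restores them. This repeats the whole pipeline in one script, swaps plyr::arrange() for base order(), and converts the built x columns to plain doubles in case your ggplot2 version attaches a special vector class to them (an assumption about the version; the conversion is harmless on plain numeric columns).

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(x = rep(1:3, each = 100),
                  y = c(rnorm(100, -1), rnorm(100, 0), rnorm(100, 1)))

p <- ggplot() + geom_violin(data = dat, aes(x = factor(x), y = y))
p_build <- ggplot2::ggplot_build(p)$data[[1]]

# Make sure the x-position columns are plain numeric vectors
for (col in c("x", "xmin", "xmax")) p_build[[col]] <- as.double(unclass(p_build[[col]]))

# Outline construction as above, with base R ordering instead of plyr
p_build <- transform(p_build,
                     xminv = x - violinwidth * (x - xmin),
                     xmaxv = x + violinwidth * (xmax - x))
p_build <- rbind(transform(p_build, x = xminv)[order(p_build$y), ],
                 transform(p_build, x = xmaxv)[order(p_build$y, decreasing = TRUE), ])
p_build$fill_group <- ifelse(p_build$y >= 0, "Above", "Below")
p_build$group1 <- interaction(factor(p_build$group), factor(p_build$fill_group))

# Categories map to positions 1..n, so label those breaks with the levels
x_levels <- levels(factor(dat$x))
p_fill <- ggplot() +
  geom_polygon(data = p_build,
               aes(x = x, y = y, group = group1, fill = fill_group)) +
  scale_x_continuous(breaks = seq_along(x_levels), labels = x_levels) +
  labs(x = "x")
p_fill
```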