
I'll use violin plots here as an example, but the question extends to many other ggplot types.

I know how to subset my data along the x-axis by a factor:

ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_violin() +
  geom_point(position = "jitter")

(Figure: violin plot by species)

And I know how to plot only the full dataset:

ggplot(iris, aes(x = 1, y = Sepal.Length)) +
  geom_violin() +
  geom_point(position = "jitter")

(Figure: violin plot of the full data)

My question is: is there a way to plot the full data AND a subset-by-factor side-by-side in the same plot? In other words, for the iris data, could I make a violin plot that has both "full data" and "setosa" along the x-axis?

This would enable a comparison between the distribution of the full dataset and that of a subset. If this isn't possible, any recommendations on a better way to visualise this would also be welcome :)

Thanks for any ideas!


1 Answer


Using:

# First pair of layers: the full dataset under the fixed x label "All"
ggplot(iris, aes(x = "All", y = Sepal.Length)) +
  geom_violin() +
  geom_point(aes(color = "All"), position = "jitter") +
  # Second pair of layers: the same data split by Species
  geom_violin(data = iris, aes(x = Species, y = Sepal.Length)) +
  geom_point(data = iris, aes(x = Species, y = Sepal.Length, color = Species),
             position = "jitter") +
  scale_color_manual(values = c("black", "#F8766D", "#00BA38", "#619CFF")) +
  theme_minimal(base_size = 16) +
  theme(axis.title.x = element_blank(), legend.title = element_blank())

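gives a single plot with four violins along the x-axis: "All", followed by one for each species, with the jittered points coloured by group.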
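
An alternative, offered only as a sketch and not part of the answer above: stack a copy of the data relabelled "All" on top of the original rows, so that a single geom_violin() and geom_point() cover both the full data and the per-species groups. The names iris_all, iris_sub, combined and p are made up for this illustration.

```r
library(ggplot2)

# Copy of the full dataset, relabelled so every row belongs to "All"
iris_all <- iris
iris_all$Species <- "All"

# Original rows, with Species as character so the two pieces stack cleanly
iris_sub <- iris
iris_sub$Species <- as.character(iris_sub$Species)

combined <- rbind(iris_all, iris_sub)

# Fix the x-axis order explicitly so "All" comes first
combined$Species <- factor(combined$Species,
                           levels = c("All", levels(iris$Species)))

p <- ggplot(combined, aes(x = Species, y = Sepal.Length)) +
  geom_violin() +
  geom_point(aes(color = Species), position = "jitter") +
  scale_color_manual(values = c("black", "#F8766D", "#00BA38", "#619CFF")) +
  theme_minimal(base_size = 16) +
  theme(axis.title.x = element_blank(), legend.title = element_blank())

p
```

To show only "All" and "setosa", as asked in the question, the same plot can be built from subset(combined, Species %in% c("All", "setosa")) instead of combined. The trade-off is that the data is duplicated in memory, which is negligible for iris but worth keeping in mind for large datasets.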