1
votes

I am struggling with the following code using the iris dataset. I would like to draw a violin plot, only including setosa species, and do some complex overlapping combinations on subgroups of data.

Specifically, on the x-axis, I would like to first divide continuous Sepal.Length data into groups: group A=Sepal.Length < 4.7, group B=Sepal.Length 4.7 - 5, group C=Sepal.Length 5 - 5.2 and group D=Sepal.Length > 5.2.

Then, I would like to plot four violins/boxes on the x-axis, single/overlapping groups: "B", "A+C", "D", "A+C+D". The y-axis is simply "Petal.Length".

I am also including code to show the sample size (n) for each violin.

I would appreciate any suggestions. Thank you.

library(dplyr)
library(ggplot2)
library(ggpubr)
# Define order of violins on x-axis.
order <- c("B", "AC", "D", "ACD")
# Function to display sample size (n) for each violin.
give.n <- function(x){return(c(y = min(Petal.Length), label = length(x)))}
iris %>% 
  filter(Species == "setosa") %>% 
  mutate(sub_a = case_when( Sepal.Length < 4.7~"A",
                        Sepal.Length < 5~ "B",
                        Sepal.Length < 5.2~ "C",
                        TRUE~"D")) %>% 
  mutate(collapsed = c((ifelse(sub_a %in% c("A", "C"), "AC", sub_a)), (ifelse(sub_a %in% c("AC", "D"), "ACD", sub_a)))) %>% 
  ggviolin(iris[iris$Species == "setosa", ], x=collapsed, y=Petal.Length) + scale_x_discrete(limits=order) + stat_summary(fun.data = give.n, geom = "text")

Edit

See below for expected result. Note that the numbers below each violin are accurate. The rest of the image is only an example of expected outcome.

enter image description here

1
Can you add the current plot and draw the expected output in Paint/Word/PPT?Tung
@Tung - Thank you for your question. Please see edit above.Sylvia Rodriguez

1 Answers

3
votes

I cannot see how to do this as a single chain, but here is a brute-force solution that uses cut and then bind_rows

setosa <- iris %>% filter(Species == "setosa")  %>% 
  mutate(group = cut(Sepal.Length, breaks = c(0, 4.7, 5, 5.2, Inf), labels = c("A", "B", "C", "D"), right = FALSE))

bind_rows(B = setosa %>% filter(group == "B"),
          AC =  setosa %>% filter(group %in% c("A", "C")),
          D =  setosa %>% filter(group == "D"),
          ACD = setosa %>% filter(group %in% c("A", "C", "D")),
          .id = "group2"
          ) %>% 
  mutate(group2 = factor(group2, levels = c("B", "AC", "D", "ACD"))) %>% 
  ggplot(aes(x = group2, y = Petal.Length)) + 
  geom_violin()