Using the iris dataset as an example, I would like to draw a boxplot (for setosa species only) with Sepal.Length on the x-axis and Petal.Length on the y-axis. However, this first requires binning of continuous Sepal.Length data for the x-axis into groups: Sepal.Length < 4.7, Sepal.Length 4.7 - 5, Sepal.Length 5 - 5.2 and Sepal.Length > 5.2. Second, it requires grouping together the first and third group. I tried the code below, but this does not work. Any suggestions would be appreciated. Thank you.

library(ggplot2)
bin1 <- iris[iris$Sepal.Length < 4.7, ]
bin2 <- iris[iris$Sepal.Length >=4.7 & <5, ]
bin3 <- iris[iris$Sepal.Length >=5 & <5.2, ]
bin4 <- iris[iris$Sepal.Length >=5.2, ]
binA <- bin1 + bin3
order <- c(bin2, binA, bin4)
ggboxplot(iris[iris$Species == "setosa", ], x="Sepal.Length", y="Petal.Length") + scale_x_discrete(limits=order)

1 Answer

I would use the cut function to bin Sepal.Length, and then forcats' fct_collapse to merge the bins you want to group together. You could do something like the following:

library(dplyr)
library(forcats)
library(ggplot2)

iris %>% 
  filter(Species == "setosa") %>% 
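  # right = FALSE makes each bin include its lower bound, matching the < and >= in the question
  mutate(sub_species = cut(Sepal.Length, breaks = c(-Inf, 4.7, 5, 5.2, Inf), right = FALSE)) %>% 
  # Merge the first and third bins, as the question asks
  mutate(sub_species = fct_collapse(sub_species,
                                    combined = c("[-Inf,4.7)", "[5,5.2)"))) %>% 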
  ggplot(aes(sub_species, Petal.Length))+
  geom_boxplot()

And that will give you what you want.
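
The names passed to fct_collapse have to match the level names that cut generates exactly, including brackets and spacing, so it can help to print them first. Here is a minimal base R sketch with the same breaks as the snippet above; the object names setosa and bins are only for illustration. It uses right = FALSE so that the bins match the question's < and >= boundaries; with the default right = TRUE the brackets in the names flip.

```r
# Setosa rows only, as in the question
setosa <- iris[iris$Species == "setosa", ]

# Bin Sepal.Length; right = FALSE makes each bin include its lower bound
bins <- cut(setosa$Sepal.Length,
            breaks = c(-Inf, 4.7, 5, 5.2, Inf),
            right = FALSE)

# Print the generated level names and the number of rows in each bin
print(levels(bins))
print(table(bins))
```

Copying the level names from this output into fct_collapse avoids silent mismatches.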

Alternatively, you could replace cut with dplyr's case_when function, which would look like:

iris %>% 
  filter(Species == "setosa") %>% 
  # Assign each row to a bin; conditions are checked in order and the first match wins
  mutate(sub_a = case_when( Sepal.Length < 4.7~"A",
                            Sepal.Length < 5~ "B",
                            Sepal.Length < 5.2~ "C",
                            TRUE~"D")) %>% 
  # Collapse A and C (the first and third bins), as the question asks
  mutate(collapsed = ifelse(sub_a %in% c("A", "C"), "combined", sub_a)) %>% 
  ggplot(aes(collapsed, Petal.Length))+
  geom_boxplot()
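
Because case_when checks its conditions in order, the four branches above amount to bins that include their lower bound. As a quick sanity check (a sketch, not part of the original approach), the same labels can be produced by cut with explicit labels and right = FALSE, which also avoids typing the generated interval names. The names binned, via_case_when and via_cut are made up for this example.

```r
library(dplyr)

binned <- iris %>%
  filter(Species == "setosa") %>%
  mutate(
    # Conditions are evaluated top to bottom; the first match wins
    via_case_when = case_when(Sepal.Length < 4.7 ~ "A",
                              Sepal.Length < 5   ~ "B",
                              Sepal.Length < 5.2 ~ "C",
                              TRUE               ~ "D"),
    # Bins that include their lower bound, with explicit labels
    via_cut = as.character(cut(Sepal.Length,
                               breaks = c(-Inf, 4.7, 5, 5.2, Inf),
                               labels = c("A", "B", "C", "D"),
                               right = FALSE))
  )

# TRUE if both approaches put every row in the same bin
print(all(binned$via_case_when == binned$via_cut))

# Number of rows per bin
print(count(binned, via_case_when))
```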

In a comment, the OP expanded the question to include several other sub-classes. To solve this, I will use mutate to create a few extra groupings, then use tidyr's gather function to stack them all into a single column, so that each subclass can be drawn as its own box.

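library(tidyr)  # provides gather()

iris %>% 
  filter(Species == "setosa") %>% 
  # Row id, so duplicated rows can be dropped after gathering
  mutate(id = row_number()) %>% 
  # Assign each row to a bin; conditions are checked in order
  mutate(sub_a = case_when( Sepal.Length < 4.7~"A",
                            Sepal.Length < 5~ "B",
                            Sepal.Length < 5.2~ "C",
                            TRUE~"D")) %>% 
  # Build two collapsed groupings: A+C and A+C+D
  mutate(collapsed1 = ifelse(sub_a %in% c("A", "C"), "A+C", sub_a)) %>% 
  mutate(collapsed2 = ifelse(sub_a %in% c("A", "C", "D"), "A+C+D", sub_a)) %>% 
  # Pull all the new categories together into a new column called subclass
  gather(new_cat, subclass, sub_a:collapsed2) %>% 
  # Keep only the subclasses to plot
  filter(subclass %in% c("B", "A+C", "D", "A+C+D")) %>% 
  # B and D keep their label in several groupings; count each flower once per subclass
  distinct(id, subclass, .keep_all = TRUE) %>% 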
  ggplot(aes(subclass, Petal.Length))+
  geom_boxplot()
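
In current tidyr releases, gather is superseded by pivot_longer. Here is a sketch of the same reshaping with pivot_longer, followed by a count per subclass as a check. The id column and the object name long are my own additions: a row labelled B keeps that label in all three groupings and would otherwise be stacked three times, so id is used to keep each flower at most once per subclass.

```r
library(dplyr)
library(tidyr)

long <- iris %>%
  filter(Species == "setosa") %>%
  # Row id, used below to drop repeated rows
  mutate(id = row_number()) %>%
  mutate(sub_a = case_when(Sepal.Length < 4.7 ~ "A",
                           Sepal.Length < 5   ~ "B",
                           Sepal.Length < 5.2 ~ "C",
                           TRUE               ~ "D")) %>%
  mutate(collapsed1 = ifelse(sub_a %in% c("A", "C"), "A+C", sub_a),
         collapsed2 = ifelse(sub_a %in% c("A", "C", "D"), "A+C+D", sub_a)) %>%
  # Stack the three grouping columns into one subclass column
  pivot_longer(cols = c(sub_a, collapsed1, collapsed2),
               names_to = "new_cat", values_to = "subclass") %>%
  filter(subclass %in% c("B", "A+C", "D", "A+C+D")) %>%
  # Keep each flower once per subclass
  distinct(id, subclass, .keep_all = TRUE)

# Rows per subclass; B, A+C and D together should cover all 50 setosa rows
print(count(long, subclass))
```

The long data frame can then be passed to ggplot(aes(subclass, Petal.Length)) + geom_boxplot() in the same way as above.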