I want to reorder the factor levels in a ggplot bar chart produced with geom_bar(position = "fill"), so that the level with the highest proportion of the positive class sits closest to the y-axis. I have found a working solution, but my research suggests there is a more efficient one that I have not been able to find.

I have read through the question Order Bars in ggplot2 bar graph, but I can't find a solution there for ordering by proportions, that is, by values that are not explicitly present in the data frame but are a summary statistic.

I have also looked at the Modifying Factor Order section of the book R for Data Science, which proposes building a summary data frame with a "prop" column and drawing a line graph from these values using fct_reorder2(). However, I can't seem to apply similar logic to a "filled" bar graph.

The solution I finally stumbled upon came from this source #267 REORDER A VARIABLE IN GGPLOT2, whereby you just set the new factor levels using mutate(). However, instead of just defining the order myself, I created a data frame that ordered the factor by the proportion of positive class.

What I'd like to know is if there is a more efficient way to do this, perhaps in one long pipe operation?

Here's a reproducible example:

library(ggplot2)
library(dplyr)

variable <- c(rep("alpha", 4),
              rep("beta", 4),
              rep("gamma", 4),
              rep("delta", 4))

class <- c(rep("1", 4),
           "1", "1", "0", "0",
           rep("0", 3), "1",
           rep("1", 3), "0")

dframe <- data.frame(variable, class)

plot_order <- dframe %>%
  count(variable, class) %>%
  group_by(variable) %>%
  mutate(prop = prop.table(n)) %>%
  filter(class == "1") %>%
  arrange(desc(prop))  # descending, so the highest proportion comes first

lvls <- as.character(plot_order$variable)

dframe %>%
  mutate(variable = factor(variable, levels = lvls)) %>%
  ggplot(aes(x = variable, fill = class)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")

Here's the output of plot_order:

# A tibble: 4 x 4
# Groups:   variable [4]
  variable class     n  prop
  <fct>    <fct> <int> <dbl>
1 alpha    1         4  1   
2 delta    1         3  0.75
3 beta     1         2  0.5 
4 gamma    1         1  0.25

And the outcome:

Bar graph with ordered factors based on position "fill"

Thanks in advance.

Thanks for the edit @JJlqbal. To add to my question: maybe there is a solution that accesses the internal data frame of the geom? For example, in geom_histogram() you can access ..count.. or ..density.. (comment by red.panda)

1 Answer

You can use fct_reorder from the forcats package. This package was mentioned several times in the first question you linked as well:

# packages (ggplot2 for plotting, dplyr for %>% and mutate)
library(ggplot2)
library(dplyr)

# data
dframe <- data.frame(
  variable = rep(c("alpha", "beta", "gamma", "delta"), each = 4),
  class = c(rep("1", 4),
            "1", "1", "0", "0",
            rep("0", 3), "1",
            rep("1", 3), "0"))

dframe %>%
  # convert variable to a factor, ordered (in descending order) by the proportion of
  # rows where the class == "1"
  mutate(variable = forcats::fct_reorder(.f = variable, 
                                         .x = class,
                                         .fun = function(.x) mean(.x == "1"),
                                         .desc = TRUE)) %>%
  ggplot(aes(x = variable, fill = class)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")
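
If you want to check the resulting order without drawing the plot, you can call fct_reorder() directly on the columns and inspect the levels. This is only a minimal sketch on the same example data; the name reordered is made up for illustration.

```r
library(forcats)

# same example data as above
dframe <- data.frame(
  variable = rep(c("alpha", "beta", "gamma", "delta"), each = 4),
  class = c(rep("1", 4),
            "1", "1", "0", "0",
            rep("0", 3), "1",
            rep("1", 3), "0"))

# reorder the levels of variable by the share of rows with class == "1",
# largest share first (reordered is a hypothetical name)
reordered <- fct_reorder(.f = dframe$variable,
                         .x = dframe$class,
                         .fun = function(.x) mean(.x == "1"),
                         .desc = TRUE)

# inspect the level order that ggplot would use on the x-axis
levels(reordered)
```

The levels should come out as alpha, delta, beta, gamma, which matches the proportions 1, 0.75, 0.5 and 0.25 shown in your plot_order table.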