Ordering of factor levels in ggplot is a common issue, and there are a number of posts about it (e.g., Avoid ggplot sorting the x-axis while plotting geom_bar()).

This may be a duplicate, but I haven't come across this particular situation.

I'm trying to maintain the order of the X-axis variable ("cylinders") in a stacked bar plot. Here's a toy example. I converted the variable to text labels to show that the X axis ends up in alphabetical order, even though the variable (cylinders) has an explicit ordering set earlier in the data frame: "Four cyl", "Six cyl", "Eight cyl".

What am I doing wrong?

library(dplyr)
library(tidyr)
library(ggplot2)

mtcars <- mtcars %>% 
  mutate(cylinders = case_when(cyl == 4 ~ "Four cyl",
                               cyl == 6 ~ "Six cyl",
                               cyl == 8 ~ "Eight cyl"),
         cylinders = reorder(cylinders, cyl, mean)) %>% 
  mutate(engine = case_when(vs == 1 ~ "Manual",
                            vs == 0 ~ "Automatic"))

str(mtcars$cylinders)
levels(mtcars$cylinders)  # [1] "Four cyl"  "Six cyl"   "Eight cyl"
class(mtcars$cylinders)

facet_test <- function(df, gathvar) {

  gath <- enquo(gathvar)

  df %>% 
    select(cylinders, !!gath) %>%
    gather(key, value, -!!gath) %>%
    count(!!gath, key, value) %>%
    group_by(value) %>%
    mutate(perc = round(n/sum(n), 2) * 100) %>%  
    ggplot(aes(x = value, y = perc, fill = !!gath)) +
      geom_bar(stat = "identity")
}

facet_test(df = mtcars, gathvar = engine)

Run the internals of that function on your data. After the gather line, you have 3 columns: engine, key, and value. value is where your cylinder information is, but it isn't a factor, so there's no ordering. But I don't see why you need the gather anyway; you could have made this plot without it. - camille
As in, take out the gather and go straight to count, then use x = cylinders in your aes. - camille
Thanks. The gather is there because this is a truncated example of a longer, more complicated function. Any advice on how to make the value column retain the factor information? - Daniel
Try making value a factor and getting the levels by order of appearance in that column (such as using forcats::fct_inorder). - camille
Ok, I'll try that. One moment... In the larger function, I'm using facet_wrap, so I have different gathering variables. - Daniel

1 Answer

Thanks to the comments and to @alistaire at this post (https://stackoverflow.com/a/39157585/8453014), I was able to arrive at a solution. The problem is that gather coerces factors into characters.

Simple scenario: as @aosmith suggested, use mutate(value = factor(value, levels = levels(mtcars$cylinders))) after gather.
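For reference, here is a minimal sketch of that simple scenario, assuming dplyr and tidyr are installed. The names `cars`, `cyl_levels`, and `long` are illustrative and not from the original post, and `vs` stands in for the fill variable. It only checks the factor levels and leaves the plotting step out.

```r
library(dplyr)
library(tidyr)

# Rebuild the toy column from the question on a copy of mtcars.
cars <- mtcars %>%
  mutate(cylinders = case_when(cyl == 4 ~ "Four cyl",
                               cyl == 6 ~ "Six cyl",
                               cyl == 8 ~ "Eight cyl"),
         cylinders = reorder(cylinders, cyl, mean))

# Save the intended order before reshaping.
cyl_levels <- levels(cars$cylinders)

long <- cars %>%
  select(cylinders, vs) %>%
  gather(key, value, -vs) %>%
  # Re-apply the intended order; factor() accepts character or factor input.
  mutate(value = factor(value, levels = cyl_levels))

levels(long$value)
```

Because the levels are set explicitly after gather, the X axis no longer depends on what type gather returns for value.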

Complex example with multiple variables: the important aspects are 1) define the factor levels (whether inside or outside of the function) and 2) apply those levels to the "value" column.

Here's a more complicated example to show using three variables and applying facet_wrap to see the plots side by side:

# Assumed setup, not shown in the original answer: a text version of `gear`,
# built the same way as `cylinders` in the question.
mtcars <- mtcars %>% 
  mutate(gears = case_when(gear == 3 ~ "Three",
                           gear == 4 ~ "Four",
                           gear == 5 ~ "Five"))

facet_test <- function(df, gathvar) {
  gath <- enquo(gathvar)

  # the next two lines can go inside or outside of the function
  levels_cyl <- c("Four cyl", "Six cyl", "Eight cyl")
  levels_gears <- c("Three", "Four", "Five")

  df %>% 
    select(cylinders, gears, !!gath) %>%
    gather(key, value, -!!gath) %>%
    count(!!gath, key, value) %>%
    ungroup() %>% 
    # re-apply the intended order across both gathered variables
    mutate(value = factor(value, levels = unique(c(levels_cyl, levels_gears), 
                                                 fromLast = TRUE))) %>% 
    arrange(key, value) %>%  
    ggplot(aes(x = value, y = n, fill = !!gath)) +
      geom_bar(stat = "identity") +
      facet_wrap(~ key, scales = "free_x")
}

facet_test(df = mtcars, gathvar = engine)

[Image: correct plot with factor levels in the pre-defined order]