0
votes

I have a dataframe containing numerical (percentages) and categorical variables. I'd like to produce a stacked barplot (using ggplot2) with the columns (categorical variables) sorted by the numerical variable.

I tried this:

How to control ordering of stacked bar chart using identity on ggplot2

and this:

https://community.rstudio.com/t/a-tidy-way-to-order-stacked-bar-chart-by-fill-subset/5134

but I am not familiar with factors and I'd like to understand more.

library(ggplot2)

# Reproduce a dummy dataset
perc <- c(11.89, 88.11, 2.56, 97.44, 5.96, 94.04, 6.74, 93.26)
names <- c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D')

df <- data.frame(class = rep(c(-1, 1), 4), 
                 percentage = perc, 
                 name = names)

# Plot
ggplot(df, aes(x = factor(name), y = percentage, fill = factor(class))) +
  geom_bar(stat = "identity") +
  scale_fill_discrete(name = "Class") +
  xlab('Names')

This code produces a plot whose bars are ordered by the variable "name". I'd like to order them by the variable "percentage". Even if I manually sort the dataframe, the resulting plot is the same.

3 Answers

1
votes

The issue here is that the percentages for each category (name) add up to 100%, so every name gets the same mean percentage (50). Sorting by percentage, which is normally achieved via aes(x = reorder(name, percentage), y = percentage), therefore won’t work here.

Instead, you probably want to order by the percentage of the data that has class = 1 (or class = -1). Doing this requires some trickery: Use ifelse to select the percentage for the rows where class == 1. For all other rows, select the value 0:

ggplot(df, aes(x = reorder(name, ifelse(class == 1, percentage, 0)), y = percentage, fill = factor(class))) +
  geom_bar(stat = "identity") +
  scale_fill_discrete(name = "Class") +
  xlab('Names')

You might want to execute just the reorder instruction to see what’s going on:

reorder(df$name, ifelse(df$class == 1, df$percentage, 0))
# [1] A A B B C C D D
# attr(,"scores")
#      A      B      C      D
# 44.055 48.720 47.020 46.630
# Levels: A D C B

As you can see, your names got reordered based on the mean percentage for each category (by default, reorder uses the mean; see its manual page for more details). But the “mean” we calculated was between each name’s percentage for class = 1 and the value 0 (for class ≠ 1), so each score is simply half the class = 1 percentage, which leaves the ordering intact.
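
If averaging against 0 feels opaque, reorder also takes a FUN argument. Here is a minimal base-R sketch on the question's dummy data that uses FUN = max, so each score is the class = 1 percentage itself rather than half of it. The variable names default_levels, by_max and max_levels are my own, purely for illustration:

```r
# Same dummy data as in the question
df <- data.frame(class = rep(c(-1, 1), 4),
                 percentage = c(11.89, 88.11, 2.56, 97.44, 5.96, 94.04, 6.74, 93.26),
                 name = c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'))

# Without reordering, factor levels are sorted alphabetically,
# and ggplot2 draws a discrete axis in level order
default_levels <- levels(factor(df$name))

# Score each row with its class == 1 percentage (0 otherwise) and
# summarise per name with max instead of the default mean
by_max <- reorder(df$name, ifelse(df$class == 1, df$percentage, 0), FUN = max)
max_levels <- levels(by_max)

print(default_levels)
print(max_levels)
```

The level order should come out the same as with the mean, since halving every score does not change the ranking.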

0
votes

This is similar to Konrad Rudolph's answer; I just create the factor levels first and use them to reorder. Here is my solution:

x_order <- with(subset(df, class == -1), reorder(name, percentage))
df$name <- factor(df$name, levels = levels(x_order))
ggplot(df, aes(x = name,  y = percentage, fill = factor(class))) +
  geom_bar(stat = "identity") +
  scale_x_discrete(breaks = levels(x_order))

0
votes

Changing the levels before plotting will do it for you.

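# Rows for class == -1 only (one row per name)
neg <- df[df$class == -1, ]
lvlorder <- order(neg$percentage, decreasing = TRUE)

# Take the level names from the data itself; levels(df$name) is NULL when name is a character column
df$name <- factor(df$name, levels = neg$name[lvlorder])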
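
One thing to watch for with this approach: since R 4.0, data.frame() keeps strings as character vectors by default, so levels() on such a column returns NULL. Below is a small base-R sketch of that pitfall and of how to check the resulting level order. It assumes the question's dummy data with name stored as character (forced here via stringsAsFactors = FALSE); neg and lvls are helper names of my own:

```r
# Question's dummy data, with name explicitly kept as a character column
df <- data.frame(class = rep(c(-1, 1), 4),
                 percentage = c(11.89, 88.11, 2.56, 97.44, 5.96, 94.04, 6.74, 93.26),
                 name = c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'),
                 stringsAsFactors = FALSE)

# A character vector has no levels, so indexing levels(df$name) yields nothing usable
no_levels <- is.null(levels(df$name))

# Build the level order from the class == -1 rows instead, largest percentage first
neg <- df[df$class == -1, ]
lvls <- neg$name[order(neg$percentage, decreasing = TRUE)]
df$name <- factor(df$name, levels = lvls)

# Inspect the result: this is the left-to-right order of the bars
print(levels(df$name))
```

Printing the levels before plotting is a cheap way to confirm the bars will appear in the order you expect.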