
I have a data frame that looks like this:

data <- structure(list(Sex = c("Male", "Male", "Male", "Male", "Female", 
                               "Male", "Female", "Female", "Female", "Female", "Male", "Female", 
                               "Female", "Female", "Male"), Nationality = c("USA", "USA", "USA", 
                                                                            "UK", "UK", "UK", "France", "France", "France", "France", "France", 
                                                                            "USA", "Canada", "Canada", "Mexico")), row.names = c(NA, 15L), class = "data.frame")

And I've plotted it like this:

library(ggplot2)
library(scales)  # provides percent() for the axis labels

ggplot(data, aes(x = factor(Nationality))) +
  geom_bar(aes(y = (..count..)/sum(..count..), fill = Sex), width = 0.3) +
  scale_y_continuous(labels = percent, limits = c(0, 0.4))

I want to do 2 things:

(1) Re-order the bars in descending order, so that the first bar is the one with the highest count. I have tried reorder, as suggested in other questions on Stack Overflow, but I couldn't make it work. Is it because I am using percentages? Please note that I do NOT want to use the sum of counts in the graph, as I still want to be able to represent sex in the plot (i.e., data must not be collapsed). I believe that this particular issue has not been answered before.

(2) Add a label with the count value inside each bar. I have tried the following, but it did not work. The problem is that I don't know how to refer to counts in this context.

geom_text(aes(label = Nationality), nudge_y = +1)

Note. To CLARIFY what I meant by not collapsing data: I know that I could mutate and create a new dataframe with the sums of counts for each nationality. But then I would lose the counts for each sex (the data will be collapsed), and therefore I could no longer represent sex in the plot.

If you aggregate your data so that it has the sum or percentage you actually want to plot, then the standard reorder answer should work just fine, and you'll have a count column with no confusion about how to refer to it. - Gregor Thomas
Thanks for the reply and edits, Gregor (understood about capitals!). The thing is, how can I aggregate by both nationality and sex without collapsing data? I'm new to R, and I'd appreciate it a lot if you or someone else could show me how to do this with the data I provided. Please also see added note. - johnjohn
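
As a minimal sketch of the aggregation discussed in these comments (assuming dplyr is available; the name `counts` is just illustrative), counting by both columns gives one row per Nationality/Sex combination, so Sex is kept rather than collapsed:

```r
library(dplyr)

# Same data as in the question, written out as a plain data.frame
data <- data.frame(
  Sex = c("Male", "Male", "Male", "Male", "Female", "Male", "Female", "Female",
          "Female", "Female", "Male", "Female", "Female", "Female", "Male"),
  Nationality = c("USA", "USA", "USA", "UK", "UK", "UK", "France", "France",
                  "France", "France", "France", "USA", "Canada", "Canada", "Mexico")
)

# One row per Nationality/Sex pair: n is the count, p its share of all rows
counts <- data %>%
  count(Nationality, Sex) %>%
  mutate(p = n / sum(n))

counts
```

The resulting `n` column is what a label aesthetic can refer to, and `p` can be mapped to `y` directly.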

Does this work for you?


library(dplyr)
library(forcats)
library(ggplot2)
library(scales)  # provides percent() for the axis labels

data %>%
  # convert Nationality to factor with levels sorted according to
  # each Nationality's total count, most frequent first (i.e. descending)
  mutate(Nationality = fct_infreq(Nationality)) %>%

  # aggregate by both Nationality & Sex, and calculate percentage
  count(Nationality, Sex) %>%
  mutate(p = n/sum(n)) %>%

  ggplot(aes(x = Nationality, y = p, label = n, fill = Sex)) +
  geom_col(width = 0.3) +
  geom_text(position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = percent, limits = c(0, 0.4))
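
If you would rather not depend on forcats, here is a hedged sketch of the same idea using base R's `reorder()` on the aggregated table. It assumes the same data as in the question, and the names `counts` and `plt` are only illustrative. Passing `-n` with `FUN = sum` sorts the levels by each nationality's total count, largest first:

```r
library(dplyr)
library(ggplot2)

# Same data as in the question, written out as a plain data.frame
data <- data.frame(
  Sex = c("Male", "Male", "Male", "Male", "Female", "Male", "Female", "Female",
          "Female", "Female", "Male", "Female", "Female", "Female", "Male"),
  Nationality = c("USA", "USA", "USA", "UK", "UK", "UK", "France", "France",
                  "France", "France", "France", "USA", "Canada", "Canada", "Mexico")
)

counts <- data %>%
  count(Nationality, Sex) %>%
  mutate(
    p = n / sum(n),
    # levels ordered by the summed count per nationality, descending
    Nationality = reorder(Nationality, -n, FUN = sum)
  )

plt <- ggplot(counts, aes(x = Nationality, y = p, label = n, fill = Sex)) +
  geom_col(width = 0.3) +
  geom_text(position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent, limits = c(0, 0.4))

plt
```

Because the reordering happens after the counts exist, this is the "standard reorder" approach applied to the aggregated data, and the per-sex counts are still available for the fill and the labels.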
