0
votes

I have a data frame that looks like this:

data <- structure(list(Sex = c("Male", "Male", "Male", "Male", "Female", 
                               "Male", "Female", "Female", "Female", "Female", "Male", "Female", 
                               "Female", "Female", "Male"), Nationality = c("USA", "USA", "USA", 
                                                                            "UK", "UK", "UK", "France", "France", "France", "France", "France", 
                                                                            "USA", "Canada", "Canada", "Mexico")), row.names = c(NA, 15L), class = "data.frame")

And I've plotted it like this:

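library(ggplot2)
library(scales)  # provides percent

# after_stat(count) is the current spelling of the deprecated ..count..
ggplot(data, aes(x = factor(Nationality))) +
  geom_bar(aes(y = after_stat(count / sum(count)), fill = Sex), width = 0.3) +
  scale_y_continuous(labels = percent, limits = c(0, 0.4)) +
  coord_flip()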

I want to do 2 things:

(1) Re-order the bars in descending order, so that the first bar is the one with the highest count. I have tried reorder(), as suggested in other questions on Stack Overflow, but I couldn't make it work. Is it because I am using percentages? Please note that I do NOT want to plot the sum of counts per nationality, as I still want to be able to represent sex in the plot (i.e., the data must not be collapsed). I believe this particular issue has not been answered before.

(2) Add a label with the count value inside each bar. I have tried the following, but it did not work. The problem is that I don't know how to refer to counts in this context.

geom_text(aes(label = Nationality), nudge_y = +1)

Note. To CLARIFY what I meant by not collapsing data: I know that I could mutate and create a new dataframe with the sums of counts for each nationality. But then I would lose the counts for each sex (the data will be collapsed), and therefore I could no longer represent sex in the plot.

1
If you aggregate your data so that it has the sum or percentage you actually want to plot, then the standard reorder answer should work just fine, and you'll have a count column with no confusion about how to refer to it. - Gregor Thomas
Thanks for the reply and edits, Gregor (understood about capitals!). The thing is, how can I aggregate by both nationality and sex without collapsing the data? I'm new to R, and I'd appreciate it a lot if you or someone else could show me how to do this with the data I provided. Please also see the added note. - johnjohn

1 Answer

2
votes

Does this work for you?

library(ggplot2)
library(dplyr)
library(forcats)
library(scales)

data %>%
  # convert Nationality to factor with levels sorted according to 
  # each Nationality's total count, in reverse (i.e. descending) order
  mutate(Nationality = fct_rev(fct_infreq(Nationality))) %>%

  # aggregate by both Nationality & Sex, and calculate percentage
  count(Nationality, Sex) %>%
  mutate(p = n/sum(n)) %>%

  ggplot(aes(x = Nationality, y = p, label = n, fill = Sex)) +
  geom_col(width = 0.3) +
  geom_text(position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = percent, limits = c(0, 0.4)) +
  coord_flip()
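
To check that aggregating by both columns does not collapse the data, it may help to look at the intermediate table before it reaches ggplot. The sketch below repeats the first steps of the pipeline on the question's data; the name `counts` is only an illustration and is not part of the code above.

```r
library(dplyr)
library(forcats)

# Same values as the question's data frame
data <- data.frame(
  Sex = c("Male", "Male", "Male", "Male", "Female", "Male", "Female",
          "Female", "Female", "Female", "Male", "Female", "Female",
          "Female", "Male"),
  Nationality = c("USA", "USA", "USA", "UK", "UK", "UK", "France",
                  "France", "France", "France", "France", "USA",
                  "Canada", "Canada", "Mexico")
)

# `counts` is a hypothetical name for the intermediate result
counts <- data %>%
  # order levels by total count per Nationality, then reverse them
  mutate(Nationality = fct_rev(fct_infreq(Nationality))) %>%
  # one row per Nationality/Sex combination that occurs in the data
  count(Nationality, Sex) %>%
  # share of all 15 rows, so the stacked bars add up to 100% overall
  mutate(p = n / sum(n))

levels(counts$Nationality)  # level order used for the x axis
print(counts)
```

Each nationality keeps a separate row per sex, so `fill = Sex` still has something to map to, and `n` is the count column that `geom_text()` can use as its label. The levels are reversed because `coord_flip()` draws the first level at the bottom, so the nationality with the highest count ends up as the top bar.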
