3
votes

My question is about the most efficient way to create a bar chart with percentage labels and a specific layout. I have a data frame with several columns, including the column "economy". That column has the five values "Very good", "Good", "Bad", "Very bad" and "Don't know". Here is the reproducible data:

structure(c(3L, 3L, 3L, 3L, 2L, 3L, 4L, 4L, 4L, 4L, 3L, 2L, 2L, 
2L, 3L, 2L, 4L, 4L, 2L, 3L, 4L, 3L, 4L, 4L, 3L, 2L, 2L, 3L, 3L, 
3L, 3L, 4L, 4L, 4L, 3L, 2L, 4L, 3L, 3L, 3L, 3L, 3L, 4L, 3L, 4L, 
2L, 4L, 4L, 3L, 2L), .Label = c("Very good", "Good", "Bad", "Very bad", 
"Don't know"), class = "factor")

I used this code with the desired outcome:

library(dplyr)
library(ggplot2)

lebanon %>%
  filter(!is.na(economy), economy != "Don't know") %>%
  count(economy) %>%
  mutate(prop = n / sum(n)) %>%
  ggplot(aes(economy, y = prop, fill = economy)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("darkgreen", "green4", "red3", "red4")) +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = scales::percent(prop, suffix = "")),
            position=position_dodge(width=0.9), vjust=-0.5, size = 5) +
  labs(x = "", y = "", fill = "") +
  theme_minimal() +
  theme(axis.text.x = element_text(size = 15),
        axis.text.y = element_text(size = 15),
        legend.text = element_text(size = 15))

Getting this plot:

[image: the resulting bar chart]

I'm wondering whether this is the most efficient way to convert the counts to percentages while keeping the desired layout. I used count() and mutate(), but I know there are other ways of handling this, such as stat(prop) or the ..count.. notation. The problem is that when I use stat(prop) or fill = "prop", the colors from scale_fill_manual are not applied.


So my question is: what's the most efficient way to get my desired bar chart (the one above) without too many intermediate steps for calculating the percentages? Sorry in advance if my question is not clearly formulated. :)

Greetings

2
Your example is not reproducible. You provide a factor, not a data.frame, and the factor does not contain any observations with value "Very good", yet your graph does. - Limey
Probably because there are only a few observations with that value in a dataset of 2400 rows. - Nicosc
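
As the comment above notes, the posted object is a bare factor rather than a data frame. A minimal sketch of wrapping it so that the code in this thread can run on the sample, assuming the data frame is named lebanon and the column economy as in the question's code (the integer codes are copied from the structure() call in the question):

```r
# Integer codes copied from the structure() call in the question
codes <- c(3L, 3L, 3L, 3L, 2L, 3L, 4L, 4L, 4L, 4L, 3L, 2L, 2L,
           2L, 3L, 2L, 4L, 4L, 2L, 3L, 4L, 3L, 4L, 4L, 3L, 2L, 2L, 3L, 3L,
           3L, 3L, 4L, 4L, 4L, 3L, 2L, 4L, 3L, 3L, 3L, 3L, 3L, 4L, 3L, 4L,
           2L, 4L, 4L, 3L, 2L)

# Assumption: the object name `lebanon` and column name `economy`
# are taken from the question's plotting code
lebanon <- data.frame(
  economy = factor(codes, levels = 1:5,
                   labels = c("Very good", "Good", "Bad", "Very bad", "Don't know"))
)

# Quick look at how many observations fall into each level
print(table(lebanon$economy))
```

On this sample, some levels have no observations at all, which matters for how colors are matched to levels in the plots.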

2 Answers

1
votes

The new statistic stat_prop() available in GGally was designed to make computing proportions easy. More details at http://ggobi.github.io/ggally/articles/ggally_stats.html#stat-prop-

The by aesthetic indicates the denominator. Here, by = 1 because you want percentages of the total.

If you add a facet, all proportions will be computed separately per facet.

In your case, you could try something like:

library(ggplot2)
library(GGally)

ggplot(lebanon) +
  aes(x = economy, y = after_stat(prop), fill = economy, by = 1) +
  geom_bar(stat = "prop") +
  geom_text(aes(label = scales::percent(after_stat(prop))), stat = "prop", vjust=-0.5)


0
votes

You can try this solution. I used your sample of data. I hope this can help:

library(dplyr)    # provides %>% and filter()
library(ggplot2)
library(scales)

lebanon %>%
  filter(!is.na(economy), economy != "Don't know") %>%
  ggplot(aes(x = economy)) +
  # after_stat(count) is the current spelling of the older ..count.. notation
  geom_bar(aes(y = after_stat(count / sum(count)), fill = economy), stat = "count") +
  geom_text(aes(label = scales::percent(after_stat(count / sum(count))),
                y = after_stat(count / sum(count))), stat = "count", vjust = -.5) +
  labs(y = "Percent", fill = "Economy") +
  scale_y_continuous(labels = scales::percent)


I also found this package that could help you: http://larmarange.github.io/JLutils/reference/stat_fill_labels.html
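
For the scale_fill_manual issue raised in the question, here is a rough sketch of how the proportions can be computed inside ggplot2 while fill stays mapped to economy, so the manual colors still apply. The data are a stand-in, not the asker's real data: they are rebuilt from the level counts I tallied from the posted sample (11 "Good", 22 "Bad", 17 "Very bad"). The named color vector is my own addition. Without names, colors are assigned by position to whichever levels are present, which would likely shift them on a sample that has no "Very good" rows.

```r
library(ggplot2)

# Stand-in data (assumption): same level counts as the sample factor in the question
levs <- c("Very good", "Good", "Bad", "Very bad", "Don't know")
lebanon <- data.frame(
  economy = factor(rep(c("Good", "Bad", "Very bad"), times = c(11, 22, 17)),
                   levels = levs)
)

# Drop missing values and the "Don't know" category, as in the question
plot_data <- subset(lebanon, !is.na(economy) & economy != "Don't know")

p <- ggplot(plot_data, aes(x = economy, fill = economy)) +
  # Bar height: each level's count divided by the total count
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  # Labels use the same computed proportion, formatted as whole percentages
  geom_text(aes(y = after_stat(count / sum(count)),
                label = scales::percent(after_stat(count / sum(count)), accuracy = 1)),
            stat = "count", vjust = -0.5, size = 5) +
  # Named values tie each color to a level, whichever levels are present
  scale_fill_manual(values = c("Very good" = "darkgreen", "Good" = "green4",
                               "Bad" = "red3", "Very bad" = "red4")) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "", y = "", fill = "") +
  theme_minimal()

print(p)
```

Because fill is mapped to the economy column itself rather than to a computed statistic, the fill scale stays discrete and scale_fill_manual works as it does in the count() and mutate() version.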