
While there are several SO posts on how to use percentages scaled within each facet of a bar chart, I don't see any that show how to do that in a histogram. Is it possible to do so?

Here are two posts I researched:

SO post 1: "Obtaining Percent Scales Reflective of Individual Facets with ggplot2". The last answer on this post indicates that the solution no longer works in newer versions of ggplot2 and suggests using stat_count(), but does not give an example.

SO post 2: "percentage on y lab in a faceted ggplot barchart?"

The following code creates a histogram with the percentages scaled across all facets (i.e., the bars of all facets together sum to 100%) rather than within each facet.

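```r
library(ggplot2)
# (..count..)/sum(..count..) divides each bin's count by the total count over ALL facets
ggplot(iris, aes(Sepal.Width, y=(..count..)/sum(..count..))) + geom_histogram(bins=2) +
  facet_grid(~Species) + scale_y_continuous(labels = scales::percent)
```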

[Image: Histogram]
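A toy sketch in base R of why the percentages come out this way. It assumes, as the observed output suggests, that the `y` expression is evaluated once on the binned counts of all facets stacked together rather than facet by facet. The `binned` data frame and its counts are hypothetical stand-ins for ggplot2's internal per-bin data and its `PANEL` column.

```r
# Hypothetical binned counts: 2 bins in each of 3 panels (facets)
binned <- data.frame(
  PANEL = factor(rep(1:3, each = 2)),
  count = c(10, 40, 30, 20, 45, 5)
)

# What y = (..count..)/sum(..count..) computes: one grand total over all panels
binned$share_overall <- binned$count / sum(binned$count)

# Summed within each panel, the shares fall well short of 1 ...
print(tapply(binned$share_overall, binned$PANEL, sum))
# ... and only the grand total over all panels equals 1
print(sum(binned$share_overall))
```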

Is there a way to scale within each facet? If not, what would be an efficient strategy for converting to a bar chart? If I had to go that route, I could use cut() to create a factor of bin indicators, then calculate bin frequencies within each level of the facet variable (using dplyr::count()?), then use geom_bar(). That seems convoluted. I suspect there is a geom_histogram() solution.
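A minimal sketch of the bar-chart route described above, using base R in place of dplyr::count(): cut() assigns each row to one of two bins, table() counts rows per Species and bin, and ave() divides each count by its Species total. The names `iris_binned`, `counts` and `pct` are illustrative only. Because cut(breaks = 2) splits the full range into two equal-width intervals, its bin edges will not necessarily match the ones geom_histogram(bins = 2) picks. The plotting step is wrapped in requireNamespace() only so that the sketch still runs where ggplot2 is not installed.

```r
# Bin Sepal.Width once over the whole data set (two equal-width intervals),
# so every facet shares the same bin edges
iris_binned <- iris
iris_binned$bin <- cut(iris_binned$Sepal.Width, breaks = 2)

# Count rows per (Species, bin) combination; columns: Species, bin, Freq
counts <- as.data.frame(table(Species = iris_binned$Species, bin = iris_binned$bin))

# Divide each count by its Species total, so percentages sum to 1 within each facet
counts$pct <- counts$Freq / ave(counts$Freq, counts$Species, FUN = sum)
print(counts)

# Plot the precomputed percentages; geom_col() draws bar heights as given
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(bin, pct)) +
    geom_col() +
    facet_grid(~Species) +
    scale_y_continuous(labels = scales::percent) +
    labs(x = "Sepal.Width bin", y = "% within facet")
  print(p)
}
```

This keeps ggplot2 out of the aggregation entirely, at the cost of choosing the bins yourself.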

Thanks for any thoughts ...

I may be misunderstanding the problem. Doesn't this answer in the second post you linked to do that? To my eye it still seems to work with geom_histogram with a set number of bins (but I didn't check what the true proportions were). – aosmith

1 Answer


@aosmith pointed out that one answer in the posts I researched makes use of the internal ..PANEL.. variable. Following that suggestion, the updated code below does work, albeit with a more complex y mapping.

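```r
library(ggplot2)
# tapply() sums the counts per panel; [..PANEL..] gives each bin its own panel's total
# (the ..var.. notation is the older form; ggplot2 >= 3.3.0 provides after_stat())
ggplot(iris, aes(Sepal.Width, y=(..count..)/tapply(..count..,..PANEL..,sum)[..PANEL..])) +
  geom_histogram(bins=2) + facet_grid(~Species) + 
  scale_y_continuous(labels = scales::percent) + labs(y='% within facet')
```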
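To see what this y mapping does, here is the same lookup on plain vectors. The `count` and `PANEL` values are hypothetical stand-ins for the per-bin data ggplot2 holds after binning. tapply() returns one total per panel, and indexing that result with the factor `PANEL` (R indexes by a factor's integer codes) hands every bin the total of its own panel.

```r
# Hypothetical per-bin data: 2 bins in each of 3 panels
count <- c(10, 40, 30, 20, 45, 5)
PANEL <- factor(rep(1:3, each = 2))

# One total per panel (length 3, named by panel level)
panel_totals <- tapply(count, PANEL, sum)

# Index the totals by PANEL so each of the 6 bins gets its own panel's total,
# then divide: shares now sum to 1 within every panel
share_within <- count / panel_totals[PANEL]
print(share_within)
```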


However, in this SO post, Hadley Wickham advises against using ..PANEL.. (and other internal variables) in this manner. He suggests doing the aggregation outside ggplot instead. So perhaps the answer to the question is that you can use the intentionally undocumented ..PANEL.. variable, but beware: functionality of that kind may change in new releases without notice.

Perhaps someone could request a feature that sets the density scaling of facets to 'overall', 'by col', 'by row', or 'by facet'.