33
votes

When faceting in ggplot, I would often like percentages to be shown instead of counts.

For example:

library(ggplot2)
test1 <- sample(letters[1:2], 100, replace = TRUE)
test2 <- sample(letters[3:8], 100, replace = TRUE)
test <- data.frame(test1, test2)
ggplot(test, aes(test2)) + geom_bar() + facet_grid(~test1)

This is very easy, but if N differs between facet A and facet B, I think it would be better to compare percentages, in such a way that each facet sums to 100%.

How would you achieve this?

Hope my question makes sense.

Sincerely.

6 Answers

54
votes

Here is a method that stays within ggplot, using ..count.. and ..PANEL..:

ggplot(test, aes(test2)) + 
    geom_bar(aes(y = (..count..)/tapply(..count..,..PANEL..,sum)[..PANEL..])) + 
    facet_grid(~test1)

As this is computed on the fly, it should be robust to changes to plot parameters.
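In more recent ggplot2 releases the `..count..` notation has been superseded by `after_stat()`. The following is a minimal sketch of the same per-panel normalisation in that notation, assuming a current ggplot2. Two things are my own additions rather than part of the answer: the seed (for reproducibility) and the use of `ave()` in place of the `tapply()` lookup to obtain each panel's total. The `ggplot_build()` step is only there to confirm that the bars in each facet sum to 1.

```r
library(ggplot2)

set.seed(42)  # assumption: seed added only to make the sketch reproducible
test <- data.frame(
    test1 = sample(letters[1:2], 100, replace = TRUE),
    test2 = sample(letters[3:8], 100, replace = TRUE)
)

# Divide each bar's count by the total count of its own panel.
# ave(count, PANEL, FUN = sum) repeats the panel total for every bar in that panel.
p <- ggplot(test, aes(test2)) +
    geom_bar(aes(y = after_stat(count / ave(count, PANEL, FUN = sum)))) +
    facet_grid(~test1)

# Inspect the computed layer data: bar heights should sum to 1 within each panel
built <- ggplot_build(p)$data[[1]]
panel_totals <- tapply(built$y, built$PANEL, sum)
print(panel_totals)
```

Printing `p` draws the plot; `panel_totals` holds one value per facet.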

21
votes

Try this:

# first make a data frame with frequencies
df <- as.data.frame(with(test, table(test1, test2)))
# or with count() from the plyr package, as Hadley suggested
# (count() names the frequency column `freq`, so use freq instead of Freq below)
# df <- count(test, vars = c('test1', 'test2'))
# next: compute percentages per group
df <- ddply(df, .(test1), transform, p = Freq / sum(Freq))
# and plot (stat = "identity" because the heights are already computed)
ggplot(df, aes(test2, p)) + geom_bar(stat = "identity") + facet_grid(~test1)

You could also add + scale_y_continuous(formatter = "percent") to the plot for ggplot2 version 0.8.9, or + scale_y_continuous(labels = percent_format()) for version 0.9.0 (percent_format() comes from the scales package).
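The same "count first, then plot" idea can also be sketched without plyr. This is only an illustration under my own assumptions: it uses base R's `table()` and `ave()` for the per-group shares, `geom_col()` for pre-computed heights, and `scales::label_percent()` for the axis labels, all of which need reasonably recent ggplot2 and scales versions. The seed is added purely for reproducibility.

```r
library(ggplot2)

set.seed(42)  # assumption: seed added only to make the sketch reproducible
test <- data.frame(
    test1 = sample(letters[1:2], 100, replace = TRUE),
    test2 = sample(letters[3:8], 100, replace = TRUE)
)

# Count every test1/test2 combination (absent combinations get Freq = 0)
df <- as.data.frame(table(test1 = test$test1, test2 = test$test2))

# Share of each test2 level within its test1 group
df$p <- ave(as.numeric(df$Freq), df$test1, FUN = function(x) x / sum(x))

# geom_col() draws the pre-computed heights as they are
p <- ggplot(df, aes(test2, p)) +
    geom_col() +
    facet_grid(~test1) +
    scale_y_continuous(labels = scales::label_percent())
```

Because the shares live in an ordinary data frame, they are easy to check or reuse (for labels, tables and so on) before plotting.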

7
votes

A very simple way:

ggplot(test, aes(test2)) + 
    geom_bar(aes(y = (..count..)/sum(..count..))) + 
    facet_grid(~test1)

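So I only changed the parameter of geom_bar to aes(y = (..count..)/sum(..count..)). Note that, at least in current ggplot2 versions, sum(..count..) is taken over the whole layer, so the bars show each category's share of all observations rather than of its own facet. After setting ylab to NULL and specifying the formatter, you could get: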

ggplot(test, aes(test2)) +
    geom_bar(aes(y = (..count..)/sum(..count..))) + 
    facet_grid(~test1) +
    scale_y_continuous('', formatter="percent")

Update: Note that while formatter = "percent" works for ggplot2 version 0.8.9, in 0.9.0 you'd want something like scale_y_continuous(labels = percent_format()).
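To see what the bars produced by a given y mapping actually sum to, the computed layer data can be inspected with `ggplot_build()`. The sketch below does this for the count/sum(count) mapping, written with `after_stat()` on the assumption of a current ggplot2; the seed and the variable names `overall_total` and `per_panel` are my own.

```r
library(ggplot2)

set.seed(42)  # assumption: seed added only to make the sketch reproducible
test <- data.frame(
    test1 = sample(letters[1:2], 100, replace = TRUE),
    test2 = sample(letters[3:8], 100, replace = TRUE)
)

p <- ggplot(test, aes(test2)) +
    geom_bar(aes(y = after_stat(count / sum(count)))) +
    facet_grid(~test1)

# Computed bar heights for the first (and only) layer
built <- ggplot_build(p)$data[[1]]

overall_total <- sum(built$y)                   # total over all facets
per_panel <- tapply(built$y, built$PANEL, sum)  # total within each facet

print(overall_total)
print(per_panel)
```

If the per-facet totals are below 1 while the overall total is 1, the percentages are relative to the whole data set, not to each facet.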

1
votes

Here's a solution that should get you moving in the right direction. I'm curious to see whether there are more efficient ways of doing this, as this seems a bit hacky and convoluted. We can use the built-in ..density.. variable for the y aesthetic, but factors don't work there. So we also need to use scale_x_discrete to label the axis appropriately once we have converted test2 into a numeric object.

ggplot(data = test, aes(x = as.numeric(test2)))+ 
geom_bar(aes(y = ..density..), binwidth = .5)+ 
scale_x_discrete(limits = sort(unique(test$test2))) + 
facet_grid(~test1) + xlab("Test 2") + ylab("Density") 

But give this a whirl and let me know what you think.

Also, you can shorten your test data creation like so, which avoids the extra objects in your environment and having to cbind them together:

test <- data.frame(
    test1 = sample(letters[1:2], 100, replace = TRUE), 
    test2 = sample(letters[3:8], 100, replace = TRUE)
)

0
votes

I deal with similar situations quite frequently, but take a very different approach that uses two of Hadley's other packages, namely reshape and plyr, primarily because I prefer looking at things as 100% stacked bars (when they total 100%).

test <- data.frame(sample(letters[1:2], 100, replace = TRUE), sample(letters[3:8], 100, replace = TRUE))
colnames(test) <- c("variable","value")
test <- cast(test, variable + value ~ .) 
colnames(test)[3] <- "frequ"

test <- ddply(test,"variable", function(x) {
    x <- x[order(x$value),]
    x$cfreq <- cumsum(x$frequ)/sum(x$frequ)
    x$pos <- (c(0,x$cfreq[-nrow(x)])+x$cfreq)/2
    x$freq <- (x$frequ)/sum(x$frequ)
    x
})

plot.tmp <- ggplot(test, aes(variable, frequ, fill = value)) +
    geom_bar(stat = "identity", position = "fill") +
    coord_flip() +
    scale_y_continuous("", formatter = "percent")
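For the 100% stacked bar itself, a shorter route may be possible when the label positions (`pos`) computed above are not needed. The following is a sketch under my own assumptions (a current ggplot2, the raw `test1`/`test2` data from the question, `scales::label_percent()` for the axis): `position = "fill"` rescales each stack to the 0 to 1 range, so no pre-aggregation is required.

```r
library(ggplot2)

set.seed(42)  # assumption: seed added only to make the sketch reproducible
test <- data.frame(
    test1 = sample(letters[1:2], 100, replace = TRUE),
    test2 = sample(letters[3:8], 100, replace = TRUE)
)

# One horizontal bar per test1 level, split by test2 and rescaled to 100%
p <- ggplot(test, aes(test1, fill = test2)) +
    geom_bar(position = "fill") +
    coord_flip() +
    scale_y_continuous(NULL, labels = scales::label_percent())

# Check: the top of every stack should sit at 1 (i.e. 100%)
built <- ggplot_build(p)$data[[1]]
stack_tops <- tapply(built$ymax, as.integer(built$x), max)
print(stack_tops)
```

This does not reproduce the cumulative frequencies from the ddply step, so it is only a substitute when the bars alone are wanted.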

0
votes

Thank you for sharing the PANEL "tip" on the ggplot method.

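For information: you can also show percentages on the y axis within a single bar chart (no facets) by using ..count.. and ..group.. in the same way:

library(scales)  # for percent
# group = test1 makes ..group.. refer to the fill groups rather than to each individual bar
ggplot(test, aes(test2, fill = test1, group = test1)) +
   geom_bar(aes(y = (..count..)/tapply(..count..,..group..,sum)[..group..]), position = "dodge") +
   scale_y_continuous(labels = percent)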