3
votes

I am using RStudio on Ubuntu, with up-to-date versions of R and ggplot2.

I am trying to create a histogram in ggplot2 and to split the data by group.

I need the plot's y axis to show the relative frequency of each bin within the subgroup produced by the facet grid.

For example, if I have two entries in the data:

a group
1 1
2 2

I need to use facet_grid to split by group, and then show that group 1 has a single bar at a = 1 that accounts for 100% of the observations in group 1, and likewise for group 2.

I found that the way to do this is (..count..)/sum(..count..), but sum(..count..) sums the counts over the entire data frame rather than within each facet, which gives me unwanted results.

I can't find good documentation on advanced use of ..count..

question about special ggplot variables

another question about ..count..

There is nothing very comprehensive in the docs.

This is the example code I am using:

library(ggplot2)

df <- data.frame(a = 1:10, b = 1:10, group = c(rep(1, 5), rep(2, 5)))
p <- ggplot(df) +
  geom_histogram(aes(x = a, y = (..count..) / sum(..count..))) +
  facet_grid(group ~ .)

You can see that the highest value on the y axis is 0.1. I would like it to show, for example, that 100% of the observations with value 1 are in group 1, and so on.

Edit:

Thanks to Jimbou for the answer and for the reference to a well-built workaround that suits discrete data. Please note that my real problem involves continuous data, with bins that group more than one value. Furthermore, there is no proper documentation on how to do this with ..count.., so I believe it is important to find a real solution rather than rely on a workaround.

Can you use ggplot(df, aes(x=a)) + geom_histogram() + facet_grid(group ~ .)? - Roman
It produces the same result. The problem here is the scale on the y axis and the way that ..count.. counts the different parts of the data. @Jimbou - thebeancounter
Why do you need the proportion of group membership when all values are in the same group for each variable? - shayaa
I don't need the proportion of group membership, I need the proportion of a distribution. Please note the code. @shayaa - thebeancounter
@captainshai It is not mine. Please see my answer. - Roman

4 Answers

3
votes

After a lot of playing around, and with the very good directions you all gave, I found that a blend of Jimbou's and Shayaa's answers plus a little added code works beautifully.

library(dplyr)
t <- data %>% group_by(group,member,v_rate) %>% tally %>% mutate(f = n/sum(n))

This groups the data by group, member and v_rate, counts the rows in each combination, and divides each count by the sum of the counts within its group and member (the relative frequency within that subgroup).

Then we create the histogram with ggplot2 and use those values as the weight aesthetic of the histogram; otherwise it was all in vain.

 p <- ggplot(t, aes(x = v_rate, weight = f)) + geom_histogram() + facet_grid(group ~ member)

That works great.
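
The answer's data is not shown, so here is a minimal sketch of the same tally-then-weight idea applied to the df from the question. The column a stands in for v_rate, there is no member column, and binwidth = 1 is an assumption chosen to suit this toy data.

```r
library(dplyr)
library(ggplot2)

# Toy data from the question (stands in for the answer's unseen `data`)
df <- data.frame(a = 1:10, b = 1:10, group = c(rep(1, 5), rep(2, 5)))

# Count rows per (group, a); tally() leaves the result grouped by `group`,
# so sum(n) is the total within each group
t <- df %>%
  group_by(group, a) %>%
  tally() %>%
  mutate(f = n / sum(n))

# Use the within-group relative frequency as the histogram weight
p <- ggplot(t, aes(x = a, weight = f)) +
  geom_histogram(binwidth = 1) +  # binwidth is an assumption for this example
  facet_grid(group ~ .)
```

Because the weights in each group sum to 1, the bar heights in each facet also sum to 1, whatever bin width is chosen.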

2
votes

Here is a dplyr solution.

df %>% group_by(group) %>% mutate(n = n(), prop = n / sum(n))

1
votes

You can try:

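First, calculate the size of each group using ave:

df$gr_l <- ave(df$a, df$group, FUN = function(x) length(x))

Then get the proportion of each value of a within its group using by:

# Caveat: df$group + df$a is only a safe key here because every (group, a) pair has a distinct sum and df is sorted by it.
df$gr_prop <- c(by(df, df$group + df$a, FUN = function(x) length(x$a)/unique(x$gr_l) ))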

Plot the data.

ggplot(df, aes(x=a, y=gr_prop)) + 
      geom_bar(stat="identity",position='dodge') + 
      facet_grid(group ~ .)

The question is similar to this and that question using ddply or an internal ggplot solution.
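
If the sum of group and a is not a unique key in your data, one possible variant (a sketch, not part of the original answer) is to call ave twice, grouping by both columns for the numerator. It gives the same gr_prop for the example data, and the column can be passed to the same geom_bar call.

```r
# Toy data from the question
df <- data.frame(a = 1:10, b = 1:10, group = c(rep(1, 5), rep(2, 5)))

# Rows per (group, a) pair divided by rows per group;
# ave() returns results in the original row order
df$gr_prop <- ave(df$a, df$group, df$a, FUN = length) /
  ave(df$a, df$group, FUN = length)
```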

1
votes

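Try ..density..? It is computed within each facet panel, so it gives each bin's local mass rather than the local count divided by the overall, all-encompassing count, which is what the code as currently written computes.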
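
A minimal sketch of this suggestion, assuming ggplot2 3.3.0 or later so that after_stat() is available. Density is mass per unit of x, so it is multiplied by the bin width (the computed variable width) to get a per-facet proportion. The binwidth = 1 value is an assumption for the toy data.

```r
library(ggplot2)

# Toy data from the question
df <- data.frame(a = 1:10, b = 1:10, group = c(rep(1, 5), rep(2, 5)))

# density * width is the share of each bin within its own facet panel
p <- ggplot(df, aes(x = a)) +
  geom_histogram(aes(y = after_stat(density * width)), binwidth = 1) +
  facet_grid(group ~ .)
```

This works with continuous data and bins that span several values, since the binning is still done by geom_histogram itself.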