
I would like to pass the length of my group_by variable to summarize.

Example data

    df <- data.frame(
      groupper = factor(sample.int(n = 12, size = 100, replace = TRUE)),
      var = runif(100, min = 1, max = 25)
    )

Now each factor level has a different number of observations:

1  2  3  4  5  6  7  8  9 10 11 12 
8  7  4  8  9  7 10  7 11  3 13 13 

Now I would simply like to find, within each groupper, the share of var values that fall in certain intervals.

My code looks like this:

results <- df %>% group_by(groupper) %>% summarise(
var0_25 = sum(var < 25 / length(groupper)), 
var25_50 = sum(var >= 25 & var < 50) / length(groupper))

But, how in the world do I get the correct group_by(groupper) length into my summarize? It changes for each factor.

Aaaaaahhhhh, thanks! Thorst

3 Answers


We can use n() to get the number of elements per group:

df %>% 
    group_by(groupper) %>% 
    summarise(var0_25 = sum(var <25)/n(), 
              var25_50=sum(var >=25 & var < 50 )/n())
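As a minimal sketch of why n() is the right denominator: inside summarise(), n() is evaluated per group, so the division above yields a share for each groupper. The seed and max = 50 below are assumptions made only so the sketch is reproducible and both intervals contain data; the extra columns n_obs and var0_25_mean are hypothetical names added for illustration. Taking mean() of the logical condition should give the same share as sum(...) / n().

```r
library(dplyr)

set.seed(1)  # assumed seed, only for reproducibility of this sketch
df <- data.frame(
  groupper = factor(sample.int(n = 12, size = 100, replace = TRUE)),
  var = runif(100, min = 1, max = 50)  # max = 50 is an assumption
)

shares <- df %>%
  group_by(groupper) %>%
  summarise(
    n_obs        = n(),                 # group size, differs per groupper
    var0_25      = sum(var < 25) / n(), # share computed as in the answer
    var0_25_mean = mean(var < 25),      # same share via the mean of a logical
    var25_50     = mean(var >= 25 & var < 50)
  )

print(shares)
```

The n_obs column makes it easy to check that the denominator really changes from group to group.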

I think a general solution for calculating shares over intervals is to use cut. This code is a bit longer, but it works for any number of intervals: just adjust the breaks passed to cut. It also saves you from manually writing the column names and equations.

library(dplyr)
library(tidyr)  # provides spread()

df %>%
  mutate(indx = cut(var, c(1, 25, 50), right = FALSE)) %>%
  group_by(groupper) %>%
  mutate(Count = n()) %>%
  group_by(groupper, indx) %>%
  summarise(Res = n()/Count[1L]) %>%
  spread(indx, Res)

# Source: local data frame [12 x 3]
#    groupper    [1,25)   [25,50)
# 1         1 0.5000000 0.5000000
# 2         2 0.8571429 0.1428571
# 3         3 0.7500000 0.2500000
# 4         4 0.3750000 0.6250000
# 5         5 0.2222222 0.7777778
# 6         6 0.5714286 0.4285714
# 7         7 0.4000000 0.6000000
# 8         8 0.4285714 0.5714286
# 9         9 0.3636364 0.6363636
# 10       10 0.3333333 0.6666667
# 11       11 0.6153846 0.3846154
# 12       12 0.3076923 0.6923077

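But length(.) also works. The problem with your code was a misplaced closing parenthesis in var0_25: you wrote sum(var < 25 / length(groupper)), which compares var against 25 / length(groupper), instead of sum(var < 25) / length(groupper):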

df %>% group_by(groupper) %>% 
    summarize(r = sum(var < 25) / length(groupper), 
              s = sum(var < 25), 
              l = length(groupper)) %>% 
    mutate(r2 = s / l)

# Source: local data frame [12 x 5]

#    groupper r  s  l r2
# 1         1 1  8  8  1
# 2         2 1  7  7  1
# 3         3 1  4  4  1
# 4         4 1  8  8  1
# 5         5 1  9  9  1
# 6         6 1  7  7  1
# 7         7 1 10 10  1
# 8         8 1  7  7  1
# 9         9 1 11 11  1
# 10       10 1  3  3  1
# 11       11 1 13 13  1
# 12       12 1 13 13  1

I added the columns s (for sum) and l (for length) just to show that the results are indeed correct.
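To see the effect of the misplaced parenthesis in isolation, here is a small base R sketch. The vector values are hypothetical and chosen only so the two expressions give different results; in R, `/` binds more tightly than `<`, so the division happens before the comparison.

```r
var <- c(3, 10, 20, 30)  # hypothetical values for illustration
n   <- length(var)       # 4

# Misplaced parenthesis: parsed as var < (25 / n), i.e. var < 6.25,
# so this counts the values below 6.25 instead of computing a share.
wrong <- sum(var < 25 / n)

# Intended: count the values below 25, then divide by the group size.
right <- sum(var < 25) / n

wrong  # 1
right  # 0.75
```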