
Suppose we have this tibble:

 group item
 x     1
 x     2
 x     2
 y     3
 z     2
 x     2
 x     2
 z     1

I want to group_by() the 'group' column, but only group together rows that are adjacent. In my example, that gives two separate 'x' groups (rows 1 to 3 and rows 6 to 7), and I want to sum the 'item' values within each group. The result would be something like:

 group item
 x     5
 y     3
 z     2
 x     4
 z     1

I know how to solve this with 'for' loops, but that is slow and not very straightforward. I'd rather use a dplyr or tidyverse function with simple logic.

This question is not a duplicate. I know there is already a question about rle on SO, but mine is more general: I am asking for general solutions.


2 Answers


If you want to use only base R and the tidyverse, this code exactly replicates your desired result:

library(dplyr)

mydf <- tibble(group = c("x", "x", "x", "y", "z", "x", "x", "z"),
               item  = c(1, 2, 2, 3, 2, 2, 2, 1))

mydf

# A tibble: 8 × 2
  group  item
  <chr> <dbl>
1     x     1
2     x     2
3     x     2
4     y     3
5     z     2
6     x     2
7     x     2
8     z     1

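# rle() finds runs of identical consecutive values in the group column
runs <- rle(mydf$group)

mydf %>% 
  # number each run 1, 2, 3, ... and repeat that number for every row in the run
  mutate(run_id = rep(seq_along(runs$lengths), runs$lengths)) %>% 
  group_by(group, run_id) %>% 
  summarise(item = sum(item)) %>% 
  # restore the original order of the runs, then drop the helper column
  arrange(run_id) %>% 
  select(-run_id)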

Source: local data frame [5 x 2]
Groups: group [3]

  group  item
  <chr> <dbl>
1     x     5
2     y     3
3     z     2
4     x     4
5     z     1
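The key step above is the run_id column. As a minimal sketch, assuming the same group and item values as mydf, this base R version shows what rle() returns, how rep() expands it into one ID per row, and how tapply() then sums item per run without any packages. The names g, item, run_id, sums and result are illustrative only. Note that rle() rejects factors, so a factor column (like the one in the other answer's df) would need as.character() first.

```r
# Base R only: same values as mydf above
g    <- c("x", "x", "x", "y", "z", "x", "x", "z")
item <- c(1, 2, 2, 3, 2, 2, 2, 1)

# rle() collapses consecutive repeats into (lengths, values) pairs
runs <- rle(g)
runs$lengths  # 3 1 1 2 1
runs$values   # "x" "y" "z" "x" "z"

# Repeat each run's index once per row in that run: 1 1 1 2 3 4 4 5
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Sum item within each run; tapply() returns the sums in run_id order
sums   <- tapply(item, run_id, sum)
result <- data.frame(group = runs$values, item = as.vector(sums))
result
```

Because the runs are numbered in order of appearance, the rows of result come out in the same order as in the question, with no arrange() step needed.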

You can construct group identifiers with rle, but the easier route is to just use data.table::rleid, which does it for you:

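library(dplyr)

# df holds the question's data (group, item); rleid() gives each run of equal values its own integer ID
df %>% 
    group_by(group, 
             group_run = data.table::rleid(group)) %>% 
    summarise_all(sum)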
#> # A tibble: 5 x 3
#> # Groups:   group [?]
#>    group group_run  item
#>   <fctr>     <int> <int>
#> 1      x         1     5
#> 2      x         4     4
#> 3      y         2     3
#> 4      z         3     2
#> 5      z         5     1
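If you would rather avoid both rle() and data.table, a run ID can also be built with dplyr alone. The sketch below (an alternative technique, not taken from either answer) flags every row whose group differs from the previous row using lag(), and turns those flags into IDs with cumsum(). It assumes dplyr 1.0.0 or later for the .groups argument; the names mydf, run_id and result are illustrative. Grouping by run_id first keeps the runs in their original order, unlike the rleid output above, which is sorted by group.

```r
library(dplyr)

# Same data as the question
mydf <- tibble(group = c("x", "x", "x", "y", "z", "x", "x", "z"),
               item  = c(1, 2, 2, 3, 2, 2, 2, 1))

result <- mydf %>%
  # TRUE wherever group changes from the previous row; the first row compares
  # against itself (FALSE), so adding 1 makes the IDs start at 1
  mutate(run_id = cumsum(group != lag(group, default = first(group))) + 1L) %>%
  # run_id first, so output rows follow the order in which runs appear
  group_by(run_id, group) %>%
  summarise(item = sum(item), .groups = "drop") %>%
  select(-run_id)

result
```

dplyr 1.1.0 also added consecutive_id(), which plays the same role as data.table::rleid() and could replace the cumsum() expression.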