How to perform a group_by with elements that are contiguous in R and dplyr

Question

Suppose we have this tibble:

 group item
 x     1
 x     2
 x     2
 y     3
 z     2
 x     2
 x     2
 z     1

I want to perform a group_by on group, but only over elements that are adjacent. In my example that gives two separate 'x' groups, each summing its own 'item' values. The result would be something like:

group item
x 5
y 3
z 2
x 4
z 1

I know how to solve this problem using 'for' loops. However, that is slow and not very straightforward. I'd rather use a dplyr or tidyverse function with simple logic.

This question is not a duplicate. I know there's already a question about rle on SO, but mine is more general: I am asking for general solutions.

HAVB · Accepted Answer · 2017-06-21T03:35:01

If you want to use only base R + tidyverse, this code exactly replicates your desired results:

library(dplyr)  # provides tibble(), %>%, mutate(), group_by(), summarise()

mydf <- tibble(group = c("x", "x", "x", "y", "z", "x", "x", "z"),
               item = c(1, 2, 2, 3, 2, 2, 2, 1))

mydf

# A tibble: 8 × 2
  group  item
  <chr> <dbl>
1     x     1
2     x     2
3     x     2
4     y     3
5     z     2
6     x     2
7     x     2
8     z     1

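# rle() finds the runs of consecutive identical values in group
runs <- rle(mydf$group)

mydf %>%
  # label each row with the index of the run it belongs to
  mutate(run_id = rep(seq_along(runs$lengths), runs$lengths)) %>%
  group_by(group, run_id) %>%
  summarise(item = sum(item)) %>%
  # summarise() orders rows by group, so restore the original run order
  arrange(run_id) %>%
  select(-run_id)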

Source: local data frame [5 x 2]
Groups: group [3]

  group  item
  <chr> <dbl>
1     x     5
2     y     3
3     z     2
4     x     4
5     z     1
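
Since the question asks for general solutions, here is a minimal sketch of the same idea without rle(), building the run identifier inside the pipeline. It is an alternative to the answer above, not a replacement for it. It assumes group has no NA values and that dplyr 1.0.0 or later is installed (for the .groups argument); the name result is only illustrative.

```r
library(dplyr)

mydf <- tibble(group = c("x", "x", "x", "y", "z", "x", "x", "z"),
               item  = c(1, 2, 2, 3, 2, 2, 2, 1))

result <- mydf %>%
  # run_id goes up by 1 each time group differs from the previous row
  # (assumes no NA in group, since NA comparisons would propagate NA)
  mutate(run_id = cumsum(group != lag(group, default = first(group))) + 1) %>%
  # putting run_id first keeps the rows in their original run order
  group_by(run_id, group) %>%
  summarise(item = sum(item), .groups = "drop") %>%
  select(-run_id)

result
```

Because run_id is the first grouping column, no arrange() step is needed afterwards. In dplyr 1.1.0 and later, consecutive_id(group) should produce the same run identifiers and could stand in for the cumsum() expression.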
