I have a data frame with two groups, the years they were eligible for a service, and a flag indicating participation (each group will participate in one year and one year only, but not all groups participate).

I would like to filter the data frame as follows:

  • For any group that participated, keep only the rows for the year that they participated and before (as after they participate, they are no longer eligible)
  • For any group that didn't participate, keep all rows

I know that I can do this piecemeal by splitting the data frame into groups that participated and groups that didn't, filtering the former, and binding them back together, but I'm wondering whether there is any way to do this in a single pipe.

library(dplyr)

dat <- data.frame(grp = c("a", "a", "a", "b", "b"),
                  yr = c(2017, 2018, 2019, 2017, 2018),
                  par = c(0, 1, 0, 0, 0))

## this works because group 'a' participated
dat %>% 
  group_by(grp) %>%
  filter(grp == "a") %>%
  filter(yr <= yr[par == 1])

## here I am trying to use conditional filtering, but it fails
dat %>% 
  group_by(grp) %>% 
  filter(yr <= case_when(max(par) == 1 ~ yr[par == 1],
                         TRUE ~ max(yr)))

1 Answer

I think this may help, though I would have liked to test it on a larger data set. Note that it relies on the rows being sorted by yr within each grp:

library(dplyr)

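dat %>%
  group_by(grp) %>%
  # cummax(par) switches to 1 from the participation year onward;
  # cumsum() of that is 0 before, 1 in that year, and > 1 afterwards
  mutate(cumx = cumsum(cummax(par))) %>%
  # keep rows up to and including the participation year;
  # groups that never participated have cumx == 0 throughout
  filter(cumx %in% c(0,1) | all(cumx == 0)) %>%
  select(-cumx)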

# A tibble: 4 x 3
# Groups:   grp [2]
  grp      yr   par
  <chr> <dbl> <dbl>
1 a      2017     0
2 a      2018     1
3 b      2017     0
4 b      2018     0
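
To see why this works, it may help to look at the intermediate columns. The sketch below rebuilds the question's dat, splits the cumsum(cummax(par)) step in two (the column name seen is only an illustrative helper, not part of the answer above), and then shows one possible variant that compares years directly instead of relying on row order. The variant is an assumption about what is wanted when the data might not be sorted by yr; it has only been checked against this small example.

```r
library(dplyr)

dat <- data.frame(grp = c("a", "a", "a", "b", "b"),
                  yr = c(2017, 2018, 2019, 2017, 2018),
                  par = c(0, 1, 0, 0, 0))

# Trace the intermediate columns ('seen' is a hypothetical helper name).
steps <- dat %>%
  group_by(grp) %>%
  mutate(seen = cummax(par),       # 0 until the participation year, then 1
         cumx = cumsum(seen)) %>%  # 1 in the participation year, > 1 after
  ungroup()

steps$cumx
# 0 1 2 0 0

# Order-independent variant: match(1, par) is the position of the
# participation row, or NA when the group never participated. In that case
# !any(par == 1) is TRUE, and TRUE | NA evaluates to TRUE, so all rows stay.
res <- dat %>%
  group_by(grp) %>%
  filter(!any(par == 1) | yr <= yr[match(1, par)]) %>%
  ungroup()

res$yr
# 2017 2018 2017 2018
```

Both versions keep the same four rows here. The second one may be the safer choice if the rows cannot be guaranteed to be in year order, since it does not depend on cumulative functions.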