2
votes

I have a data frame from which I want to first filter some rows and then sum the remaining rows.

The filtering conditions are as follows:

  1. for gr==1, find the last occurrence of y_value==10 and keep all rows up to and including that row.
  2. for gr==2, find the first occurrence of y_value==10 and keep all rows from that row onward, including it.

The data looks like this:

df <- data.frame(gr=rep(c(1,2),c(8,7)), 
                 y_value=c(c(2,10,10,8,10,6,0,0),c(0,0,10,10,6,8,10)))



    gr y_value
1   1       2
2   1      10
3   1      10
4   1       8
5   1      10
6   1       6
7   1       0
8   1       0
9   2       0
10  2       0
11  2      10
12  2      10
13  2       6
14  2       8
15  2      10

I tried this, based on summing-rows-based-on-conditional-in-groups:

df_temp <- df %>% 
  group_by(gr) %>% 
  mutate(rows_to_aggregate=cumsum(y_value==10)) %>% 
  filter(ifelse(gr==1, rows_to_aggregate !=0, ifelse(gr==2, rows_to_aggregate ==0 | y_value==10, rows_to_aggregate ==0))) %>% 
  filter(ifelse(gr==1, row_number(gr) != 1, ifelse(gr==2, row_number(gr) != n(), rows_to_aggregate ==0)))

but if I use rows_to_aggregate != 0 for gr==1, the rows I am interested in are dropped. Any guidance would be appreciated!


3 Answers

2
votes
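df_to_aggregate <- df %>% 
    group_by(gr) %>% 
    # running count of 10s seen so far within each group
    mutate(rows_to_aggregate = cumsum(y_value == 10)) %>% 
    # gr 1: drop the non-10 rows that come after the last 10
    filter(!(gr == 1 & rows_to_aggregate == max(rows_to_aggregate) & y_value != 10)) %>%
    # gr 2: drop the rows that come before the first 10
    filter(!(gr == 2 & rows_to_aggregate == 0)) %>%
    select(-rows_to_aggregate)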
df_to_aggregate

# A tibble: 10 x 2
# Groups:   gr [2]
     gr y_value
  <dbl>   <dbl>
1     1       2
2     1      10
3     1      10
4     1       8
5     1      10
6     2      10
7     2      10
8     2       6
9     2       8
10    2      10
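
To see why those two filters pick the right rows, it can help to look at the helper column on its own. The sketch below only rebuilds the question's data and prints the per-group running count of 10s; the name `counted` is made up for illustration.

```r
library(dplyr)

# Data from the question
df <- data.frame(gr = rep(c(1, 2), c(8, 7)),
                 y_value = c(2, 10, 10, 8, 10, 6, 0, 0,
                             0, 0, 10, 10, 6, 8, 10))

# Running count of 10s within each group (illustrative name: counted)
counted <- df %>%
  group_by(gr) %>%
  mutate(rows_to_aggregate = cumsum(y_value == 10)) %>%
  ungroup()

counted$rows_to_aggregate
# gr 1: 0 1 2 2 3 3 3 3   gr 2: 0 0 1 2 2 2 3
```

In gr 1 the rows after the last 10 are exactly the non-10 rows whose count equals the group maximum, and in gr 2 the rows before the first 10 are exactly those whose count is still 0.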
2
votes

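I do not know how to do this in dplyr, but this base R code seems to work:

gr1 = df[df$gr==1,]
last = tail(which(gr1$y_value==10),1)    # position of the last 10 in group 1
gr1 = gr1[1:last,]                       # keep rows up to and including it
gr2 = df[df$gr==2,]
first = head(which(gr2$y_value==10),1)   # position of the first 10 in group 2
gr2 = gr2[first:nrow(gr2),]              # keep rows from it onward, inclusive
final = rbind(gr1,gr2)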
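
One thing to watch with this approach: if a group contains no 10 at all, `which()` returns an empty vector and the indexing breaks. A possible guard, as a rough sketch in base R: `keep_rows` is a hypothetical helper, and dropping a group that has no 10 is my assumption, since the question does not say what should happen in that case. The slicing here is inclusive of the 10 row, as the question asks.

```r
# Data from the question
df <- data.frame(gr = rep(c(1, 2), c(8, 7)),
                 y_value = c(2, 10, 10, 8, 10, 6, 0, 0,
                             0, 0, 10, 10, 6, 8, 10))

# keep_rows: hypothetical helper applied to one group's rows at a time
keep_rows <- function(d) {
  hits <- which(d$y_value == 10)
  if (length(hits) == 0) return(d[0, ])   # assumption: no 10 means keep nothing
  if (d$gr[1] == 1) {
    d[seq_len(max(hits)), ]               # up to and including the last 10
  } else {
    d[min(hits):nrow(d), ]                # from the first 10 onward
  }
}

final <- do.call(rbind, lapply(split(df, df$gr), keep_rows))
nrow(final)
```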
1
votes

You can slice with a different slicing condition for each gr.

df %>% 
  group_by(gr) %>% 
  slice(if(any(gr==1)) {1:max(which(y_value==10))} else {min(which(y_value==10)):n()}) 
      gr y_value
 1     1       2
 2     1      10
 3     1      10
 4     1       8
 5     1      10
 6     2      10
 7     2      10
 8     2       6
 9     2       8
10     2      10
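
The question also asks for the sum of the remaining rows, which none of the code so far does. One way to finish, as a minimal sketch that reuses the `slice()` call above and adds a `summarise()` step; the names `totals` and `total` are just illustrative.

```r
library(dplyr)

# Data from the question
df <- data.frame(gr = rep(c(1, 2), c(8, 7)),
                 y_value = c(2, 10, 10, 8, 10, 6, 0, 0,
                             0, 0, 10, 10, 6, 8, 10))

totals <- df %>%
  group_by(gr) %>%
  # same per-group slicing as above
  slice(if (any(gr == 1)) 1:max(which(y_value == 10)) else min(which(y_value == 10)):n()) %>%
  # one row per group with the sum of the kept y_value entries
  summarise(total = sum(y_value))

totals
```

For this data that gives 40 for gr 1 (2 + 10 + 10 + 8 + 10) and 44 for gr 2 (10 + 10 + 6 + 8 + 10).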