How do I remove offsetting rows in a tibble?

Question

I am trying to remove rows that have offsetting values.

library(dplyr)
a <- c(1, 1, 1, 1, 2, 2, 2, 2,2,2)
b <- c("a", "b", "b", "b", "c", "c","c", "d", "d", "d")
d <- c(10, 10, -10, 10, 20, -20, 20, 30, -30, 30)
o <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
df <- tibble(ID = a, SEQ = b, VALUE = d, OTHER = o)

Generates this ordered table that is grouped by ID and SEQ.

> df
# A tibble: 10 x 4
      ID   SEQ VALUE OTHER
   <dbl> <chr> <dbl> <chr>
 1     1     a    10     A
 2     1     b    10     B
 3     1     b   -10     C
 4     1     b    10     D
 5     2     c    20     E
 6     2     c   -20     F
 7     2     c    20     G
 8     2     d    30     H
 9     2     d   -30     I
10     2     d    30     J

I want to drop the row pairs (2,3), (5,6), (8,9) because VALUE negates the VALUE in the matching previous row.

I want the resulting table to be

> df2
# A tibble: 4 x 4
     ID   SEQ VALUE OTHER
  <dbl> <chr> <dbl> <chr>
1     1     a    10     A
2     1     b    10     D
3     2     c    20     G
4     2     d    30     J

I know that I can't use group_by %>% summarize, because I need to keep the value that is in OTHER. I've looked at the dplyr::lag() function but I don't see how that can help. I believe that I could loop through the table with some type of for each loop and generate a logical vector that can be used to drop the rows, but I was hoping for a more elegant solution.

It seems like you asked a question, got an answer, accepted the answer, and several hours later you are editing the question. This is bad - it is "moving the goal posts". Instead, I would recommend creating a new question, linking to this one, and explaining why the original answer didn't work. — Gregor Thomas

Bastien Bastien · Accepted Answer · 2017-12-11T15:51:55

What about:

vec <- cbind(
               c(head(df$VALUE,-1) + df$VALUE[-1], 9999) ,
               df$VALUE + c(9999, head(df$VALUE,-1))
             )

vec <- apply(vec,1,prod)
vec <- vec!=0
df[vec,]
# A tibble: 4 x 4
     ID   SEQ VALUE OTHER
  <dbl> <chr> <dbl> <chr>
1     1     a    10     A
2     1     b    50     D
3     2     c    60     G
4     2     d    70     J

The idea is to take your VALUE field and subtract it with a slightly subset version of it. When the result is 0, than you remove the line.

How do I remove offsetting rows in a tibble?

2 Answers