0
votes

I have a (tidy) data frame that looks like so:

!> my_table
 # A tibble: 8 × 4
            g     a     b     val
       <date> <lgl> <lgl>   <int>
 1 2015-01-01 FALSE FALSE 3175030
 2 2015-01-01 FALSE  TRUE   72229
 3 2015-01-01  TRUE FALSE  125505
 4 2015-01-01  TRUE  TRUE  856737
 5 2015-02-01 FALSE FALSE 3413510
 6 2015-02-01 FALSE  TRUE   69203
 7 2015-02-01  TRUE FALSE  122925
 8 2015-02-01  TRUE  TRUE  876366

Now I'd like to group-by g and perform arithmetic between rows based on filters on columns a and b. For example, I'd like to take (for each group) the difference between the (TRUE,FALSE) row and the (FALSE,TRUE) row:

 # A tibble: 2 × 2
            g  diff
       <date> <int>
 1 2015-01-01 53276
 2 2015-02-01 53722

In a non/semi-tidyverse-world, I'd do something like filter first on the (TRUE,FALSE) rows then join that with another filtered table on the (FALSE,TRUE) rows, then take the difference between the c columns for each, like so:

diff_table <- inner_join(
  filter(my_table, a, !b) %>% select(g, val1 = val),
 ,filter(my_table, !a, b) %>% select(g, val2 = val)
) %>% transmute(g, diff = val1 - val2)

This works just fine... but seems inelegant and I think I might be missing an easier method by which this can be done using group_by. I.e. grouping by g then performing arithmetic on specific values on/between rows. Anyone know of a more 'tidy' and elegant solution here?

1

1 Answers

0
votes

Something like this?

df %>% 
  group_by(g) %>%
  filter(a + b == 1) %>%
  arrange(a) %>%
  summarise(diff=diff(val))

# A tibble: 2 × 2
           g  diff
      <date> <int>
1 2015-01-01 53276
2 2015-02-01 53722