Conditional sums for data frame rows using dplyr

Question

I am new to dplyr/tidyverse and I would like to sum rows of a dataset if the values in a given column(s) exceed a given value. For example given this dataframe,

a<-c(2,3,2,1,0)
b<-c(2,3,3,2,1)
z<-c(3,2,1,1,0)

data.abz <- data.frame(a,b,z)
data.abz

  a b z
1 2 2 3
2 3 3 2
3 2 3 1
4 1 2 1
5 0 1 0

I would like to sum across the rows if the value in column a or b is greater than 1 and if the value at column z is greater than 0. If the condition is not satisfied the row sum is 0. For example, given the previous data frame, I would like to get the following,

  a b z sum_values
1 2 2 3          7
2 3 3 2          8
3 2 3 1          6
4 1 2 1          3
5 0 1 0          0

The last two rows do not satisfy the condition and therefore they were assigned a value of 0. This is what I have done but I am sure there is a better way to achieve the same.

data.abz <- data.frame(a,b,z) %>%
mutate_at(vars(c(a,b)), 
      function(x) case_when(x < 2 ~ 0, TRUE~as.double(x)))%>%
mutate(sum_values = rowSums(.[1:3]))

Any more idiomatic and better ideas with R and dplyr?

Given your description and results, shouldn't row 4 be equal to 4 since b > 1 and z > 0? — Christian Million
Thanks @MillionC. The summation of the rows should only be for values that satisfy the condition. E.g for row 4 it would then be 2 + 1 only. Is it possible to sum the values that satisfy the conditions only and not all the values in that row? I have edited my question and code example that I had done which might not be efficient. — Julie Jemutai

Christian Million Christian Million · Accepted Answer · 2018-06-28T15:07:14

I like to use dplyr's case_when function for conditional calculations. But depending on your needs you may need something else.

library(dplyr)

df <- tibble(a = c(2, 3, 2, 1, 0),
                b = c(2, 3, 3, 2, 1),
                z = c(3, 2, 1, 1, 0))

df <- df %>%
      mutate(sum_values = case_when((a > 1 | b > 1) & (z > 0) ~ (a+b+z),
                                    TRUE ~ 0))

That code produces different results than your results (specifically row 4), but let me know if it works. Or give a little more explanation on your desired output.

Conditional sums for data frame rows using dplyr

1 Answers