1
votes

I have a dataframe like this one:

var1 <- c(1,2,0,1)
var2 <- c(2,0,3,1)
var3 <- c(10,5,10,4)

df <- data.frame(var1,var2,var3)

So, what I would like is to "average if" for column var3 values based on the values of each var so I end up with a df like this one:

var1   var2
6.33   8

Explanation: For var2 if I filter on values greater than 0, values of var 3 are (10,10 and 4) the average of those is 8. (24/3)

So basically I'm averaging the value of var3 for each var (var1 and var2) when they are greater than 0.

Thx,

1

1 Answers

2
votes

We can either do this individually with summarise

library(dplyr)
df %>% 
   summarise(var1 = mean(var3[var1 >0]), 
              var2 = mean(var3[var2 > 0]))
#      var1 var2
#1 6.333333    8

Or using map to loop over the columns to filter and then get the mean of 'var3' after subsetting based on the condition

library(purrr)
df %>%
   select(var1:var2) %>%
   map_dfc(~ mean(df$var3[.x > 0]))
#       var1 var2
#1 6.333333    8

Or using reshaping into 'long' format and then to 'wide'

library(tidyr)
df %>% 
  pivot_longer(cols = -var3) %>%
  filter(value >0) %>% 
  group_by(name) %>%
  summarise(var3 = mean(var3)) %>%
  pivot_wider(names_from = name, values_from = var3)
# A tibble: 1 x 2
#   var1  var2
#  <dbl> <dbl>
#1  6.33     8

Or in base R

sapply(df[1:2], function(x) mean(df$var3[x > 0]))
#    var1     var2 
#6.333333 8.000000 

Or another option is to change the values that are 0 to NA, and then use colMeans

colMeans(df$var3* NA^(df[1:2] ==0), na.rm = TRUE)
#   var1     var2 
#6.333333 8.000000