1
votes

I have a dataframe:

df <- data.frame(x = 1:5, y = rep(1,5), z = 0:4, 
                 fx = NA_real_, fy = NA_real_, fz = NA_real_)
my_count_columns <- c("x", "y", "z")

I want to fill in the columns fx, fy, fz in place so that each one holds the relative frequency of the corresponding count variable, i.e. each value divided by its column total.

What is the cleanest way to do this in dplyr/tidyverse, assuming I don't know the column names ahead of time?

Expected output:

  x y z         fx  fy  fz
1 1 1 0 0.06666667 0.2 0.0
2 2 1 1 0.13333333 0.2 0.1
3 3 1 2 0.20000000 0.2 0.2
4 4 1 3 0.26666667 0.2 0.3
5 5 1 4 0.33333333 0.2 0.4
2
Please provide the expected output. - tmfmnk
@tmfmnk added. - 12345thc

2 Answers

2
votes

In base R, this could be

df[paste0('f', my_count_columns)] <- lapply(my_count_columns, 
   function(x) sapply(df[[x]], function(y) 
       mean(y == df[setdiff(my_count_columns, x)])))

Or in tidyverse

library(dplyr)
library(purrr)
df %>%
    select(all_of(my_count_columns)) %>% 
    mutate(across(everything(), ~  map_dbl(., function(x)
      mean(x == df[setdiff(my_count_columns, cur_column())])), 
          .names = 'f{.col}'))
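
For comparison, here is a minimal base R sketch that reproduces the expected output shown in the question. It assumes "frequency" means each value divided by its column total, which is a reading of the expected output rather than something stated in this answer. The data frame is rebuilt from the question so the block runs on its own.

```r
# Rebuild the question's data so the sketch is self-contained.
df <- data.frame(x = 1:5, y = rep(1, 5), z = 0:4,
                 fx = NA_real_, fy = NA_real_, fz = NA_real_)
my_count_columns <- c("x", "y", "z")

# Assumption: "frequency" = value / column total.
# prop.table() on a vector returns x / sum(x).
df[paste0("f", my_count_columns)] <- lapply(df[my_count_columns], prop.table)
df
```

The target column names are built from my_count_columns, so nothing has to be known ahead of time.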
1
votes

You can use prop.table, which divides each value by the column total, to get the proportions for all of my_count_columns.

library(dplyr)

df %>% mutate(across(all_of(my_count_columns), prop.table, .names = 'f{.col}'))

#  x y z         fx  fy  fz
#1 1 1 0 0.06666667 0.2 0.0
#2 2 1 1 0.13333333 0.2 0.1
#3 3 1 2 0.20000000 0.2 0.2
#4 4 1 3 0.26666667 0.2 0.3
#5 5 1 4 0.33333333 0.2 0.4
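
The same calculation can be written out explicitly, which may be useful if the count columns can contain missing values. The sketch below is an assumption-laden variant, not part of the original answer: it replaces prop.table with a lambda and adds na.rm = TRUE, so a missing count stays NA without turning the whole column into NA.

```r
library(dplyr)

# Rebuild the question's data so the sketch is self-contained.
df <- data.frame(x = 1:5, y = rep(1, 5), z = 0:4,
                 fx = NA_real_, fy = NA_real_, fz = NA_real_)
my_count_columns <- c("x", "y", "z")

# Divide each value by its column total; na.rm = TRUE is an added
# assumption about how missing counts should be handled.
result <- df %>%
  mutate(across(all_of(my_count_columns),
                ~ .x / sum(.x, na.rm = TRUE),
                .names = "f{.col}"))
result
```

On the example data, which has no missing values, this should give the same fx, fy, fz as the prop.table version.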