I want to recode the values in selected columns based on a summary statistic of each column (for example, the column's median). For example: if a cell value < median(df$variable), recode it to 1; if the cell value == median(df$variable), recode it to 0; if the cell value > median(df$variable), recode it to 2. The recoding should apply only to the variables listed in core.vars, and the rest of the variables should stay in the data frame.

I have tried a number of ways to implement this, using case_when, mutate, and summarise_each, without success. The original dataset contains several hundred columns and rows, so I would like to select the columns by name and keep the code concise.

library(dplyr)
library(tibble)

# Move the row names into a column first; as_tibble() would drop them
temp.df <- rownames_to_column(mtcars, var = "cars_id")
temp.df <- as_tibble(temp.df)
other.vars <- c('hp', 'drat', 'wt')
core.vars <- c('mpg', 'cyl', 'disp')
temp.df <- temp.df %>% mutate_if(is.integer, as.numeric)

Attempt 1:

`temp.df <- temp.df %>% mutate_at(.vars %in% (core.vars)), funs ({
lookupvariable <- temp.df %>% pull (quo_name(quo(.))) #extract column name
ifelse(is.na(.), lookup_value, .)}),
function (x) case_when (
x < median(lookupvariable) ~ 1,
x == median(lookupvariable) ~ 0,
x > median(lookupvariable) ~ 2
))`

Extract column name in mutate_if call

Attempt 2:

`temp.df <- temp.df %>% mutate_at(.vars %in% (core.vars)), function (x) case_when (
x < summarise_each (list (median)) ~ 1,
x == summarise_each (list (median)) ~ 0,
x > summarise_each (list (median)) ~ 2
))`

This does not work because the data passed to summarise_each is not a vector.

Previous questions on the forum cover how to do this for individual variables. However, I have 100 variables and 300 samples, so recoding them one by one is not an option. I have looked at the following solutions, but each addresses a slightly different problem.

Using dplyr to group_by and conditionally mutate only with if (without else) statement

Using dplyr summarise with conditions

dplyr conditional summarise function

Mean of column based on multiple conditions in R

R: Recoding variables using recode, mutate and case_when

Ideally, I would like to avoid creating a separate data frame and joining it back, or creating multiple new variables as mutate would do. I am sure there is a for loop and/or ifelse method for this, but I was trying to use the tidyverse to achieve the goal. Any suggestions would be helpful. Thanks in advance.

You can use mutate_at(vars(core.vars)) - akrun
You could use sign: temp.df %>% mutate_at(vars(core.vars), ~ sign(. - median(.))) - akrun

1 Answer

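mutate_at creates or modifies columns in place, so select the columns with vars(core.vars), then compute the median inside the lambda and recode with case_when: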

library(dplyr)

temp.df %>%
  mutate_at(vars(core.vars), ~ {
    md <- median(.)  # median of the current column
    case_when(. < md ~ 1, . == md ~ 0, . > md ~ 2)
  })
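
In dplyr 1.0.0 and later, mutate_at is superseded by across(). The same recode could be sketched in that style as follows. This assumes a recent dplyr is installed; the helper name recode_by_median and the na.rm = TRUE argument are additions for illustration and are not part of the original answer.

```r
library(dplyr)

# Hypothetical helper: 1 below the median, 0 at the median, 2 above it.
# na.rm = TRUE is an assumption so that missing values do not turn the
# median itself into NA; NA cells match no condition and stay NA.
recode_by_median <- function(x) {
  md <- median(x, na.rm = TRUE)
  case_when(x < md ~ 1, x == md ~ 0, x > md ~ 2)
}

core.vars <- c("mpg", "cyl", "disp")

# Only the columns named in core.vars are recoded; all others are kept as-is
recoded <- mtcars %>%
  mutate(across(all_of(core.vars), recode_by_median))
```

Wrapping the names in all_of() makes it explicit that core.vars is an external character vector rather than a column of the data frame.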

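The values can also be changed without case_when by using sign(). Note that this codes the result as -1 (below the median), 0 (equal), and 1 (above), rather than 1, 0, and 2:

temp.df %>%
  mutate_at(vars(core.vars), ~ sign(. - median(.)))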
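
If the 1/0/2 coding from the question is needed, one option is to use the sign() result as an index into a lookup vector. A minimal base R sketch; the function name recode_sign is hypothetical:

```r
# Recode a numeric vector relative to its median: 1 below, 0 equal, 2 above.
# sign() yields -1/0/1, so adding 2 gives positions 1/2/3 in the lookup vector.
recode_sign <- function(x) {
  c(1, 0, 2)[sign(x - median(x)) + 2]
}

recode_sign(c(1, 2, 3, 4, 5))
# returns 1 1 0 2 2
```

It should drop into the same pipeline as the lambda above, for example temp.df %>% mutate_at(vars(core.vars), recode_sign).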