I want to recode the values in selected columns based on a summary statistic of each column (for example, the column median). For example: if a cell value is less than median(df$variable) it becomes 1, if it equals median(df$variable) it becomes 0, and if it is greater than median(df$variable) it becomes 2. This should apply only to the variables listed in core.vars, while the rest of the variables stay in the data frame unchanged.
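To make the rule concrete, here is a minimal base R sketch on a plain vector. The helper name recode_by_median is hypothetical (it is not from any package), and ignoring missing values when computing the median is an assumption.

```r
# Hypothetical helper: recode a numeric vector relative to its own median
recode_by_median <- function(x) {
  m <- median(x, na.rm = TRUE)            # assumption: NAs are ignored for the median
  ifelse(x < m, 1, ifelse(x > m, 2, 0))   # below -> 1, above -> 2, equal -> 0
}

example_codes <- recode_by_median(c(1, 3, 5, 7, 5))  # the median of this vector is 5
example_codes
```

Values that are NA stay NA, because ifelse propagates missing conditions.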
I have tried a number of ways to implement this, using case_when, mutate, and summarise_each, without success. The original dataset contains several hundred columns and rows, so I would like to select the columns by name and keep the code concise.
library(tidyverse)
# Convert row names to a column first: as_tibble() drops row names
temp.df <- mtcars %>% rownames_to_column(var = "cars_id") %>% as_tibble()
other.vars <- c('hp', 'drat', 'wt')
core.vars <- c('mpg', 'cyl', 'disp')
temp.df <- temp.df %>% mutate_if(is.integer, as.numeric)
Attempt 1:
`temp.df <- temp.df %>% mutate_at(.vars %in% (core.vars)), funs ({
lookupvariable <- temp.df %>% pull (quo_name(quo(.))) #extract column name
ifelse(is.na(.), lookup_value, .)}),
function (x) case_when (
x < median(lookupvariable) ~ 1,
x == median(lookupvariable) ~ 0,
x > median(lookupvariable) ~ 2
))`
(This was adapted from: Extract column name in mutate_if call)
Attempt 2:
`temp.df <- temp.df %>% mutate_at(.vars %in% (core.vars)), function (x) case_when (
x < summarise_each (list (median)) ~ 1,
x == summarise_each (list (median)) ~ 0,
x > summarise_each (list (median)) ~ 2
))`
This does not work because the data passed to summarise_each is not a vector.
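For what it's worth, inside a function passed to mutate_at, x is already the column as a plain vector, so median(x) can be called on it directly and no summarise step is needed. A minimal sketch under that assumption, run on mtcars directly; recode_median and recoded are hypothetical names, and passing a character vector as .vars relies on mutate_at accepting column names:

```r
library(dplyr)

core.vars <- c("mpg", "cyl", "disp")

# Hypothetical helper: x is one column, passed in as a vector
recode_median <- function(x) {
  m <- median(x, na.rm = TRUE)
  case_when(
    x < m  ~ 1,
    x == m ~ 0,
    x > m  ~ 2
  )
}

# Only the columns named in core.vars are recoded; all others are kept as-is
recoded <- mtcars %>% mutate_at(core.vars, recode_median)
head(recoded)
```

In dplyr 1.0 and later, mutate(across(all_of(core.vars), recode_median)) should be the equivalent spelling.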
Previous questions on the forum cover how to do this for individual variables; however, I have 100 variables and 300 samples, so entering them line by line is not an option. I have looked at the following solutions, but they all address slightly different problems.
Using dplyr to group_by and conditionally mutate only with if (without else) statement
Using dplyr summarise with conditions
dplyr conditional summarise function
Mean of column based on multiple conditions in R
R: Recoding variables using recode, mutate and case_when
Ideally, I would like to avoid creating a separate data frame and joining it back, or creating multiple separate variables as mutate would do. I am sure there is a for loop and/or ifelse method for this, but I was trying to use the tidyverse. Any suggestions would be helpful. Thanks in advance.
mutate_at(vars(core.vars)
– akrun
sign: temp.df %>% mutate_at(vars(core.vars), ~ sign(. - median(.)))
– akrun
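Note that sign(. - median(.)) yields -1, 0 and 1 for values below, at and above the median, not the 1, 0, 2 coding described in the question. A minimal base R sketch of one way to map between the two, where x, s and codes are hypothetical names used only for illustration:

```r
x <- c(1, 3, 5, 7, 5)
s <- sign(x - median(x))      # -1 below, 0 at, 1 above the median
codes <- c(1, 0, 2)[s + 2]    # s + 2 is 1, 2 or 3, which indexes the codes 1, 0, 2
codes
```

The same indexing expression could be placed inside the lambda passed to mutate_at if the 1/0/2 coding is required.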