I would like to create multiple new columns in a data frame based on a condition. From reading other questions, I think this requires case_when() inside mutate(). Although I'm familiar with creating new columns using mutate(), I can't get it to apply different functions depending on a condition.

require(tidyverse)

df1 <- tibble(a = c(-0.5, 0, 0.1, 0.5, 0.8),
              b = c(-0.2, NA, 0.3, 0.1, 0.2),
              c = c(0, 0.2, 0.1, 0.3, 0.1),
              d = c(NA, -0.1, 0.7, 0.6, 0.4),
              e = c(0.2, 0.6, NA, 0.4, 0.5), 
              f = c(0.7, 0.2, NA, 0.5, 0.5))

My actual data frame contains 60 variables, but using df1 as an example I would like to:

i) Identify which columns contain value(s) ≤0

ii) For each column that contains any value ≤0, create a new column of log(x + 1)

iii) For each column that contains only values >0, create a new column of log(x)

The NA values should be retained as NA in the new columns.

A tidyverse solution would be fantastic because I find the syntax easier to understand, but appreciate any solution.

1 Answer

I think the following gives you what you need:

# load libraries
library(tidyverse)

# define data
df1 <- tibble(a = c(-0.5, 0, 0.1, 0.5, 0.8),
              b = c(-0.2, NA, 0.3, 0.1, 0.2),
              c = c(0, 0.2, 0.1, 0.3, 0.1),
              d = c(NA, -0.1, 0.7, 0.6, 0.4),
              e = c(0.2, 0.6, NA, 0.4, 0.5), 
              f = c(0.7, 0.2, NA, 0.5, 0.5))

# use mutate_all() to apply a function that first tests whether all non-NA
# values in a column are above 0. Depending on the answer, apply log(x) or
# log(x + 1) to that column; NA values stay NA in the result
df1 %>% 
  mutate_all(.funs = list(result = function(x) {
    if (all(x > 0, na.rm = TRUE)) log(x) else log(x + 1)
    }))
#> # A tibble: 5 x 12
#>       a     b     c     d     e     f a_result b_result c_result d_result
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#> 1  -0.5  -0.2   0    NA     0.2   0.7  -0.693   -0.223    0        NA    
#> 2   0    NA     0.2  -0.1   0.6   0.2   0       NA        0.182    -0.105
#> 3   0.1   0.3   0.1   0.7  NA    NA     0.0953   0.262    0.0953    0.531
#> 4   0.5   0.1   0.3   0.6   0.4   0.5   0.405    0.0953   0.262     0.470
#> 5   0.8   0.2   0.1   0.4   0.5   0.5   0.588    0.182    0.0953    0.336
#> # … with 2 more variables: e_result <dbl>, f_result <dbl>
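
Since dplyr 1.0.0 the scoped verbs such as mutate_all() are superseded by across(), so here is a rough sketch of the same idea in that style. It assumes dplyr >= 1.0.0 and reuses df1 from above; the names has_nonpos, log_shifted and res are just ones I made up for illustration. It also covers step (i) by listing the columns that contain at least one value ≤0.

```r
library(tidyverse)

# same example data as above
df1 <- tibble(a = c(-0.5, 0, 0.1, 0.5, 0.8),
              b = c(-0.2, NA, 0.3, 0.1, 0.2),
              c = c(0, 0.2, 0.1, 0.3, 0.1),
              d = c(NA, -0.1, 0.7, 0.6, 0.4),
              e = c(0.2, 0.6, NA, 0.4, 0.5),
              f = c(0.7, 0.2, NA, 0.5, 0.5))

# step (i): flag columns with at least one value <= 0, ignoring NA
# (has_nonpos is a hypothetical helper name)
has_nonpos <- vapply(df1, function(x) any(x <= 0, na.rm = TRUE), logical(1))
names(df1)[has_nonpos]
#> [1] "a" "b" "c" "d"

# steps (ii) and (iii): choose the transformation once per column
# (log_shifted is a hypothetical helper name)
log_shifted <- function(x) {
  if (any(x <= 0, na.rm = TRUE)) log(x + 1) else log(x)
}

# across() applies the helper to every column; .names adds the _result suffix
res <- df1 %>%
  mutate(across(everything(), log_shifted, .names = "{.col}_result"))
```

Note that log(x + 1) still yields NaN or -Inf for values ≤ -1, which does not happen in df1 but might in the full data set.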