2
votes

I have an odd dataset of ~ 50 rows x 200 columns and I would like to re-code the values (strings) in certain columns using another dataframe encoding the conversion from old names to new names. I would like to do this in base R or dplyr, ideally using dplyr/ the tidyverse.

There are about 70 unique values to switch so coding each change individually would be too long (e.g. a simple str_replace), and there are about 70 columns containing the values I want changed so impractical to code anything specifying one named column at a time.

The columns containing values I want changed occur every third column excluding the first indexing column, and those values do not occur anywhere else in the dataframe. So a mutate_all type solution looking up any string across the dataframe that matches the old names in the conversion dataframe and replaces them with their new names would work. Note there is some data missingness.

Example data:

library("tidyverse")

# main dataset
col1 <- c("set1", "set2", "set3", "set4", "set5", "set6")
name_1 <- c("A", "D", "B", "A", "C", "A")
colour_1 <- c("red", "cyan", "red", "blue", "red", "blue")
shade_1 <- c("dark", "dark", "light", "light", "light", "light")
name_2 <- c("", "C", "D", "D", "", "A")
colour_2 <- c("", "red", "yellow", "blue", "", "purple")
shade_2 <- c("", "dark", "light", "dark", "", "light")
df_main <- data.frame(col1,name_1,colour_1,shade_1,name_2,colour_2,shade_2)

# The name_1 and name_2 columns are the ones I want changed, based on a conversion dataframe

# conversion dataframe
names_old <- c("A", "B", "C", "D", "E")
names_new <- c("1", "2", "3", "4", "5")
df_conversion <- data.frame(names_old,names_new)

# I want to base the switching from old names to new names in the main database using df_conversion

# Desired output
name_1_alt <- c("1", "4", "2", "1", "3", "1")
name_2_alt <- c("", "3", "4", "4", "", "1")
df_main_alt <- data.frame(col1,name_1_alt,colour_1,shade_1,name_2_alt,colour_2,shade_2)

Any help much appreciated, thanks.

1

1 Answers

5
votes

We can use a named vector to match and change the values

library(dplyr) # 1.0.0
library(tibble)
df_main_alt1 <- df_main %>%
        mutate(across(starts_with('name'), ~ deframe(df_conversion)[.]))
df_main_alt1
#   col1 name_1 colour_1 shade_1 name_2 colour_2 shade_2
#1 set1      1      red    dark   <NA>                 
#2 set2      4     cyan    dark      3      red    dark
#3 set3      2      red   light      4   yellow   light
#4 set4      1     blue   light      4     blue    dark
#5 set5      3      red   light   <NA>                 
#6 set6      1     blue   light      1   purple   light

If we have a version of dplyr < 1.0.0, use mutate_at

df_main_alt1 <- df_main %>%
            mutate_at(vars(starts_with('name')), ~ 
                  deframe(df_conversion)[.])

Or may be use recode

df_main %>%
         mutate_at(vars(starts_with('name')), ~ 
             dplyr::recode(., !!! deframe(df_conversion)))
# col1 name_1 colour_1 shade_1 name_2 colour_2 shade_2
#1 set1      1      red    dark                        
#2 set2      4     cyan    dark      3      red    dark
#3 set3      2      red   light      4   yellow   light
#4 set4      1     blue   light      4     blue    dark
#5 set5      3      red   light                        
#6 set6      1     blue   light      1   purple   light