2
votes

I'm trying to clean a dataset's column names. I've used janitor::clean_names() to start. However, I still have abbreviations that I would like to separate from the rest of the name with an underscore (_). I have code that works using rename_with(~str_replace(.x, "gh", "gh_"), .cols = starts_with("gh")), but there are many abbreviations, so it would be good to find a way to map over them or otherwise functionalize this process.

dat <- tibble(ghrisk_value = c(1,2), 
              ghrisk_corrected = c(2,3), 
              devpolicy_value = c(4,5),
              devpolicy_corrected = c(5,6))

# code works but not functionalized
dat %>%
   rename_with(~str_replace(.x, "gh", "gh_"), .cols = starts_with("gh")) %>%
   rename_with(~str_replace(.x, "dev", "dev_"), .cols = starts_with("dev")) %>%
   names()

# attempt at map...
abbr_words <- c("gh", "dev")
map(dat, ~rename_with(str_replace(.x, abbr_words, str_c(abbr_words, "_"))) 

3 Answers

3
votes

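You don't need map(). Use a lookbehind in the regular expression, "(?<=a|b|c)", which matches the zero-width position immediately after a, b, or c; replacing that position with "_" inserts an underscore there. Because str_replace() replaces only the first match, only the first such position in each name is changed. In addition, starts_with() accepts a character vector and matches the union of its elements.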

abbr_words <- c("gh", "dev")

pattern <- sprintf("(?<=%s)", str_c(abbr_words, collapse = "|"))
# [1] "(?<=gh|dev)"

dat %>%
  rename_with(~ str_replace(.x, pattern, "_"), starts_with(abbr_words))

# # A tibble: 2 x 4
#   gh_risk_value gh_risk_corrected dev_policy_value dev_policy_corrected
#           <dbl>             <dbl>            <dbl>                <dbl>
# 1             1                 2                4                    5
# 2             2                 3                5                    6
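
As a variation on the same idea, here is a minimal sketch that anchors the pattern to the start of the name and uses a capture group instead of a lookbehind. The names anchored and renamed are only illustrative, and the sketch assumes the abbreviations contain no regex metacharacters (they would need escaping otherwise).

```r
library(dplyr)
library(stringr)

dat <- tibble(ghrisk_value = c(1, 2),
              ghrisk_corrected = c(2, 3),
              devpolicy_value = c(4, 5),
              devpolicy_corrected = c(5, 6))

abbr_words <- c("gh", "dev")

# Build "^(gh|dev)": the abbreviation must sit at the start of the name
anchored <- str_c("^(", str_c(abbr_words, collapse = "|"), ")")

# "\\1" puts the matched abbreviation back, followed by an underscore.
# Names that do not start with an abbreviation are left unchanged,
# so no column selection is needed here.
renamed <- dat %>%
  rename_with(~ str_replace(.x, anchored, "\\1_"))

names(renamed)
```

With the anchor in place, an abbreviation that happens to appear later in a name should not be touched.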
2
votes

You can reduce() over the abbreviations, applying str_replace() to the vector of names once per abbreviation:

abbr_words <- c("gh", "dev")

dat %>% 
  rename_with(~ reduce(
    abbr_words,
    # acc: the names so far; word: the current abbreviation (anchored at the start)
    function(acc, word) str_replace(acc, paste0("^", word), paste0(word, "_")),
    .init = .x
  ))

# # A tibble: 2 x 4
#   gh_risk_value gh_risk_corrected dev_policy_value dev_policy_corrected
#           <dbl>             <dbl>            <dbl>                <dbl>
# 1             1                 2                4                    5
# 2             2                 3                5                    6
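
To make the reduce step less opaque, the following sketch runs the same chain by hand on a plain character vector and compares it with the reduce() result. The names nms, step1, step2 and reduced are made up for illustration.

```r
library(purrr)
library(stringr)

abbr_words <- c("gh", "dev")
nms <- c("ghrisk_value", "devpolicy_value")

# By hand: apply one abbreviation after the other to the whole vector
step1 <- str_replace(nms, "^gh", "gh_")
step2 <- str_replace(step1, "^dev", "dev_")

# reduce() threads the accumulated names (acc) through each abbreviation (word),
# starting from .init
reduced <- reduce(
  abbr_words,
  function(acc, word) str_replace(acc, str_c("^", word), str_c(word, "_")),
  .init = nms
)

identical(reduced, step2)
```

Each pass rewrites the full vector of names, so the number of passes equals the number of abbreviations, not the number of columns.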
2
votes

Using map() you will need a helper function, here called repl_func. map() iterates over colnames(dat) and handles one column name at a time. The data goes first in the map() call, then the function, and any remaining arguments (here abbr_words) are passed on to the function. repl_func takes a single column name and the vector of abbreviations, loops over the abbreviations, and performs the replacements. At the end, unlist() is required to flatten the resulting list into a character vector.

abbr_words <- c("gh", "dev")

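# Helper: takes one column name (x) and the vector of abbreviations (y)
repl_func <- function(x, y) {
  for (i in y) {
    # anchor with "^" so only a leading abbreviation gets the underscore
    x <- str_replace(x, paste0("^", i), paste0(i, "_"))
  }
  return(x)
}

# map() returns a list; unlist() flattens it to a character vector
colnames(dat) <- unlist(map(colnames(dat), repl_func, abbr_words))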
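
If you would rather skip the unlist() step, purrr's map_chr() returns a character vector directly. A minimal sketch of that variant follows; add_sep, old_names and new_names are hypothetical names used only for this example.

```r
library(purrr)
library(stringr)

abbr_words <- c("gh", "dev")
old_names <- c("ghrisk_value", "ghrisk_corrected",
               "devpolicy_value", "devpolicy_corrected")

# Hypothetical helper: add "_" after a leading abbreviation in one name
add_sep <- function(name, abbrs) {
  for (a in abbrs) {
    name <- str_replace(name, str_c("^", a), str_c(a, "_"))
  }
  name
}

# map_chr() calls the helper once per name and returns a character vector
new_names <- map_chr(old_names, function(nm) add_sep(nm, abbr_words))

new_names
```

The result can then be assigned with colnames(dat) <- new_names, or the helper can be used inside rename_with().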