4
votes

I have a list of tibbles or data frames (each one with a name in the list) and I want to: (1) create new columns with the same name long.col in each tibble from variables with different names; (2) do this by matching from another tibble with the key linking tibble name and column name in each tibble and finally; (3) binding rows for all the newly created columns with the same name long.col in each tibble as well as identifying the original tibble where they came from.

I would like to do this preferably using tidyverse functions. Here is a an example of both: a) the list of tibbles; b) the key tibble identifying tibble names and variables to select in each one

df1 <- tibble(v1 = c(rep("A", 5), rep("B", 5)),
          v2 = 1:10)
df2 <- tibble(v1 = c(rep("C", 6), rep("D", 6)),
          v3 = 11:22)
df3 <- tibble(v1 = c(rep("E", 4), rep("F", 4)),
          v4 = 23:30)

list.df <- list(df1, df2, df3)
names(list.df) <- c("data1", "data2", "data3")

key <- tibble(data = c("data1", "data2", "data3"),
          vars = c("v2", "v3", "v4"))

The final output should look like:

 final.df <- tibble(data = c(rep("data1", 10), rep("data2", 12), rep("data3", 8)),
               long.col = 1:30)

I need to do this in a much longer list using multiple columns, so it is not feasible to do it separately for each column in each tibble.

2

2 Answers

3
votes

You can use map2 here

library(purrr)
library(tibble)
out <- map2_df(.x = list.df,
               .y = names(list.df),
               .f = ~ {
                 temp <- key[["vars"]][key[['data']] == .y]
                 tibble(data = .y, long.col = .x[[temp]])
                 })

Check the output

identical(final.df, out)
#[1] TRUE
0
votes

The first step in the problem you have stated is to dynamically pick the variable name from the key table, and assign the value in the respective dataframe. This can be done by first defining a function which returns the base variable name (for long.col) based on the data frame.

getBaseVar <- function(dfName, keyTibble){
  varToBeTransformed <- keyTibble %>% dplyr::filter(data == (!!dfName)) %>% 
                       dplyr::select(vars) %>% dplyr::first() %>% 
                         rlang::sym(.)
  return(varToBeTransformed)

}

This function returns the required variable name as a symbol. This can then be passed to the dplyr::mutate function. In order to do this dynamically for all dataframes present in the list, we pass the named list of dataframes to the purrr::imap function, through which both the list element (dataframe in your case) and the name of the list element can be accessed.

list.df.transformed <- purrr::imap(list.df, function(df, name){
   df %>% dplyr::mutate( long.col := !!getBaseVar(name, key))
})

Finally, the output you desire can be created by passing this list of transformed dataframes through purrr::imap once again, and extracting the necessary data (i.e. the long.col column and the variable with a repeating value of the name of the dataframe). After extraction, passing it to the dplyr::bind_rows function returns the desired data frame.

final.df <- tibble(data = character(), long.col = numeric()) 
purrr::imap(list.df.transformed, function(df, name){
    repeatedNameCol <- tibble(data = rep(name, nrow(df)))
    dataToBind <- df %>% dplyr::select(long.col) %>% 
                     dplyr::bind_cols(repeatedNameCol)
    return(dataToBind)
}) %>% dplyr::bind_rows(.) -> final.df

Hope this helps!