0
votes

I've got one table that is a set of all of my columns, their possible corresponding values, and the description for each one of those values. For example, the table looks like this:

ID Column Value Description
1  Age    A     Age_20-30
2  Age    B     Age_30-50
3  Age    C     Age_50-75
4  Geo    A     Big_City
5  Geo    B     Suburbs
6  Geo    C     Rural_Town

And so on.. Next, I have my main data frame that is populated with the column values. What I'd like to do is switch all values in each column with their corresponding description.

Old:

ID Age Geo
1  A   B
2  A   A
3  C   A
4  B   C
5  C   C

New:

ID Age       Geo
1  Age_20-30 Suburbs
2  Age_20-30 Big_City
3  Age_50-75 Big_City
4  Age_30-50 Rural_Town
5  Age_50-75 Rural_Town

Now I know how I can do this for one column using the following (where lookup_df is a table for only one of my columns):

old <- lookup_df$Value
new <- lookup_df$Description
df$Age <- new[match(df$Age, old, nomatch = 0)]

But I am struggling to do this for all columns. My full set of data has >100 columns so doing this manually for each column isn't really an option (at least in terms of efficiency). Any help or pointers in the right direction would be a huge help.

2

2 Answers

0
votes

We can split the first dataset into to a list of named vectors. Use that to match and replace the second dataset

lst1 <- lapply(split(df1[c('Value', 'Description')], df1$Column), 
          function(x) setNames(x$Description, x$Value))
df2[-1] <- Map(function(x, y) y[x], df2[-1], lst1)

-output

df2
# ID       Age        Geo
#1  1 Age_20-30    Suburbs
#2  2 Age_20-30   Big_City
#3  3 Age_50-75   Big_City
#4  4 Age_30-50 Rural_Town
#5  5 Age_50-75 Rural_Town

data

df1 <- structure(list(ID = 1:6, Column = c("Age", "Age", "Age", "Geo", 
"Geo", "Geo"), Value = c("A", "B", "C", "A", "B", "C"), 
Description = c("Age_20-30", 
"Age_30-50", "Age_50-75", "Big_City", "Suburbs", "Rural_Town"
)), class = "data.frame", row.names = c(NA, -6L))

df2 <- structure(list(ID = 1:5, Age = c("A", "A", "C", "B", "C"), Geo = c("B", 
"A", "A", "C", "C")), class = "data.frame", row.names = c(NA, 
-5L))
0
votes

To do this on data with lot of columns you can get the data in long format, join it with the first dataframe and (if needed) get it back in wide format.

library(dplyr)
library(tidyr)

df2 %>%
  pivot_longer(cols = -ID) %>%
  left_join(df1 %>% select(-ID), 
             by = c('name' = 'Column', 'value' = 'Value')) %>%
  select(-value) %>%
  pivot_wider(names_from = name, values_from = Description)

#     ID Age       Geo       
#  <int> <chr>     <chr>     
#1     1 Age_20-30 Suburbs   
#2     2 Age_20-30 Big_City  
#3     3 Age_50-75 Big_City  
#4     4 Age_30-50 Rural_Town
#5     5 Age_50-75 Rural_Town