0
votes

The problem I have is this. I have a data frame containing four columns, named CSES. I have a separate data frame named meta, containing two columns, code and name. If a value in any of the CSES columns match meta$code I want to replace it with the value in meta$name.

Currently my solution uses for loops and is fine; but is there a better way to do this that avoids the loops, preferably using the Tidyverse? Seems like a pretty standard question that should have a answer already but I didn't find any. I am thankful for any help. Cheers.

for(i in seq_along(meta$code)) {
  code <- meta$code[[i]]
  for(k in seq_along(CSES)) {
    col <- CSES[[k]]
    col[col == code] <- meta$name[[i]]
    CSES[[k]] <- col
  }
}

EDIT: I added input data and desired output below and clarification of desired output per request.

To clarify, it is desired that if the meta$code column matches a value in CSES the value in CSES should be changed to the meta$name value on the same row as the matching meta$code.

CSES, this is the data I want to recode:

# A tibble: 274,719 x 6
#   IMD5000_A IMD5000_B IMD5000_C IMD5000_D IMD5000_E IMD5000_F
#   <chr>     <chr>     <chr>     <chr>     <chr>     <chr>    
# 1 320001    320002    320003    320004    320005    320006   
# 2 320001    320002    320003    320004    320005    320006   
# 3 320001    320002    320003    320004    320005    320006   
# 4 320001    320002    320003    320004    320005    320006   
# 5 320001    320002    320003    320004    320005    320006   
# 6 320001    320002    320003    320004    320005    320006   
# 7 320001    320002    320003    320004    320005    320006   
# 8 320001    320002    320003    320004    320005    320006   
# 9 320001    320002    320003    320004    320005    320006   
# 10 320001    320002    320003    320004    320005    320006   
# … with 274,709 more rows

Meta:

# A tibble: 44 x 2
#   code    name 
#   <chr>   <chr>
# 1 7520001 SAP  
# 2 7520002 M    
# 3 7520003 FP   
# 4 7520004 KD   
# 5 7520005 MP   
# 6 7520006 C    
# 7 7520007 V    
# 8 7520008 SD   
# 9 7520009 Fi   
#10 7520010 Jl   
# … with 34 more rows

For example the value 7520001 in CSES should be changed to "SAP".

2
Please share your input data as something other than an image...Peter H.
@PeterH. OK. Seemed like the simplest way to do it, but I will keep it in mind for next time. I received great answers below so I will leave it as it is in this question :)Filip Sjöstrand
@PeterH. Ah, I realized I could simply paste the data and it would format it correctly (I did not know that) so I updated the question as requested.Filip Sjöstrand
Actually, it would be even better if you used the output of dput such that people can directly import the test data into their own R session :). When @IceCreamToucan came up with his answer he had to generate his own test data, it would've been better if you had supplied him some up front. That way you're more likely to get help and get it faster.Peter H.
@PeterH. Oh, I see, another new thing for me, seems very useful. For anyone else that may read this here are some instructions on using dput() stackoverflow.com/questions/49994249/example-of-using-dputFilip Sjöstrand

2 Answers

0
votes

Example data:

set.seed(10)
CSES <- as.data.frame(matrix(sample(letters[1:10], 40, T), 10),
                      stringsAsFactors = F)
meta <- data.frame(code = letters[1:3], name = c('peach', 'pear', 'apple'),
                   stringsAsFactors = F)

CSES
#    V1 V2 V3 V4
# 1   f  g  i  f
# 2   d  f  g  a
# 3   e  b  h  b
# 4   g  f  d  i
# 5   a  d  e  e
# 6   c  e  h  h
# 7   c  a  i  i
# 8   c  c  c  j
# 9   g  d  h  g
# 10  e  i  d  f
meta
#   code  name
# 1    a peach
# 2    b  pear
# 3    c apple

Now update each column of CSES with the name in meta corresponding to the given code

base R

repl <- matrix(meta$name[match(as.matrix(CSES), meta$code)], nrow(CSES))

CSES[!is.na(repl)] <- repl[!is.na(repl)]

CSES
#       V1    V2    V3    V4
# 1      f     g     i     f
# 2      d     f     g peach
# 3      e  pear     h  pear
# 4      g     f     d     i
# 5  peach     d     e     e
# 6  apple     e     h     h
# 7  apple peach     i     i
# 8  apple apple apple     j
# 9      g     d     h     g
# 10     e     i     d     f

dplyr

library(dplyr)

CSES %>% 
  mutate_all(~ coalesce(meta$name[match(.x, meta$code)], .x))
#       V1    V2    V3    V4
# 1      f     g     i     f
# 2      d     f     g peach
# 3      e  pear     h  pear
# 4      g     f     d     i
# 5  peach     d     e     e
# 6  apple     e     h     h
# 7  apple peach     i     i
# 8  apple apple apple     j
# 9      g     d     h     g
# 10     e     i     d     f

data.table

library(data.table)
setDT(CSES)
setDT(meta)

for(n in names(CSES))
  CSES[meta, on = setNames('code', n), (n) := i.name]

CSES[]
#       V1    V2    V3    V4
#  1:     f     g     i     f
#  2:     d     f     g peach
#  3:     e  pear     h  pear
#  4:     g     f     d     i
#  5: peach     d     e     e
#  6: apple     e     h     h
#  7: apple peach     i     i
#  8: apple apple apple     j
#  9:     g     d     h     g
# 10:     e     i     d     f
0
votes

Or in base R you could do this with sapply or, perhaps more efficiently, by resetting levels if your data frame is converted to factors.

meta = data.frame(code=c('a','c'), name=c('A', 'C'), stringsAsFactors = F)
CSES = data.frame(code = c('a', 'b', 'c', 'd'), c2 = c(1,2,3,4), stringsAsFactors = F)
print("---before----")
print(CSES)
f <- CSES$code %in% meta$code
CSES$code[f] <- sapply(CSES$code[f], function(x) meta$name[which(x==meta$code)])
print("---after using factors and levels----")
print(CSES)

With output [1] "---b

efore----"
  code c2
1    a  1
2    b  2
3    c  3
4    d  4
[1] "---after using factors and levels----"
  code c2
1    A  1
2    b  2
3    C  3
4    d  4

Using Factors looks like this

meta2 = data.frame(code=c('a','c'), name=c('A', 'C'))
CSES2 = data.frame(code = c('a', 'b', 'c', 'd'), c2 = c(1,2,3,4))
frepl <- function(s){
  if (s %in% meta2$code){levels(meta2$name)[meta2$code == s]} else {s}
}
levels(CSES2$code) <- sapply(levels(CSES2$code), function(s) frepl(s))
print("---after using factors and levels----")
print(CSES2)

with output

[1] "---after using factors and levels----"
  code c2
1    A  1
2    b  2
3    C  3
4    d  4