replace column values with vector in R stringr

Question

I'm trying to mutate a column and replace its values using vectors of patterns and replacements in stringr. I'm having some issues, which I guess are related to how the function recycles its arguments. I'm new to R and can't figure out exactly what I'm doing wrong.

The column I'd like to change:

[1] "3+4" "3+3"  NA    "3+4"  NA   "4+3" "4+4" "4+3" "4+4" "5+4" "4+3" "4+3" "3+4" "4+3"
[15] "4"   NA    "4+3" NA    NA    "3+4" "4+5" NA    "3+4" NA    NA    "3+4" NA    "3+4"
[29] "3+4" "3+4" "3+3" "3"   NA    "3+3" "3+3" NA    "4+5" NA    "3+3" "3+4" "4+4" "3+4"
[43] "4+4" "3+3" "3+4" "3+4" NA    "4+3" "4+3" "3+3" "3+3" "3+4"

I'd like to change this to 3+3 = 1, 3+4 = 2, 4+3 = 3, 4+4 = 4, 4+5 = 5, 5+5 = 5. These are Gleason scores and Gleason grade groups for prostate cancer.

Running one replacement at a time works just fine:

mrgb_trus <- mrgb_trus %>%
  mutate(MRGGG = str_replace_all(MRGB_gleason, "3\\+4", "2"))

Adding vectors:

mrgb_trus <- mrgb_trus %>%
  mutate(MRGGG = str_replace_all(MRGB_gleason,
                                 c("3\\+3", "3\\+4", "4\\+3", "4\\+4", "4\\+5", "5\\+4", "5\\+5"),
                                 c("1", "2", "3", "4", "5", "5", "5")))

produces the warning

Warning message:
In stri_replace_first_regex(string, pattern,   fix_replacement(replacement),  :
longer object length is not a multiple of shorter object length

and does not return the desired output. What am I doing wrong? As you can see, there are also some NAs and two values, "3" and "4", that don't match any pattern. I'd also like to change the NAs to 0, and "3" and "4" to 1.

Do 4+5 and 5+5 both get the level 5 or did you typo there? — LAP
They're both 5. Context: pathology.jhu.edu/ProstateCancer/NewGradingSystem.pdf — stapperen

1.618 · Accepted Answer · 2018-07-02T11:42:50

One approach could be:

# define your mapping here
lhs <- c('3+3', '3+4', '4+3', '4+4', '4+5', '5+5', '3', '4')
rhs <- c(1, 2, 3, 4, 5, 5, 1, 1)

df$col1_new <- ifelse(is.na(df$col1), 0, rhs[match(df$col1, lhs)])

which gives

> df$col1_new
 [1]  2  1  0  2  0  3  4  3  4 NA  3  3  2  3  1  0  3  0  0  2  5  0  2  0  0  2  0  2  2  2  1  1  0  1  1  0  5
[38]  0  1  2  4  2  4  1  2  2  0  3  3  1  1  2
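
To see why this lookup works, here is a minimal sketch on a shortened mapping. The vectors `x` and `idx` are made up for illustration and are not part of the original data. `match()` returns, for each element of `x`, its position in `lhs` (or NA if it is not found), and that position is then used to index `rhs`.

```r
# Shortened, illustrative mapping (not the full one from above)
lhs <- c("3+3", "3+4", "4+3")
rhs <- c(1, 2, 3)

# Made-up input: one missing value and one value absent from lhs
x <- c("3+4", NA, "4+3", "5+4")

# Position of each element of x in lhs; NA where there is no match
idx <- match(x, lhs)

# Index rhs by position; unmatched elements stay NA
looked_up <- rhs[idx]

# Turn genuinely missing inputs into 0, keep NA for unmapped values
result <- ifelse(is.na(x), 0, looked_up)
print(result)
```

Only exact matches are replaced, so a value without an entry in `lhs` stays NA rather than being partially rewritten.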

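Note that the mapping above has no entry for 5+4, which occurs in your sample data, so that element comes back as NA (the 10th value in the output). Add it to lhs and rhs to cover it; your own code maps it to 5.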
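
A minimal sketch of the same lookup with 5+4 included, assuming it should map to 5 as in the question's own `str_replace_all()` call. The vector `gleason` is a made-up subset used only for illustration.

```r
# Full mapping, with "5+4" added (assumed to be grade group 5, as in the question's code)
lhs <- c("3+3", "3+4", "4+3", "4+4", "4+5", "5+4", "5+5", "3", "4")
rhs <- c(1, 2, 3, 4, 5, 5, 5, 1, 1)

# Made-up subset of Gleason scores for illustration
gleason <- c("3+4", "5+4", NA, "4", "4+5")

# NA becomes 0; everything else is looked up by exact match
grade_group <- ifelse(is.na(gleason), 0, rhs[match(gleason, lhs)])
print(grade_group)

# With the question's column names, the same expression could go inside mutate():
# mrgb_trus <- mrgb_trus %>%
#   mutate(MRGGG = ifelse(is.na(MRGB_gleason), 0, rhs[match(MRGB_gleason, lhs)]))
```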

Sample data:

df <- structure(list(col1 = c("3+4", "3+3", NA, "3+4", NA, "4+3", "4+4", 
"4+3", "4+4", "5+4", "4+3", "4+3", "3+4", "4+3", "4", NA, "4+3", 
NA, NA, "3+4", "4+5", NA, "3+4", NA, NA, "3+4", NA, "3+4", "3+4", 
"3+4", "3+3", "3", NA, "3+3", "3+3", NA, "4+5", NA, "3+3", "3+4", 
"4+4", "3+4", "4+4", "3+3", "3+4", "3+4", NA, "4+3", "4+3", "3+3", 
"3+3", "3+4")), .Names = "col1", row.names = c(NA, -52L), class = "data.frame")
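
As for the warning in the question: `str_replace_all()` is vectorised over string, pattern and replacement, so with unnamed vectors the i-th pattern is applied only to the i-th string instead of every pattern being tried on every string. The sketch below illustrates this on a made-up two-element vector `x`, and also shows the named-vector form that stringr documents for multiple replacements per string. The `^` and `$` anchors are my addition so that only whole values are rewritten.

```r
library(stringr)

# Made-up input for illustration
x <- c("3+4", "3+3")

# Unnamed vectors are paired element by element:
# "3+4" is only tested against "3\\+3", and "3+3" only against "3\\+4"
pairwise <- str_replace_all(x, c("3\\+3", "3\\+4"), c("1", "2"))
print(pairwise)

# A named vector (pattern = replacement) applies every pair to every string
all_pairs <- str_replace_all(x, c("^3\\+3$" = "1", "^3\\+4$" = "2"))
print(all_pairs)
```

In the pairwise call nothing is replaced, because neither string happens to be paired with its own pattern. With 52 strings and 7 patterns the lengths do not divide evenly, which is what the recycling warning was reporting.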
