0
votes

Given a large data frame with a column that has unique values

(ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT)

I want to replace some of the values. For example, every occurrence of 'ONE' should be replaced by '1' and

'FOUR' -> '2SQUARED'
'FIVE' -> '5'
'EIGHT' -> '2CUBED'

Other values should remain as they are.

An IF/ELSE approach would run forever on data this size. How can I apply a vectorized solution? Is match() the correct way to go?

3
Check mapvalues in the plyr package. - David Arenburg
I think this is the best solution so far, since I said it is a large data frame. I have many substitutions to do, so creating two vectors, replace = c(...) and with = c(...), to use in mapvalues() is easiest. - Max Wen

3 Answers

0
votes

Using @rnso's data set:

library(plyr)
transform(data, vals = mapvalues(vals, 
          c('ONE', 'FOUR', 'FIVE', 'EIGHT'),
          c('1','2SQUARED', '5', '2CUBED'))) 
#       vals
# 1        1
# 2      TWO
# 3    THREE
# 4 2SQUARED
# 5        5
# 6      SIX
# 7    SEVEN
# 8   2CUBED
0
votes

Try the following using base R:

data = structure(list(vals = structure(c(4L, 8L, 7L, 3L, 2L, 6L, 5L, 
1L), .Label = c("EIGHT", "FIVE", "FOUR", "ONE", "SEVEN", "SIX", 
"THREE", "TWO"), class = "factor")), .Names = "vals", class = "data.frame", row.names = c(NA, 
-8L))

initial = c('ONE', 'FOUR', 'FIVE', 'EIGHT')
final = c('1','2SQUARED', '5', '2CUBED')

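myfn = function(ddf, init, fin){
    # lookup table pairing each old value with its replacement
    refdf = data.frame(init,fin)
    # match() gives the row of refdf for each value, or NA when there is no replacement
    ddf$new = refdf[match(ddf$vals, init), 'fin']
    ddf$new = as.character(ddf$new)
    # rows without a replacement keep their original value
    ndx = which(is.na(ddf$new))
    ddf$new[ndx]= as.character(ddf$vals[ndx])
    ddf
}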

myfn(data, initial, final)

#    vals      new
# 1   ONE        1
# 2   TWO      TWO
# 3 THREE    THREE
# 4  FOUR 2SQUARED
# 5  FIVE        5
# 6   SIX      SIX
# 7 SEVEN    SEVEN
# 8 EIGHT   2CUBED
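For comparison, here is a shorter sketch of the same match() idea that replaces the values in place instead of adding a new column. It assumes the column is stored as character rather than as a factor (an assumption; the data above uses a factor), and the names idx and hit are only illustrative.

```r
# Example data as a plain character column (assumed for this sketch)
data <- data.frame(
  vals = c("ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT"),
  stringsAsFactors = FALSE
)

initial <- c("ONE", "FOUR", "FIVE", "EIGHT")
final   <- c("1", "2SQUARED", "5", "2CUBED")

# match() returns the position in `initial` for each value, or NA when absent
idx <- match(data$vals, initial)

# Replace only the rows that matched; all other rows keep their original value
hit <- !is.na(idx)
data$vals[hit] <- final[idx[hit]]

data$vals
```

Because match() and the indexed assignment both work on whole vectors, no row-by-row IF/ELSE is needed.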
0
votes

Your column is probably a factor. Give this a try. Using rnso's data, I'd recommend you first create two vectors: the values to change from and the values to change to.

from <- c("FOUR", "FIVE", "EIGHT")
to <- c("2SQUARED", "5", "2CUBED")

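Then replace the factor levels with

# assign to data$vals directly; inside with() the change would be made to a temporary copy and lost
levels(data$vals)[match(from, levels(data$vals))] <- to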

This gives

data
#       vals
# 1      ONE
# 2      TWO
# 3    THREE
# 4 2SQUARED
# 5        5
# 6      SIX
# 7    SEVEN
# 8   2CUBED
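One thing to watch with the level replacement above: if a value in from is not a level of the factor, match() returns NA for it, and that NA is not a usable index for the assignment. A minimal sketch of a guard follows, assuming a made-up extra value "NINE" that is not in the data; pos and keep are illustrative names.

```r
# Same eight values as in the question, stored as a factor
data <- data.frame(
  vals = factor(c("ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT"))
)

from <- c("FOUR", "FIVE", "EIGHT", "NINE")   # "NINE" is deliberately absent (hypothetical)
to   <- c("2SQUARED", "5", "2CUBED", "9")

# Position of each `from` value among the levels; NA for values that are not levels
pos  <- match(from, levels(data$vals))
keep <- !is.na(pos)

# Rename only the levels that actually exist
levels(data$vals)[pos[keep]] <- to[keep]

as.character(data$vals)
```

Since only the levels are renamed, the cost depends on the number of distinct values, not on the number of rows.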