68
votes

Of course I could replace specific characters one at a time like this:

    mydata=c("á","é","ó")
    mydata=gsub("á","a",mydata)
    mydata=gsub("é","e",mydata)
    mydata=gsub("ó","o",mydata)
    mydata

But surely there is an easier way to do this all in one line, right? I don't find the gsub help very comprehensive on this.

11
If you wanted to replace different patterns with the same thing, it should be possible with lapply, but as you want to replace different patterns with different strings, I think you will still have to specify these one way or another... - juba
You might be able to use chartr to do this. - Andrie
The gsubfn function in the gsubfn package is a generalization of gsub that can do that in one call: gsubfn(".", list("á"="a", "é"="e", "ó"="o"), c("á","é","ó")) - G. Grothendieck
@G.Grothendieck. That's great, and it also works for all types of characters. Very valuable comment. Thank you! - Joschi
For people searching for a more general solution to this question, here is a more helpful answer: stackoverflow.com/a/7664655/1036500 - Ben

11 Answers

83
votes

Use the character translation function

chartr("áéó", "aeo", mydata)
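A minimal sketch of how chartr behaves, assuming a UTF-8 session. The extra element "déjà" is a hypothetical input added only to show that characters missing from the mapping are left untouched, and that the mapping is strictly one character to one character:

```r
# chartr() maps each character of `old` to the character at the same
# position in `new`, so it only handles single-character substitutions.
mydata <- c("á", "é", "ó", "déjà")   # "déjà" is a made-up extra input

translated <- chartr("áéó", "aeo", mydata)
print(translated)
# "à" is not part of the mapping, so it stays as it is in the last element
```

If a replacement needs more than one character (for example "ß" to "ss"), chartr is not enough and one of the gsub-based approaches is needed.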
33
votes

An interesting question! I think the simplest option is to devise a special function, something like a "multi" gsub():

mgsub <- function(pattern, replacement, x, ...) {
  if (length(pattern)!=length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result <- x
  for (i in seq_along(pattern)) {  # seq_along() is safe when pattern is empty
    result <- gsub(pattern[i], replacement[i], result, ...)
  }
  result
}

Which gives me:

> mydata <- c("á","é","ó")
> mgsub(c("á","é","ó"), c("a","e","o"), mydata)
[1] "a" "e" "o"
26
votes

Maybe this can be useful:

iconv('áéóÁÉÓçã', to="ASCII//TRANSLIT")
[1] "aeoAEOca"
12
votes

You can use stringi package to replace these characters.

> library(stringi)
> stri_trans_general(c("á","é","ó"), "latin-ascii")

[1] "a" "e" "o"
9
votes

This is very similar to @kith, but in function form, and with the most common diacritics cases:

removeDiscritics <- function(string) {
  chartr(
     "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"
    ,"SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"
    , string
  )
}


removeDiscritics("test áéíóú")

"test aeiou"

7
votes

Another mgsub implementation using Reduce

mystring = 'This is good'
myrepl = list(c('o', 'a'), c('i', 'n'))

mgsub2 <- function(myrepl, mystring){
  gsub2 <- function(l, x){
   do.call('gsub', list(x = x, pattern = l[1], replacement = l[2]))
  }
  Reduce(gsub2, myrepl, init = mystring, right = TRUE)
}
7
votes

A problem with some of the implementations above (e.g., Theodore Lytras's) is that if the patterns are multiple characters, they may conflict in the case that one pattern is a substring of another. A way to solve this is to create a copy of the object and perform the pattern replacement in that copy. This is implemented in my package bayesbio, available on CRAN.

mgsub <- function(pattern, replacement, x, ...) {
  n = length(pattern)
  if (n != length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result = x
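  for (i in seq_len(n)) {
    # match against the original x, not result; note that this replaces
    # the whole matching element, not just the matched substring
    result[grep(pattern[i], x, ...)] = replacement[i]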
  }
  return(result)
}

Here is a test case:

  asdf = c(4, 0, 1, 1, 3, 0, 2, 0, 1, 1)

  res = mgsub(c("0", "1", "2"), c("10", "11", "12"), asdf)
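
The conflict described above can be sketched as follows. This is only an illustration under stated assumptions: the variable names are hypothetical, fixed = TRUE is used so the patterns are treated as literal strings, and the second loop mirrors the whole-element replacement of the function above rather than a substring replacement:

```r
x            <- c("0", "1", "2")
patterns     <- c("0", "1", "2")
replacements <- c("10", "11", "12")

# Sequential gsub: each pass also sees text inserted by earlier passes,
# so the "1" inside the freshly inserted "10" gets replaced again.
sequential <- x
for (i in seq_along(patterns)) {
  sequential <- gsub(patterns[i], replacements[i], sequential, fixed = TRUE)
}

# Matching against the untouched original avoids the cascade.
independent <- x
for (i in seq_along(patterns)) {
  independent[grep(patterns[i], x, fixed = TRUE)] <- replacements[i]
}

print(sequential)   # "110" "11" "12"
print(independent)  # "10" "11" "12"
```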
3
votes

Not so elegant, but it works and does what you want

> diag(sapply(1:length(mydata), function(i, x, y) {
+   gsub(x[i],y[i], x=x)
+ }, x=mydata, y=c('a', 'b', 'c')))
[1] "a" "b" "c"
2
votes

Related to Justin's answer:

> m <- c("á"="a", "é"="e", "ó"="o")
> m[mydata]
  á   é   ó 
"a" "e" "o" 

And you can get rid of the names with names(*) <- NULL if you want.

1
votes

You can use the match function. match(x, y) returns, for each element of x, the position of its first match in y. You can then use the returned indices to subset another vector (say z) that holds the replacements, in the same order as the values in y. In your case:

mydata <- c("á","é","ó")
original <- c("á","é","ó")   # characters to look for
desired <- c('a', 'e', 'o')  # their replacements, in the same order

desired[match(mydata, original)]

As a simpler example, consider the situation below, where I wanted to replace 'a' with 'alpha', 'b' with 'beta', and so forth.

x <- c('a', 'a', 'b', 'c', 'b', 'c', 'e', 'e', 'd')

y <- c('a', 'b', 'c', 'd', 'e')
z <- c('alpha', 'beta', 'gamma', 'delta', 'epsilon')

z[match(x, y)]
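
One caveat: match returns NA for values that have no entry in y, so those elements come back as NA. A small sketch of one way to keep such values unchanged, where 'f' is a hypothetical value with no mapping and the variable names are made up for illustration:

```r
x <- c("a", "b", "f")               # "f" has no entry in the lookup
y <- c("a", "b", "c")
z <- c("alpha", "beta", "gamma")

idx <- match(x, y)                  # position in y, or NA when not found
result <- ifelse(is.na(idx), x, z[idx])
print(result)                       # "alpha" "beta" "f"
```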
0
votes

It doesn't make much sense in this case, but if there are only two replacements, you can also nest the gsub calls:

mydata <- gsub("á","a", gsub("é","e",mydata))