7
votes

I am trying to transition some plyr code to dplyr, and getting stuck with the new functionality of rename() in dplyr. I'd like to be able to reuse a single rename() expression for a set of datasets with overlapping but not identical original names. For example,

sample1 <- data.frame(A=1:10, B=letters[1:10])

sample2 <- data.frame(B=11:20, C=letters[11:20])

And then,

 rename(sample1, var1 = A, var2 = B, var3 = C)

I would like the result to be that variable A is renamed var1, and B is renamed var2, not adding a var3 in this case. Instead, I get

Error: Unknown variables: C.

In contrast, the plyr syntax would let me use

rename(sample1, c("A" = "var1", "B" = "var2", "C" = "var3"))
rename(sample2, c("A" = "var1", "B" = "var2", "C" = "var3"))

and not throw an error. Is there a way to get the same result in dplyr without getting the Unknown variables error?

4
You could reference the rename function specifically from plyr: plyr::rename(sample1, c("A" = "var1", "B" = "var2", "C" = "var3")) - Sam Firke

4 Answers

4
votes

Completely ignoring your actual request on how to do this with dplyr, I would like suggest a different approach using a lookup table:

sample1 <- data.frame(A=1:10, B=letters[1:10])
sample2 <- data.frame(B=11:20, C=letters[11:20])

rename_map <- c("A"="var1",
                "B"="var2",
                "C"="var3")

names(sample1) <- rename_map[names(sample1)]
str(sample1)

names(sample2) <- rename_map[names(sample2)]
str(sample2)

Fundamentally the algorithm is simple:

  1. Build a lookup table of current variable names to desired names
  2. Using the names() function, do a lookup into the map with the mapping indexes and assign those mapped variables to the appropriate columns.

EDIT: As per Hadley's suggestion, I used a named vector instead of a list, makes life much easier. I always forget about named vectors :(

1
votes
    #no need to use rename 

    oldnames<-unique(c(names(sample1),names(sample2)))
    newnames<-c("var1","var2","var3")
    name_df<-data.frame(oldnames,newnames)
    mydata<-list(sample1,sample2) # combined two datasets as a list
#one liner
    finaldata <- lapply(mydata, function(i) {colnames(i)<-name_df[name_df[,1] %in%  colnames(i),2]
return(i)})
> finaldata
[[1]]
   var1 var2
1     1    a
2     2    b
3     3    c
4     4    d
5     5    e
6     6    f
7     7    g
8     8    h
9     9    i
10   10    j

[[2]]
   var2 var3
1    11    k
2    12    l
3    13    m
4    14    n
5    15    o
6    16    p
7    17    q
8    18    r
9    19    s
10   20    t
1
votes

With dplyr, we can use a named vector with old names as values and new names as names, then unquote only the values in name_vec that matches names in your dataset. rename supports unquoting characters, so there is no need to convert them to sym beforehand:

library(dplyr)

name_vec <- c(var1 = "A", var2 = "B", var3 = "C")

sample1 %>%
  rename(!!name_vec[name_vec %in% names(.)])

sample2 %>%
  rename(!!name_vec[name_vec %in% names(.)])

Also, with setNames:

name_vec <- c(A = "var1", B = "var2", C = "var3")

sample1 %>%
  setNames(name_vec[names(.)])

sample2 %>%
  setNames(name_vec[names(.)])

Output:

   var1 var2
1     1    a
2     2    b
3     3    c
4     4    d
5     5    e
6     6    f
7     7    g
8     8    h
9     9    i
10   10    j

   var2 var3
1    11    k
2    12    l
3    13    m
4    14    n
5    15    o
6    16    p
7    17    q
8    18    r
9    19    s
10   20    t
1
votes

I’ve used @earino’s answer before myself, but discovered that it can be unsafe. If column names of the data frame are missing in the (names of the) named vector, those column names are silently replaced with NA and that is certainly not what you want.

d1 <- data.frame(A = 1:10, B = letters[1:10], stringsAsFactors = FALSE)

rename_vec <- c("B" = "var2", "C" = "var3")

names(d1) <- rename_vec[names(d1)]
str(d1)
#> 'data.frame':    10 obs. of  2 variables:
#>  $ NA  : int  1 2 3 4 5 6 7 8 9 10
#>  $ var2: chr  "a" "b" "c" "d" ...

The same can happen, if you run names(d1) <- rename_vec[names(d1)] twice by accident, because when you run it the second time, none of the colnames(d1) are in names(rename_vec).

names(d1) <- rename_vec[names(d1)]
str(d1)
#> 'data.frame':    10 obs. of  2 variables:
#>  $ NA: int  1 2 3 4 5 6 7 8 9 10
#>  $ NA: chr  "a" "b" "c" "d" ...

We just need to select those columns that are in the data frame and in the rename vector.

d2 <- data.frame(B1 = 1:10, B = letters[1:10], stringsAsFactors = FALSE)

sel <- is.element(colnames(d2), names(rename_vec))
names(d2)[sel] <- rename_vec[names(d2)][sel]
str(d2)
#> 'data.frame':    10 obs. of  2 variables:
#>  $ B1  : int  1 2 3 4 5 6 7 8 9 10
#>  $ var2: chr  "a" "b" "c" "d" ...

UPDATE: I initially had a solution here that involved string replacement, which turned out to be unsafe as well, because it allowed for partial matching. This one is better, I think.