Using dplyr's rename() including variable names not in data set

Question

I am trying to transition some plyr code to dplyr, and getting stuck with the new functionality of rename() in dplyr. I'd like to be able to reuse a single rename() expression for a set of datasets with overlapping but not identical original names. For example,

sample1 <- data.frame(A=1:10, B=letters[1:10])

sample2 <- data.frame(B=11:20, C=letters[11:20])

And then,

rename(sample1, var1 = A, var2 = B, var3 = C)

I would like the result to be that variable A is renamed var1, and B is renamed var2, not adding a var3 in this case. Instead, I get

Error: Unknown variables: C.

In contrast, the plyr syntax would let me use

rename(sample1, c("A" = "var1", "B" = "var2", "C" = "var3"))
rename(sample2, c("A" = "var1", "B" = "var2", "C" = "var3"))

and not throw an error. Is there a way to get the same result in dplyr without getting the Unknown variables error?

You could reference the rename function specifically from plyr: plyr::rename(sample1, c("A" = "var1", "B" = "var2", "C" = "var3")) — Sam Firke

earino · Accepted Answer · 2015-02-25T02:04:33

Completely ignoring your actual request on how to do this with dplyr, I would like to suggest a different approach using a lookup table:

sample1 <- data.frame(A=1:10, B=letters[1:10])
sample2 <- data.frame(B=11:20, C=letters[11:20])

rename_map <- c("A"="var1",
                "B"="var2",
                "C"="var3")

names(sample1) <- rename_map[names(sample1)]
str(sample1)

names(sample2) <- rename_map[names(sample2)]
str(sample2)

Fundamentally the algorithm is simple:

1. Build a lookup table that maps current variable names to desired names.
2. Index the lookup table with names(df) to get the new name for each column, and assign the result back with names(df) <-.
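One caveat: indexing the map directly returns NA for any column whose name is not in the lookup table, so that column would end up named NA. Here is a minimal sketch of a guard against that. The helper rename_with_map() and the data frame sample3 are hypothetical names made up for illustration, not part of the code above.

```r
# Hypothetical helper: rename only the columns that have an entry in the
# lookup table and leave every other column name untouched.
rename_with_map <- function(df, map) {
  hits <- names(df) %in% names(map)                 # columns with a mapping
  names(df)[hits] <- unname(map[names(df)[hits]])   # look up the new names
  df
}

rename_map <- c("A" = "var1", "B" = "var2", "C" = "var3")

# Column D has no entry in rename_map, so it keeps its name.
sample3 <- data.frame(B = 11:20, D = letters[11:20])
renamed3 <- rename_with_map(sample3, rename_map)
names(renamed3)
# "var2" "D"
```

The same rename_map can then be reused across data sets whose columns only partly overlap with the map.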

EDIT: As per Hadley's suggestion, I used a named vector instead of a list, which makes life much easier. I always forget about named vectors :(
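For completeness, the same lookup-table idea can probably be handed to dplyr's own rename(). This is a sketch assuming dplyr 1.0.0 or later, where rename() accepts tidyselect helpers such as any_of(), which skips entries whose old name is missing from the data. Note that the named vector runs new = "old" here, the reverse of rename_map above. The names lookup, renamed1 and renamed2 are chosen for illustration only.

```r
library(dplyr)

sample1 <- data.frame(A = 1:10, B = letters[1:10])
sample2 <- data.frame(B = 11:20, C = letters[11:20])

# Assumption: dplyr >= 1.0.0.
# Names are the NEW column names, values are the OLD ones.
lookup <- c(var1 = "A", var2 = "B", var3 = "C")

renamed1 <- rename(sample1, any_of(lookup))  # C is absent, so var3 is skipped
renamed2 <- rename(sample2, any_of(lookup))  # A is absent, so var1 is skipped

names(renamed1)
names(renamed2)
```

Using all_of() instead of any_of() would bring back the error for missing columns, which can be useful when every name is expected to exist.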
