43 votes

I want to rename some arbitrary columns of a large data frame, and I want to use the current column names rather than the indexes. Column indexes might change if I add columns to or remove columns from the data, so I figure using the existing column names is a more stable solution. This is what I have now:

mydf = merge(df.1, df.2)
colnames(mydf)[which(colnames(mydf) == "MyName.1")] = "MyNewName"

Can I simplify this code, either the original merge() call or just the second line? "MyName.1" is actually the result of an xts merge of two different xts objects.

Can you provide a small reproducible data set with the desired output? - Dason
You don't need the which() there! R accepts a logical vector inside the [] operator, so colnames(mydf)[colnames(mydf) == "MyName.1"] = "MyNewName" should work! - João Daniel
names(mydf)[names(mydf) == "MyName.1"] = "MyNewName" ... about 13 characters shorter. If you eventually want to replace a vector of names, use %in% instead of ==. - Brandon Bertelsen
@BrandonBertelsen, could you repost your comment as an answer? That way the community can see that the question has been addressed, and you'll get some rep for it. - Paul Hiemstra
I don't think this should be tagged data.table, because it is about data frames (even though data tables are mentioned below). The right way to rename columns in a data table is setnames(). - geneorama

5 Answers

154 votes

The trouble with changing column names of a data.frame is that, almost unbelievably, the entire data.frame is copied. Even when it's in .GlobalEnv and no other variable points to it.

The data.table package has a setnames() function that changes column names by reference, without copying the whole data set. data.table differs in that it doesn't copy-on-write, which can be very important for large data sets (you did say your data set was large). Simply provide the old and the new names:

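library(data.table)
# DT must be a data.table; setDT(mydf) converts a data.frame to one in place
setnames(DT, "MyName.1", "MyNewName")
# or, more explicitly:
setnames(DT, old = "MyName.1", new = "MyNewName")
?setnames  # open the help page for further options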

27 votes

names(mydf)[names(mydf) == "MyName.1"] = "MyNewName" # 13 characters shorter. 

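If you eventually want to rename several columns at once, use %in% instead of == and make the vector of old names the same length as the vector of new names. Note that %in% assigns the new names in the order in which the matching columns appear in the data frame, so the old names must be listed in that same order.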
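A minimal base-R sketch of renaming several columns at once, assuming a toy data frame with made-up column names. It uses match() instead of %in% so that each old name is paired with its new name regardless of the order in which the columns appear.

```r
# Toy data frame standing in for the merged result (hypothetical column names)
mydf <- data.frame(a = 1:2, MyName.1 = 3:4, MyName.2 = 5:6)

old <- c("MyName.2", "MyName.1")    # old names, in any order
new <- c("SecondNew", "FirstNew")   # new names, paired position by position with `old`

# match() returns the column position of each old name (NA if a name is absent)
idx <- match(old, names(mydf))
stopifnot(!anyNA(idx))              # fail loudly if an old name is missing
names(mydf)[idx] <- new

names(mydf)  # -> "a" "FirstNew" "SecondNew"
```

With names(mydf)[names(mydf) %in% old] <- new instead, the new names would be assigned in column order (positions 2 and 3), so "SecondNew" and "FirstNew" would end up swapped in this example.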

26 votes

plyr has a rename function for just this purpose:

library(plyr)
mydf <- rename(mydf, c("MyName.1" = "MyNewName"))
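If you would rather avoid the plyr dependency, the same named-vector interface, c(old = "new"), can be sketched in base R. The helper rename_cols() below is hypothetical, not part of plyr or base R; this sketch stops with an error on unknown column names, which is a deliberate choice for catching typos early.

```r
# Hypothetical helper: rename columns using a named character vector c(old = "new")
rename_cols <- function(df, replace) {
  idx <- match(names(replace), names(df))   # positions of the old names
  if (anyNA(idx)) {
    stop("Unknown column(s): ", paste(names(replace)[is.na(idx)], collapse = ", "))
  }
  names(df)[idx] <- unname(replace)         # assign the new names at those positions
  df
}

mydf <- data.frame(x = 1:3, MyName.1 = 4:6)
mydf <- rename_cols(mydf, c("MyName.1" = "MyNewName"))
names(mydf)  # -> "x" "MyNewName"
```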

4 votes

names(mydf) <- sub("MyName\\.1", "MyNewName", names(mydf))

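This generalizes to renaming many columns at once if you use a common stem as the pattern: the call is applied to every element of names(mydf), so every column containing the stem is renamed. Use gsub instead of sub if the stem can occur more than once within a single name.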
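A short sketch of the stem idea, assuming made-up column names. It also shows why anchoring the pattern with ^ and $ can matter: without anchors, "MyName\\.1" also matches the beginning of "MyName.10".

```r
mydf <- data.frame(MyName.1 = 1, MyName.2 = 2, MyName.10 = 3, Other = 4)

# Unanchored: "MyName\\.1" also matches the start of "MyName.10"
sub("MyName\\.1", "MyNewName", names(mydf))
# -> "MyNewName" "MyName.2" "MyNewName0" "Other"

# Anchored with ^ and $: only the exact name is replaced
sub("^MyName\\.1$", "MyNewName", names(mydf))
# -> "MyNewName" "MyName.2" "MyName.10" "Other"

# Stem pattern: rename every column that starts with "MyName."
names(mydf) <- sub("^MyName\\.", "NewStem.", names(mydf))
names(mydf)  # -> "NewStem.1" "NewStem.2" "NewStem.10" "Other"
```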

1 vote

You can use the str_replace function of the stringr package:

library(stringr)
names(mydf) <- str_replace(names(mydf), fixed("MyName.1"), "MyNewName")  # fixed(): match literally, so "." is not a wildcard
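A caveat that applies to both str_replace() and base sub(): the pattern is a regular expression by default, so the dot in "MyName.1" matches any character. The sketch below, using made-up column names, shows the problem and the base-R fix with fixed = TRUE, which needs no extra package.

```r
mydf <- data.frame(MyName.1 = 1:2, MyNameX1 = 3:4)

# As a regular expression, "." matches any character, so "MyNameX1" matches too
sub("MyName.1", "MyNewName", names(mydf))
# -> "MyNewName" "MyNewName"

# fixed = TRUE treats the pattern as a literal string
names(mydf) <- sub("MyName.1", "MyNewName", names(mydf), fixed = TRUE)
names(mydf)  # -> "MyNewName" "MyNameX1"
```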