10
votes

My dataframe which I read from a csv file has column names like this

abc.def, ewf.asd.fkl, qqit.vsf.addw.coil

I want to remove the '.' from all the names and convert them to

abcdef, ewfasdfkl, qqitvsfaddwcoil.

I tried the sub command, sub(".","",colnames(dataframe)), but it removed the first letter of each column name instead, and the column names changed to

bc.def, wf.asd.fkl, qit.vsf.addw.coil

Does anyone know another command to do this? I could change the column names one by one, but I have a lot of files with 30 or more columns in each file.

Again, I want to remove the "." from all the colnames. I am trying to do this so I can use "sqldf" commands, which don't deal well with "."

Thank you for your help

3 Answers

20
votes

1) sqldf can deal with names having dots in them if you quote the names:

library(sqldf)
d0 <- read.csv(text = "A.B,C.D\n1,2")
sqldf('select "A.B", "C.D" from d0')

giving:

  A.B C.D
1   1   2

2) When reading the data using read.table or read.csv use the check.names=FALSE argument.

Compare:

Lines <- "A B,C D
1,2
3,4"
read.csv(text = Lines)
##   A.B C.D
## 1   1   2
## 2   3   4
read.csv(text = Lines, check.names = FALSE)
##   A B C D
## 1   1   2
## 2   3   4

However, in this example the names would still have to be quoted in sqldf, because they contain embedded spaces.
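
For context, the dots in the first result are added by read.csv itself: with check.names = TRUE (the default) the header is passed through make.names(), which replaces characters that are not valid in R names with a dot. A small base R sketch of that behaviour, using made-up header strings:

```r
# Headers as they might appear in a CSV file (illustrative values only).
raw_headers <- c("A B", "C D", "abc.def")

# make.names() is the function read.csv applies when check.names = TRUE.
checked <- make.names(raw_headers)
print(checked)
```

Names that are already syntactically valid, such as abc.def, pass through unchanged, so dots that are present in the file itself are not affected by check.names = FALSE.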

3) To simply remove the periods, if DF is a data frame:

names(DF) <- gsub(".", "", names(DF), fixed = TRUE)

or it might be nicer to convert the periods to underscores so that the change is reversible (provided the original names contain no underscores):

names(DF) <- gsub(".", "_", names(DF), fixed = TRUE)

This last line could alternatively be written like this:

names(DF) <- chartr(".", "_", names(DF))
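
Since the question mentions many files, the same gsub call can be wrapped in a small function and applied to each file in turn. This is only a sketch: read_clean is a hypothetical helper name, and the temporary directory and its two CSV files are made up for the demonstration.

```r
# Hypothetical helper: read one CSV and strip literal dots from its column names.
read_clean <- function(path) {
  df <- read.csv(path)
  names(df) <- gsub(".", "", names(df), fixed = TRUE)
  df
}

# Demo data: two throwaway files in a temporary directory (assumed layout).
csv_dir <- tempfile("csvs")
dir.create(csv_dir)
writeLines(c("abc.def,ewf.asd.fkl", "1,2"), file.path(csv_dir, "one.csv"))
writeLines(c("qqit.vsf.addw.coil", "3"), file.path(csv_dir, "two.csv"))

# Read every CSV in the directory and clean its names.
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
frames <- lapply(files, read_clean)
names(frames) <- basename(files)

print(lapply(frames, names))
```

Each element of frames is then a data frame whose column names no longer contain dots, so it can be passed to sqldf without quoting.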
6
votes

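To replace all the dots in the names you'll need to use gsub rather than sub, which replaces only the first match. You also need fixed = TRUE (or an escaped dot), because in a regular expression an unescaped . matches any character.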

This should work.

test <- data.frame(abc.def = NA, ewf.asd.fkl = NA, qqit.vsf.addw.coil = NA)
names(test) <- gsub( ".",  "", names(test), fixed = TRUE)
test
##   abcdef ewfasdfkl qqitvsfaddwcoil
## 1     NA        NA              NA
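
To see why the original sub call dropped the first letter, it may help to compare the variants side by side. The sketch below uses the column names from the question; the variable names are just for illustration.

```r
x <- c("abc.def", "ewf.asd.fkl", "qqit.vsf.addw.coil")

# Regex mode: "." matches any character, and sub() replaces only the first match,
# so the first letter of each name is removed.
first_char_removed <- sub(".", "", x)

# fixed = TRUE treats "." literally, but sub() still stops after the first dot.
one_dot_removed <- sub(".", "", x, fixed = TRUE)

# gsub() replaces every match; escaping the dot is equivalent to fixed = TRUE here.
all_dots_removed <- gsub("\\.", "", x)

print(first_char_removed)
print(one_dot_removed)
print(all_dots_removed)
```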
5
votes

UPDATE dplyr 0.8.0

As of dplyr 0.8, funs() is soft-deprecated; use formula notation instead.

Here is a dplyr way to do this using stringr:

library(dplyr)
library(stringr)

data <- data.frame(abc.def = 1, ewf.asd.fkl = 2, qqit.vsf.addw.coil = 3)
renamed_data <- data %>%
  rename_all(~ str_replace_all(., "\\.", "_")) # note we have to escape the '.' character with \\

Make sure you install the packages with install.packages().

Remember that functions like str_replace_all treat the pattern as a regular expression, in which . is a wildcard that matches any character. To match a literal dot you have to escape it as \\. in the R string.
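
As an alternative to escaping, stringr's fixed() wrapper matches the pattern literally. The sketch below also assumes dplyr 1.0.0 or later, where rename_with() is available and is the suggested replacement for the scoped rename_all(); on older versions the rename_all() form shown earlier in this answer is the one to use.

```r
library(dplyr)
library(stringr)

data <- data.frame(abc.def = 1, ewf.asd.fkl = 2, qqit.vsf.addw.coil = 3)

# fixed(".") asks stringr to match a literal dot, so no escaping is needed.
# rename_with() applies the function to every column name by default.
renamed_data <- data %>%
  rename_with(~ str_replace_all(.x, fixed("."), "_"))

print(names(renamed_data))
```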