
I am cleaning my data in tibble format with the tidyverse. I want to get rid of the prefixes in column names that were assigned by the system, e.g.:

x <- c("XYZ.date", "XYZ.region", "XYZ.region.europe.western")

There are not many distinct prefixes, but they are not always the same length. I know I can rename the columns one by one with the rename function, but is there a way to get rid of all the prefixes at once?

Making a list of the prefixes would not be a problem.
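Since the prefixes can be listed, one option is to build a single regular expression from that list and strip any of them in one call. A minimal base R sketch, assuming the prefixes contain only letters and digits (no regex metacharacters) and are followed by a dot as in x above; the second prefix "ABCD" is hypothetical:

```r
# Example names from the question (dot-separated prefixes)
x <- c("XYZ.date", "XYZ.region", "XYZ.region.europe.western")

# Hypothetical list of known prefixes; only "XYZ" comes from the question
prefixes <- c("XYZ", "ABCD")

# Builds the regex ^(XYZ|ABCD)\. so only a listed prefix plus its dot is removed
pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")\\.")
clean <- sub(pattern, "", x)
# clean is c("date", "region", "region.europe.western")

# On a tibble the same pattern could be used as:
# df %>% rename_with(~ sub(pattern, "", .x))
```

Anchoring with ^ and requiring the trailing dot means a listed prefix is only removed at the start of a name, never from the middle.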

Hi, what I understand is that you want to rename columns or remove a pattern from their names; then gsub will help. - Tushar Lad
Would you have a situation where you have "XYZ_date" and "JWNDI_date"? In other words, are there situations where two column names would be the same after prefix removal? Lastly, are all the prefixes separated by an "_"? - AndS.
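The collision asked about here can be detected before renaming. A minimal base R sketch using the hypothetical names from the comment, assuming an underscore separator; make.unique() is just one way to disambiguate:

```r
# Hypothetical names where two different prefixes share the same suffix
nms <- c("XYZ_date", "JWNDI_date", "XYZ_region")

# Strip everything up to and including the first "_"
stripped <- sub("^[^_]*_", "", nms)   # "date" "date" "region"

# anyDuplicated() returns the index of the first duplicate, or 0 if there is none
if (anyDuplicated(stripped) > 0) {
  stripped <- make.unique(stripped)   # "date" "date.1" "region"
}
stripped
```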
Possible duplicate, related post: stackoverflow.com/q/12297859/680068 - zx8754
If the prefix ("XYZ") is fixed, then simply substring(x, 5) should do. - zx8754
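A quick base R check of the substring() suggestion on the question's x, and of why a fixed offset breaks once prefix lengths differ (the mixed-length names in y are hypothetical):

```r
x <- c("XYZ.date", "XYZ.region", "XYZ.region.europe.western")

# Fixed 3-character prefix plus one separator: keep from character 5 onward
fixed_x <- substring(x, 5)            # "date" "region" "region.europe.western"

# With prefixes of different lengths, a fixed offset cuts in the wrong place
y <- c("XYZ.date", "AB.region")
fixed_cut <- substring(y, 5)          # "date" "egion"

# A regex up to the first dot adapts to the prefix length
regex_cut <- sub("^[^.]*\\.", "", y)  # "date" "region"
```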

4 Answers

library(dplyr)

# Remove everything from the start of a name up to and including the first "_"
foo <- function(x) gsub("^[^_]*_", "", x)

df %>%
    rename_all(foo)

Here I write a function that removes everything from the start of a string up to and including the first underscore ("_"), and then apply this function to all the column names.
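Since dplyr 1.0.0, rename_all() is superseded by rename_with(). A sketch of the same idea with rename_with(), on a small hypothetical tibble with underscore-separated prefixes of different lengths:

```r
library(dplyr)
library(tibble)

# Hypothetical tibble; prefixes vary in length and are separated by "_"
df <- tibble(XYZ_date = 1, AB_region = 2, XYZ_region_europe = 3)

# Drop everything up to and including the first "_"
strip_prefix <- function(x) sub("^[^_]*_", "", x)

# rename_with() applies the function to every column name by default
out <- df %>% rename_with(strip_prefix)
names(out)  # "date" "region" "region_europe"
```

Because [^_]* cannot cross an underscore, only the prefix is removed and any later underscores in the name are kept.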


This replaces everything up to and including the last underscore with the empty string (.* is greedy, so it matches as far as the final underscore) and then makes the result unique. We can optionally remove %>% make.unique in the first case, or %>% as_tibble(.name_repair = "unique") in the second case, if it is known that the names will be unique anyway.

library(dplyr)
DF %>% rename_all(~ sub(".*_", "", .x) %>% make.unique)

Or use this, which disambiguates the names in a slightly different manner:

library(dplyr)
library(tibble)
DF %>% rename_all(~ sub(".*_", "", .x)) %>% as_tibble(.name_repair = "unique")
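To see how the two variants differ when stripped names collide, here is a small sketch on a hypothetical data frame. It applies make.unique() and tibble's "unique" name repair directly to the new names rather than going through rename_all():

```r
library(tibble)

# Hypothetical data frame whose stripped names collide
DF <- data.frame(a_x = 1, b_y = 2, c_x = 3)
new_names <- sub(".*_", "", names(DF))   # "x" "y" "x"

# First variant: base make.unique() appends ".1", ".2", ... to later duplicates
names1 <- make.unique(new_names)         # "x" "y" "x.1"

# Second variant: tibble's "unique" repair appends "...<position>" to every
# duplicated name (it also prints a "New names:" message)
DF2 <- setNames(DF, new_names)
names2 <- names(as_tibble(DF2, .name_repair = "unique"))  # "x...1" "y" "x...3"
```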

Example

For example, using the first case above: add prefixes to each name of the built-in anscombe data frame to create DF, and then apply the code above to DF in the last line of code below.

# set up a test data frame using builtin anscombe
DF <- setNames(anscombe, sub("(.)(.)", "\\1_\\2", names(anscombe)))
names(DF)
## [1] "x_1" "x_2" "x_3" "x_4" "y_1" "y_2" "y_3" "y_4"

DF %>% rename_all(~ sub(".*_", "", .x) %>% make.unique)
##     1  2  3  4   1.1  2.1   3.1   4.1
## 1  10 10 10  8  8.04 9.14  7.46  6.58
## 2   8  8  8  8  6.95 8.14  6.77  5.76
## ...etc...
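Because .* is greedy, sub(".*_", "", ...) keeps only the text after the last underscore, which matters when a name contains more than one underscore (as "XYZ.region.europe.western" would if it used "_"). A quick base R comparison on hypothetical names:

```r
nms <- c("XYZ_region_europe_western", "AB_date")

# Greedy: ".*" runs to the LAST underscore
greedy <- sub(".*_", "", nms)        # "western" "date"

# Anchored class [^_]*: stops at the FIRST underscore
first <- sub("^[^_]*_", "", nms)     # "region_europe_western" "date"
```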

In base R, removing a three-character alphanumeric prefix followed by an underscore:

colnames(df) <- gsub("^[0-9A-Za-z]{3}_", "", colnames(df))
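The {3} quantifier only matches prefixes of exactly three characters, while the question says the prefix lengths vary. A sketch that swaps {3} for + (one or more), on hypothetical names; the last line shows the same idea with the dot separator used in the question:

```r
# Hypothetical column names with prefixes of different lengths
nms <- c("XYZ_date", "AB_region", "ABCDE_region_europe")

# {3} only matches exactly three characters, so "AB_" and "ABCDE_" survive
fixed3 <- gsub("^[0-9A-Za-z]{3}_", "", nms)  # "date" "AB_region" "ABCDE_region_europe"

# "+" allows any number (at least one) of alphanumerics before the underscore
anylen <- sub("^[0-9A-Za-z]+_", "", nms)     # "date" "region" "region_europe"

# Same idea for dot-separated prefixes such as "XYZ.region.europe.western"
dotted <- sub("^[0-9A-Za-z]+\\.", "", "XYZ.region.europe.western")  # "region.europe.western"
```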

You can use the following code to do it:

df <- cbind(1,1:4)
colnames(df) <- c("x","Y")
colnames(df) <- paste("Sub", colnames(df), sep = "_")
df
#>      Sub_x Sub_Y
#> [1,]     1     1
#> [2,]     1     2
#> [3,]     1     3
#> [4,]     1     4
colnames(df) <- sub("^[^_]*_", "", colnames(df))
df
#>      x Y
#> [1,] 1 1
#> [2,] 1 2
#> [3,] 1 3
#> [4,] 1 4

Created on 2020-01-29 by the reprex package (v0.3.0)