3
votes

Given a tibble or data.frame that has duplicate column names, I want to use dplyr::rename or dplyr::rename_with to either:
(a) differentiate the duplicate names with a serial numeric suffix ('a_1', 'a_2', etc) or
(b) rename each column entirely.

Have:

library(tidyverse)

d <- tibble(a = 1:3, a = letters[1:3], .name_repair = "minimal")

d
# A tibble: 3 x 2
      a a    
  <int> <chr>
1     1 a    
2     2 b    
3     3 c  

Want:

tibble(a_1 = 1:3, a_2 = letters[1:3])

# A tibble: 3 x 2
    a_1 a_2           # or even just: x, y    
  <int> <chr>
1     1 a    
2     2 b    
3     3 c   

It seems reasonable to expect that rename/rename_with could perform this operation, especially since colnames(d) <- c("a_1", "a_2") seems obvious and uncomplicated.

But I've gotten different errors and/or unexpected behavior with the three approaches I've tried so far:

1. Using rename(), one column is renamed, the other is not:

d %>% rename(x = "a", y = "a")  

# A tibble: 3 x 2
        y a    
    <int> <chr>
  1     1 a    
  2     2 b    
  3     3 c   

2. Using rename_with(), in which I use a function to add a numeric suffix, I get this error:

d %>% rename_with(~paste(.x, 1:2, sep = "_"))

Error: Names must be unique.
x These names are duplicated:
  * "a" at locations 1 and 2.

Note that this use of rename_with works as expected when data has no duplicate column names:

no_dupe <- tibble(a = 1:3, b = letters[1:3])
no_dupe %>% rename_with(~paste(.x, 1:2, sep = "_"))

# A tibble: 3 x 2
    a_1 b_2  
  <int> <chr>
1     1 a    
2     2 b    
3     3 c    

3. Using the older rename_all(), I get a different error:

d %>% rename_all(paste0, 1:2)

Error: Can't rename duplicate variables to `{name}`.

I found a discussion about handling duplicates with rename in the tidyselect GitHub issues, but that was about what to do if a user creates duplicate column names with rename(), not what to do if they are trying to unduplicate.

Am I missing syntax, or is the dplyr::rename family just not set up to ingest duplicate colnames?

(I'd also like to better understand why rename only renames one column in the example above, but that's less functional and more just me being curious.)

Thanks in advance.

2
rename does not know about the previous changes done so it renames the same column twice. This works d %>% rename(x = "a") %>% rename(y = "a")Ronak Shah
As far as why rename_with doesn't work with duplicate column names it is because of this line I think github.com/tidyverse/dplyr/blob/master/R/rename.R#L70 . We are not allowed to pass our own repair argument and it is always "check_unique"Ronak Shah
Good find, @RonakShah, thanks for digging that up. It's rename_with() I mainly care about, as I was hoping to find a programmatic way to assign suffixes to a number of duplicate colnames. Would you mind writing that up as an answer? Good to know about rename() not having "memory, too.andrew_reece
Why not store all col names in a vector. Make another vector having 1:n of equal length. Concatenate both vectors and replace colnames with result vectorAnilGoyal

2 Answers

1
votes

rename does not know about the previous change in name done. For example,

library(dplyr)
mtcars %>% rename(a = mpg, b = a)

Error: Can't rename columns that don't exist. x Column a doesn't exist.

So this

d %>% rename(x = "a", y = "a")  

renames the same a column twice, first with x and then with y. An alternative is to break the pipe and rename.

d %>% rename(x = "a") %>% rename(y = "a")  

# A tibble: 3 x 2
#      x y    
#  <int> <chr>
#1     1 a    
#2     2 b    
#3     3 c 

rename_with does not allow to rename dataframe/tibbles with duplicate columns because of https://github.com/tidyverse/dplyr/blob/master/R/rename.R#L70 . In such case I think your best bet is to use base R methods.

0
votes

Assuming that you have multiple columns with duplicate or even triplicate column names you can also try this method..

Fetch all column names in a vector named v using colnames. thereafter, follow this method..

v <- c("a", "a", "b", "c", "c", "c", "d", "e")

df <- tibble(v)


df <- df %>% mutate(id = 1) %>% 
  group_by(v) %>% 
  summarise(id = cumsum(id)) %>% 
  mutate(v2 = paste(v,id, sep = "_")) %>%
  mutate(v2 = ifelse(id==1, v, v2)) 

Last replace column names with df$v2

> df$v2
[1] "a"   "a_2" "b"   "c"   "c_2" "c_3" "d"   "e"