Given a tibble or data.frame that has duplicate column names, I want to use dplyr::rename
or dplyr::rename_with
to either:
(a) differentiate the duplicate names with a serial numeric suffix ('a_1'
, 'a_2'
, etc) or
(b) rename each column entirely.
Have:
library(tidyverse)
d <- tibble(a = 1:3, a = letters[1:3], .name_repair = "minimal")
d
# A tibble: 3 x 2
a a
<int> <chr>
1 1 a
2 2 b
3 3 c
Want:
tibble(a_1 = 1:3, a_2 = letters[1:3])
# A tibble: 3 x 2
a_1 a_2 # or even just: x, y
<int> <chr>
1 1 a
2 2 b
3 3 c
It seems reasonable to expect that rename
/rename_with
could perform this operation, especially since colnames(d) <- c("a_1", "a_2")
seems obvious and uncomplicated.
But I've gotten different errors and/or unexpected behavior with the three approaches I've tried so far:
1. Using rename()
, one column is renamed, the other is not:
d %>% rename(x = "a", y = "a")
# A tibble: 3 x 2
y a
<int> <chr>
1 1 a
2 2 b
3 3 c
2. Using rename_with()
, in which I use a function to add a numeric suffix, I get this error:
d %>% rename_with(~paste(.x, 1:2, sep = "_"))
Error: Names must be unique.
x These names are duplicated:
* "a" at locations 1 and 2.
Note that this use of rename_with
works as expected when data has no duplicate column names:
no_dupe <- tibble(a = 1:3, b = letters[1:3])
no_dupe %>% rename_with(~paste(.x, 1:2, sep = "_"))
# A tibble: 3 x 2
a_1 b_2
<int> <chr>
1 1 a
2 2 b
3 3 c
3. Using the older rename_all()
, I get a different error:
d %>% rename_all(paste0, 1:2)
Error: Can't rename duplicate variables to `{name}`.
I found a discussion about handling duplicates with rename
in the tidyselect
GitHub issues, but that was about what to do if a user creates duplicate column names with rename()
, not what to do if they are trying to unduplicate.
Am I missing syntax, or is the dplyr::rename
family just not set up to ingest duplicate colnames?
(I'd also like to better understand why rename
only renames one column in the example above, but that's less functional and more just me being curious.)
Thanks in advance.
rename
does not know about the previous changes done so it renames the same column twice. This worksd %>% rename(x = "a") %>% rename(y = "a")
– Ronak Shahrename_with
doesn't work with duplicate column names it is because of this line I think github.com/tidyverse/dplyr/blob/master/R/rename.R#L70 . We are not allowed to pass our ownrepair
argument and it is always"check_unique"
– Ronak Shahrename_with()
I mainly care about, as I was hoping to find a programmatic way to assign suffixes to a number of duplicate colnames. Would you mind writing that up as an answer? Good to know aboutrename()
not having "memory, too. – andrew_reece