This question arose while working on this one: "The R dplyr function arrange(ymd(col)) is not working".
We have this data frame:
```r
df <- structure(list(record_id = 1:5, group = c("A", "B", "C", "D", 
"E"), date_start = c("Apr-22", "Aug-21", "Jan-22", "Feb-22", 
"Dec-21")), class = "data.frame", row.names = c(NA, -5L))
```
```
  record_id group date_start
1         1     A     Apr-22
2         2     B     Aug-21
3         3     C     Jan-22
4         4     D     Feb-22
5         5     E     Dec-21
```
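For context, here is a minimal sketch (assuming only dplyr is installed) of why a helper is needed at all. `date_start` is a character column, so sorting on it directly compares the strings alphabetically, month abbreviation first, rather than chronologically. As a result, 2021 and 2022 dates end up interleaved.

```r
library(dplyr)

df <- structure(list(record_id = 1:5, group = c("A", "B", "C", "D", "E"),
                     date_start = c("Apr-22", "Aug-21", "Jan-22", "Feb-22", "Dec-21")),
                class = "data.frame", row.names = c(NA, -5L))

# date_start is character, so this is a plain alphabetical sort on the strings
alpha <- df %>% arrange(date_start)
alpha$date_start
#> [1] "Apr-22" "Aug-21" "Dec-21" "Feb-22" "Jan-22"
```

"Apr-22" sorts before "Aug-21" only because "p" comes before "u".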
We would like to sort the rows chronologically by date_start.
My first approach, which worked:
```r
library(dplyr)
library(lubridate)

df %>%
  # helper column: "Apr-22" -> "Apr-22-01" -> Date 2022-04-01
  mutate(date_start1 = myd(paste0(date_start, "-01"))) %>%
  arrange(date_start1) %>%
  # drop the helper column again
  select(-date_start1)
```
```
  record_id group date_start
1         2     B     Aug-21
2         5     E     Dec-21
3         3     C     Jan-22
4         4     D     Feb-22
5         1     A     Apr-22
```
Then I tried this, and it also worked:
```r
library(dplyr)
library(lubridate)

df %>%
  # sort directly on the computed dates
  arrange(date_start1 = myd(paste0(date_start, "-01")))
```
```
  record_id group date_start
1         2     B     Aug-21
2         5     E     Dec-21
3         3     C     Jan-22
4         4     D     Feb-22
5         1     A     Apr-22
```
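A small check, sketched under the assumption that dplyr and lubridate are installed as in the question, makes the comparison concrete. Both pipelines produce the same row order, and the `arrange()`-only version leaves no `date_start1` column behind, even though the expression was given that name.

```r
library(dplyr)
library(lubridate)

df <- structure(list(record_id = 1:5, group = c("A", "B", "C", "D", "E"),
                     date_start = c("Apr-22", "Aug-21", "Jan-22", "Feb-22", "Dec-21")),
                class = "data.frame", row.names = c(NA, -5L))

# Approach 1: add a helper column, sort on it, then drop it
res_mutate <- df %>%
  mutate(date_start1 = myd(paste0(date_start, "-01"))) %>%
  arrange(date_start1) %>%
  select(-date_start1)

# Approach 2: hand the (named) expression straight to arrange()
res_arrange <- df %>%
  arrange(date_start1 = myd(paste0(date_start, "-01")))

res_mutate$record_id   # 2 5 3 4 1
res_arrange$record_id  # 2 5 3 4 1
names(res_arrange)     # "record_id" "group" "date_start"
```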
I would like to understand how a single `arrange` call can do the same as the combination of `mutate`, `arrange` and `select`.
`arrange` doesn't create a new column, `date_start1`, in the dataset, i.e. `...`: <data-masking> Variables, or functions of variables. - akrun
`arrange` is invisibly creating a temporary `date_start1` to sort off of and then removing it. Can't find that documented anywhere. - Dan Adams
`dplyr:::arrange_rows`, and if you check, it is doing a loop with `map2` (`transmute` is also used). - akrun
`arrange(myd(paste0(date_start,"-01")))`. I wouldn't use it, though: fewer keystrokes, but it makes the code less clear. - SamR
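The comments point to `dplyr:::arrange_rows`, which evaluates the sorting expressions with `transmute()`. The sketch below is a hypothetical, heavily simplified model of that idea and not dplyr's actual code. `arrange_sketch()` and the `sort_key_` names are made up for illustration, and `desc()`, grouping and locale handling are ignored. It suggests why no `date_start1` column appears: the expressions are evaluated into a separate, throwaway data frame of sort keys whose names are replaced, and only the resulting row order is applied to the original data.

```r
library(dplyr)
library(lubridate)

df <- structure(list(record_id = 1:5, group = c("A", "B", "C", "D", "E"),
                     date_start = c("Apr-22", "Aug-21", "Jan-22", "Feb-22", "Dec-21")),
                class = "data.frame", row.names = c(NA, -5L))

# Hypothetical helper, NOT dplyr's implementation: a simplified model of arrange()
arrange_sketch <- function(.data, ...) {
  # Capture the sorting expressions unevaluated
  dots <- rlang::enquos(...)
  # Discard user-supplied names such as date_start1; only position matters
  names(dots) <- paste0("sort_key_", seq_along(dots))
  # Evaluate each expression in the data mask; transmute() returns only the
  # computed keys as a separate data frame, so .data itself is not modified
  keys <- transmute(.data, !!!dots)
  # Ascending row order; later keys break ties in earlier ones
  idx <- do.call(order, unname(as.list(keys)))
  # Reorder the ORIGINAL columns; the keys are simply dropped
  out <- .data[idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

sorted <- arrange_sketch(df, date_start1 = myd(paste0(date_start, "-01")))
sorted
```

The real function also handles `desc()`, groups and ties more carefully. The point here is only that the name given to the expression never reaches the returned data.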