I have a data frame like the one shown below:
test_df <- data.frame("subject_id" = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
"date_1" = c("01/01/2003", "12/31/2007", "12/30/2008", "12/31/2005",
"01/01/2007", "01/01/2013", "12/31/2008", "03/04/2006",
"12/31/2009", "01/01/2015", "01/01/2009"))
What I would like to do is:
1. Arrange the dates in ascending order for each subject (sort ascending within groups).
2. Remove date records for each subject based on the criteria below (the year doesn't matter):
2a. Remove only Dec 31st records if the first record of the subject is Jan 1st, e.g. subject_id = 1.
2b. Remove only Jan 1st records if the first record of the subject is Dec 31st, e.g. subject_id = 2.
2c. Remove only Dec 31st records if the subject has both Dec 31st and Jan 1st among its non-first records (that is, from the 2nd record through the last), e.g. subject_id = 3.
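One way the criteria above might be expressed is sketched here with dplyr and lubridate. It rests on a few assumptions: the dates are in month/day/year format, the three rules are checked in the order 2a, 2b, 2c, and subjects matching none of them are left untouched. The helper columns (is_jan1, is_dec31, first_jan1, first_dec31, both_later, drop) are illustrative names only.

```r
library(dplyr)
library(lubridate)

test_df <- data.frame(
  subject_id = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  date_1 = c("01/01/2003", "12/31/2007", "12/30/2008", "12/31/2005",
             "01/01/2007", "01/01/2013", "12/31/2008", "03/04/2006",
             "12/31/2009", "01/01/2015", "01/01/2009")
)

result <- test_df %>%
  mutate(date_1 = mdy(date_1)) %>%    # assumption: text is month/day/year
  arrange(subject_id, date_1) %>%     # step 1: ascending dates within each subject
  group_by(subject_id) %>%
  mutate(
    is_jan1     = month(date_1) == 1  & day(date_1) == 1,
    is_dec31    = month(date_1) == 12 & day(date_1) == 31,
    first_jan1  = first(is_jan1),     # is the subject's first record Jan 1st?
    first_dec31 = first(is_dec31),    # is the subject's first record Dec 31st?
    # both kinds of date present among the non-first records
    both_later  = any(is_jan1[-1]) & any(is_dec31[-1]),
    drop = case_when(
      first_jan1  ~ is_dec31,         # 2a: drop Dec 31st records
      first_dec31 ~ is_jan1,          # 2b: drop Jan 1st records
      both_later  ~ is_dec31,         # 2c: drop Dec 31st records
      TRUE        ~ FALSE             # assumption: otherwise keep everything
    )
  ) %>%
  ungroup() %>%
  filter(!drop) %>%
  select(subject_id, date_1)

print(result)
```

On the sample data this sketch keeps seven rows: two each for subjects 1 and 2, and three for subject 3. Whether that matches the intended result depends on how rules 2a to 2c are meant to interact.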
I tried the following:
sorted <- test_df %>% arrange(date_1,group_by = subject_id) #Is this the right way to sort the dates within each group?
test_df$month = month(test_df$date_1) #get the month
test_df$day = day(test_df$date_1) #get the day
filter(test_df, month==12 and day == 31) # doesn't work here
Can you help me filter out records based on my criteria?
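For the attempt above, here is a minimal sketch of how the sorting and the Dec 31st filter might be written. It assumes the dates are month/day/year text, so they are converted to Date first; otherwise the column sorts alphabetically rather than chronologically. As far as I know, arrange() has no group_by argument, so the sketch sorts by subject_id and then date_1 instead. R's element-wise logical AND is `&`, not `and`.

```r
library(dplyr)
library(lubridate)

test_df <- data.frame(
  subject_id = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  date_1 = c("01/01/2003", "12/31/2007", "12/30/2008", "12/31/2005",
             "01/01/2007", "01/01/2013", "12/31/2008", "03/04/2006",
             "12/31/2009", "01/01/2015", "01/01/2009")
)

sorted <- test_df %>%
  mutate(date_1 = mdy(date_1),   # assumption: text is month/day/year
         month  = month(date_1), # month number (1-12)
         day    = day(date_1)) %>% # day of the month
  arrange(subject_id, date_1)    # subject first, then date within subject
# An alternative with the same ordering:
#   group_by(subject_id) %>% arrange(date_1, .by_group = TRUE)

# Combine the two conditions with `&`
dec31 <- filter(sorted, month == 12 & day == 31)
print(dec31)
```

This only isolates the Dec 31st records; applying the per-subject rules would additionally need a grouped step that looks at each subject's first record.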
I expect my output to look like the one shown below
What is group_by= within arrange supposed to be doing? Do you mean arrange(date_1) %>% group_by(subject_id)? (And no, arranging is done regardless of group: try mtcars %>% group_by(cyl) %>% arrange(mpg) %>% print(n=99) and see that there's an 8 in the middle of the 6s.) – r2evans
arrange(date_1) %>% group_by(subject_id) – The Great