This questions is related to the following questions posted by other users:
How to number/label data-table by group-number from group_by?
Parts of my approach are 'inspired' by krlmlr answers to this thread: https://github.com/tidyverse/dplyr/issues/1185
Problem:
I have a data.frame
similar to the following:
db <- data.frame(ID = c(rep(1, 5), rep(2, 5)),
date = as.Date(c(
rep("2001-01-01", 3),
"2001-01-03",
"2001-01-03",
rep("2011-01-01", 2),
rep("2010-03-12", 2),
"2001-01-01"
)))
db
# ID date
# 1 1 2001-01-01
# 2 1 2001-01-01
# 3 1 2001-01-01
# 4 1 2001-01-03
# 5 1 2001-01-03
# 6 2 2011-01-01
# 7 2 2011-01-01
# 8 2 2010-03-12
# 9 2 2010-03-12
# 10 2 2001-01-01
My goal is to group by ID and date and create a new column in db, indicating an order of the dates within each ID. The solution would be a new column to db with the values c(1, 1, 1, 2, 2, 3, 3, 2, 2, 1)
The two approaches I tried will rank the dates across all IDs but not within each ID (see below).
What can I do?
Thank you very much.
Own approaches
db %>%
group_by(ID, date) %>%
{ mutate(ungroup(.), rank = group_indices(.)) }
## A tibble: 10 x 3
# ID date rank
# <dbl> <date> <int>
# 1 1 2001-01-01 1
# 2 1 2001-01-01 1
# 3 1 2001-01-01 1
# 4 1 2001-01-03 2
# 5 1 2001-01-03 2
# 6 2 2011-01-01 5
# 7 2 2011-01-01 5
# 8 2 2010-03-12 4
# 9 2 2010-03-12 4
#10 2 2001-01-01 3
db %>%
mutate(label = group_indices(., ID, date))
# ID date label
#1 1 2001-01-01 1
#2 1 2001-01-01 1
#3 1 2001-01-01 1
#4 1 2001-01-03 2
#5 1 2001-01-03 2
#6 2 2011-01-01 5
#7 2 2011-01-01 5
#8 2 2010-03-12 4
#9 2 2010-03-12 4
#10 2 2001-01-01 3