0
votes

This questions is related to the following questions posted by other users:

How to number/label data-table by group-number from group_by?

Numbering of groups in dplyr?

Parts of my approach are 'inspired' by krlmlr answers to this thread: https://github.com/tidyverse/dplyr/issues/1185


Problem:

I have a data.frame similar to the following:

db <- data.frame(ID = c(rep(1, 5), rep(2, 5)),
                  date = as.Date(c(
                    rep("2001-01-01", 3),
                    "2001-01-03",
                    "2001-01-03",
                    rep("2011-01-01", 2),
                    rep("2010-03-12", 2),
                    "2001-01-01"
                  )))

db
#       ID       date
#    1   1 2001-01-01
#    2   1 2001-01-01
#    3   1 2001-01-01
#    4   1 2001-01-03
#    5   1 2001-01-03
#    6   2 2011-01-01
#    7   2 2011-01-01
#    8   2 2010-03-12
#    9   2 2010-03-12
#    10  2 2001-01-01

My goal is to group by ID and date and create a new column in db, indicating an order of the dates within each ID. The solution would be a new column to db with the values c(1, 1, 1, 2, 2, 3, 3, 2, 2, 1)

The two approaches I tried will rank the dates across all IDs but not within each ID (see below).

What can I do?

Thank you very much.


Own approaches

db %>% 
   group_by(ID, date) %>% 
   { mutate(ungroup(.), rank = group_indices(.)) }
## A tibble: 10 x 3
#      ID       date  rank
#   <dbl>     <date> <int>
# 1     1 2001-01-01     1
# 2     1 2001-01-01     1
# 3     1 2001-01-01     1
# 4     1 2001-01-03     2
# 5     1 2001-01-03     2
# 6     2 2011-01-01     5
# 7     2 2011-01-01     5
# 8     2 2010-03-12     4
# 9     2 2010-03-12     4
#10     2 2001-01-01     3

db %>% 
   mutate(label = group_indices(., ID, date))
#   ID       date label
#1   1 2001-01-01     1
#2   1 2001-01-01     1
#3   1 2001-01-01     1
#4   1 2001-01-03     2
#5   1 2001-01-03     2
#6   2 2011-01-01     5
#7   2 2011-01-01     5
#8   2 2010-03-12     4
#9   2 2010-03-12     4
#10  2 2001-01-01     3
1

1 Answers

1
votes

We can use dense_rank.

library(dplyr)

db2 <- db %>%
  group_by(ID) %>%
  mutate(rank = dense_rank(date)) %>%
  ungroup()
db2
# # A tibble: 10 x 3
#      ID date        rank
#   <dbl> <date>     <int>
#  1    1. 2001-01-01     1
#  2    1. 2001-01-01     1
#  3    1. 2001-01-01     1
#  4    1. 2001-01-03     2
#  5    1. 2001-01-03     2
#  6    2. 2011-01-01     3
#  7    2. 2011-01-01     3
#  8    2. 2010-03-12     2
#  9    2. 2010-03-12     2
# 10    2. 2001-01-01     1