dplyr - How to obtain the order of one column within a group?

Question

Example data:

tibbly = tibble(age = c(10,30,50,10,30,50,10,30,50,10,30,50),
              grouping1 = c("A","A","A","A","A","A","B","B","B","B","B","B"),
              grouping2 = c("X", "X", "X","Y","Y","Y","X","X","X","Y","Y","Y"),
              value = c(1,2,3,4,4,6,2,5,3,6,3,2))
> tibbly
# A tibble: 12 x 4
     age grouping1 grouping2 value
   <dbl> <chr>     <chr>     <dbl>
 1    10 A         X             1
 2    30 A         X             2
 3    50 A         X             3
 4    10 A         Y             4
 5    30 A         Y             4
 6    50 A         Y             6
 7    10 B         X             2
 8    30 B         X             5
 9    50 B         X             3
10    10 B         Y             6
11    30 B         Y             3
12    50 B         Y             2

Question: How to obtain the order of rows for each group in a dataframe? I can use dplyr to arrange the data in the an appropriate form to visualize what I am interested in:

> tibbly %>% 
     group_by(grouping1, grouping2) %>%
     arrange(grouping1, grouping2, desc(value))
# A tibble: 12 x 4
# Groups:   grouping1, grouping2 [4]
     age grouping1 grouping2 value
   <dbl> <chr>     <chr>     <dbl>
 1    50 A         X             3
 2    30 A         X             2
 3    10 A         X             1
 4    50 A         Y             6
 5    10 A         Y             4
 6    30 A         Y             4
 7    30 B         X             5
 8    50 B         X             3
 9    10 B         X             2
10    10 B         Y             6
11    30 B         Y             3
12    50 B         Y             2

In the end I am interested in the order of the age column, for each group based on the value column. Is there a elegant way to do this with dplyr? Something like summarise() based on the order of rows and not actual values

Are you looking for row_number? As in tibbly %>% group_by(grouping1, grouping2) %>% arrange(grouping1, grouping2, desc(value)) %>% mutate(RowNum=row_number()) — Kerry Jackson
....or something like this? tibbly %>% group_by(grouping1, grouping2) %>% arrange(grouping1, grouping2, desc(value)) %>% summarise(order = paste0(age, collapse = ",")) %>% ungroup() — AntoniosK
@AntoniosK That is pretty much what I was looking for! The order of the age column within each group. I wasn't aware of the usage of paste with collapse in that context. Very neat. Thank you! — Mojoesque
@AntoniosK please post your comment as an answer so that the question does no longer appear as unanswered. — erc
@Mojoesque Note that arrange(grouping1, grouping2, desc(value)) can be used for visualisation purposes, but arrange(desc(value)) after grouping is enough to do the job you want. — AntoniosK

AntoniosK AntoniosK · Accepted Answer · 2018-08-17T11:47:07

library(dplyr)

tibbly = tibble(age = c(10,30,50,10,30,50,10,30,50,10,30,50),
                grouping1 = c("A","A","A","A","A","A","B","B","B","B","B","B"),
                grouping2 = c("X", "X", "X","Y","Y","Y","X","X","X","Y","Y","Y"),
                value = c(1,2,3,4,4,6,2,5,3,6,3,2))


tibbly %>% 
  group_by(grouping1, grouping2) %>%                  # for each group
  arrange(desc(value)) %>%                            # arrange value descending
  summarise(order = paste0(age, collapse = ",")) %>%  # get the order of age as a strings
  ungroup()                                           # forget the grouping

# # A tibble: 4 x 3
#   grouping1 grouping2 order   
#   <chr>     <chr>     <chr>   
# 1 A         X         50,30,10
# 2 A         Y         50,10,30
# 3 B         X         30,50,10
# 4 B         Y         10,30,50

dplyr - How to obtain the order of one column within a group?

2 Answers