1
votes

I have the following data frame in R:

     Key       Quantity
     1_2013    20
     1_2013    20
     2_2013    20
     2_2013    30
     3_2013    20
     3_2013    20
     4_2013    20
     4_2013    30 
     10_2013   20
     10_2013   20
     11_2013   20
     11_2013   30

When I aggregate on the Key column, I want to keep the original order of the Key column. But when I use group_by in dplyr, it gives me the following order:

     Key       Quantity
     1_2013    40
     10_2013   40
     11_2013   50
     2_2013    50
     3_2013    40
     4_2013    50

I want it in the following order:

     Key       Quantity
     1_2013    40
     2_2013    50
     3_2013    40
     4_2013    50
     10_2013   40
     11_2013   50

How can I do it in dplyr?

3
Use ORDER BY on Key in the query. - VinuBibin
There are already nice solutions posted, but maybe using year_month instead of month_year would work around the problem altogether. The Y_m format, like Y_m_d, also makes things easier when naming or browsing files, since they then usually appear in chronological order when sorted alphabetically. - zeehio
Could you clarify what the actual intended output is? It will affect what the "correct" approach to your problem might be. Do you want the output to be ordered in the same manner as your input (regardless of temporal ordering), or do you want the output to be ordered by month and year? - Benjamin

3 Answers

3
votes

The OP has requested: "When I aggregate on Key column I want to keep the original order of Key column."

forcats::fct_inorder()

The forcats package, which is part of the tidyverse, provides fct_inorder(), which creates a factor whose levels are ordered by first appearance:

library(tidyverse)
read_table(
"    Key       Quantity
     1_2013    20
     1_2013    20
     2_2013    20
     2_2013    30
     3_2013    20
     3_2013    20
     4_2013    20
     4_2013    30 
     10_2013   20
     10_2013   20
     11_2013   20
     11_2013   30"
) %>% 
  group_by(Key = fct_inorder(Key)) %>% 
  summarise(Quantity = sum(Quantity))
# A tibble: 6 x 2
  Key     Quantity
  <fct>      <int>
1 1_2013        40
2 2_2013        50
3 3_2013        40
4 4_2013        50
5 10_2013       40
6 11_2013       50

data.table

For the sake of completeness:
Although the OP has clearly asked for a dplyr solution, I just want to mention that grouping with by = in data.table returns the groups in order of appearance by default, so no factors are needed.

library(data.table)
fread(
  "    Key       Quantity
     1_2013    20
     1_2013    20
     2_2013    20
     2_2013    30
     3_2013    20
     3_2013    20
     4_2013    20
     4_2013    30 
     10_2013   20
     10_2013   20
     11_2013   20
     11_2013   30"
)[, .(Quantity = sum(Quantity)), by = Key]
       Key Quantity
1:  1_2013       40
2:  2_2013       50
3:  3_2013       40
4:  4_2013       50
5: 10_2013       40
6: 11_2013       50
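
For comparison, recent dplyr versions have a per-operation grouping argument that behaves much like data.table's by =. The following is a minimal sketch, assuming dplyr 1.1.0 or later is installed (the .by argument does not exist in older versions). It uses a shortened, hypothetical version of the question's data; the name by_appearance is made up for illustration.

```r
library(dplyr)

# Shortened illustrative version of the data from the question
df <- data.frame(
  Key = c("1_2013", "1_2013", "2_2013", "2_2013", "10_2013", "10_2013"),
  Quantity = c(20, 20, 20, 30, 20, 20)
)

# Assumption: dplyr >= 1.1.0. Unlike group_by(), .by does not sort the
# group keys; groups come back in order of first appearance.
by_appearance <- df %>%
  summarise(Quantity = sum(Quantity), .by = Key)

by_appearance
```

With .by the Key column stays a character vector, so no conversion to and from a factor is needed.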
2
votes

One possible way is to transform your grouping variable into a factor whose levels are in the order you prefer (i.e. the order in which they appear in the original dataset):

df = read.table(text = "
Key       Quantity
1_2013    20
1_2013    20
2_2013    20
2_2013    30
3_2013    20
3_2013    20
4_2013    20
4_2013    30 
10_2013   20
10_2013   20
11_2013   20
11_2013   30
", header = TRUE, stringsAsFactors = FALSE)

library(dplyr)

df %>%
  group_by(Key = factor(Key, levels = unique(Key))) %>%
  summarise(Quantity = sum(Quantity))

# # A tibble: 6 x 2
#    Key     Quantity
#    <fct>      <int>
# 1 1_2013        40
# 2 2_2013        50
# 3 3_2013        40
# 4 4_2013        50
# 5 10_2013       40
# 6 11_2013       50

You can transform Key back to a character variable, if you want, by adding ... %>% mutate(Key = as.character(Key)) at the end.
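
As a minimal sketch of that last step, the pipeline below converts Key back to character after summarising. It uses a shortened, hypothetical version of the data, and the name result is made up for illustration. The row order set by the factor levels should survive the conversion, because mutate() does not reorder rows.

```r
library(dplyr)

# Shortened illustrative version of the data from the question
df <- data.frame(
  Key = c("1_2013", "1_2013", "2_2013", "2_2013", "10_2013", "10_2013"),
  Quantity = c(20, 20, 20, 30, 20, 20),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Levels follow first appearance, so groups are summarised in that order
  group_by(Key = factor(Key, levels = unique(Key))) %>%
  summarise(Quantity = sum(Quantity)) %>%
  # Convert back to character; the rows keep their current order
  mutate(Key = as.character(Key))

result
```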

1
votes

You've fallen prey to character sorting. You've got a few options that I'll present in order of my own personal (and highly subjective) preference.

df <- "Key,Quantity
1_2013,20
1_2013,20
2_2013,20
2_2013,30
3_2013,20
3_2013,20
4_2013,20
4_2013,30 
10_2013,20
10_2013,20
11_2013,20
11_2013,30"

df <- read.csv(text = df, stringsAsFactors = FALSE)

Standardize the character length

The advantage of keeping the character length consistent throughout the vector is that it makes a lot of sorting issues go away.

library(dplyr)
df %>% 
  # Left-pad single-digit months with a zero: "1_2013" becomes "01_2013"
  mutate(Key = sub("^(\\d{1})_", "0\\1_", Key)) %>% 
  group_by(Key) %>% 
  summarise(Quantity = sum(Quantity))
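
To see what the padding does in isolation, here is a small base R sketch applied to the distinct keys from the question. The variable names keys, padded and sorted_padded are made up for illustration.

```r
# The distinct keys from the question, in their original order
keys <- c("1_2013", "2_2013", "3_2013", "4_2013", "10_2013", "11_2013")

# Prefix a zero only when a single digit precedes the underscore;
# two-digit months such as "10_2013" do not match and are left unchanged
padded <- sub("^(\\d{1})_", "0\\1_", keys)
padded
# "01_2013" "02_2013" "03_2013" "04_2013" "10_2013" "11_2013"

# With equal-length month prefixes, plain character sorting
# coincides with the numeric month order
sorted_padded <- sort(padded)
identical(sorted_padded, padded)
# TRUE
```

Note that this changes the key values themselves, so the output shows 01_2013 rather than 1_2013.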

Make Key a factor

In principle Key is a factor variable. If there are no more manipulations to be done to it, there's no reason to retain it as a character value.

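library(dplyr)
library(gtools)
df %>% 
  # Use levels (not labels) to set the order; labels would rename the
  # alphabetically sorted levels and attach sums to the wrong keys
  mutate(Key = factor(Key,
                      levels = mixedsort(unique(Key)))) %>% 
  group_by(Key) %>% 
  summarise(Quantity = sum(Quantity))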
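
The difference between the levels and labels arguments of factor() matters for this approach. The base R sketch below illustrates it on three of the keys; the names keys, wanted, by_levels and by_labels are made up for illustration. The exact result of the labels variant depends on the locale's sort order, so no output is shown for it.

```r
keys   <- c("1_2013", "2_2013", "10_2013")
wanted <- c("1_2013", "2_2013", "10_2013")  # desired level order

# levels = ... sets the order of the levels; the values are untouched
by_levels <- factor(keys, levels = wanted)
identical(as.character(by_levels), keys)
# TRUE

# labels = ... renames the default, alphabetically sorted levels
# position by position, so values end up under the wrong names
by_labels <- factor(keys, labels = wanted)
identical(as.character(by_labels), keys)
# FALSE
```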

Bring in the old-fashioned way of sorting

Use mixedsort and sort by index.

library(dplyr)
library(gtools)
df <- 
  df %>% 
  group_by(Key) %>% 
  summarise(Quantity = sum(Quantity))
df <- df[mixedorder(df$Key), ]