R: How to aggregate (and sum up) rows in df according to multiple column criteria and keep previous order?

Question

I want to aggregate rows (and sum up values) in the following data example:

    df <- data.frame(from=c("A" ,"A", "A", "C", "C", "D", "A"),
                 to=c("B", "B", "B", "A", "A", "B", "D"),
                 values=c(5,6,2,10,2,6,3),
                 product=c("x","x", "x", "y", "z", "w", "w"),
                 year=c(1990,1991,1991,1990,1990,1991,1992))

> df
  from to values product year
1    A  B      5       x 1990
2    A  B      6       x 1991
3    A  B      2       x 1991
4    C  A     10       y 1990
5    C  A      2       z 1990
6    D  B      6       w 1991
7    A  D      3       w 1992

All rows containing same values/characters for the columns from, to, product and year should be aggregated to one row and the values in the values column should be summed up.

I tried the following code:

aggregate(values~from+to+product+year, df, FUN=sum)

and

ddply(df_id, c("from", "to", "product", "year"), numcolwise(sum))

these codes worked well. However, both changed the order of rows (also columns which is less important) see below:

for aggregate:
      from to product year values
    1    A  B       x 1990      5
    2    C  A       y 1990     10
    3    C  A       z 1990      2
    4    D  B       w 1991      6
    5    A  B       x 1991      8
    6    A  D       w 1992      3

and for ddply:

       from    to product year values
    1     C     A       y 1990     10
    2     C     A       z 1990      2
    3     A     B       x 1990      5
    4     A     B       x 1991      8
    5     A     D       w 1992      3
    6     D     B       w 1991      6

the expected result should look like:

  from to values product year
1    A  B      5       x 1990
2    A  B      8       x 1991
3    C  A     10       y 1990
4    C  A      2       z 1990
5    D  B      6       w 1991
6    A  D      3       w 1992

Any ideas how to solve this order problem (at least for rows)? Thanks

Frank Frank · Accepted Answer · 2015-10-09T21:28:19

The data.table package retains original by-group ordering by default:

library(data.table)

setDT(df)[, .(v = sum(values)), by=.(from,to,product,year)] 

#    from to product year  v
# 1:    A  B       x 1990  5
# 2:    A  B       x 1991  8
# 3:    C  A       y 1990 10
# 4:    C  A       z 1990  2
# 5:    D  B       w 1991  6
# 6:    A  D       w 1992  3

Only with keyby= instead of by= would it sort groups.

R: How to aggregate (and sum up) rows in df according to multiple column criteria and keep previous order?

2 Answers