I want to aggregate rows (and sum up values) in the following data example:
df <- data.frame(from=c("A" ,"A", "A", "C", "C", "D", "A"),
to=c("B", "B", "B", "A", "A", "B", "D"),
values=c(5,6,2,10,2,6,3),
product=c("x","x", "x", "y", "z", "w", "w"),
year=c(1990,1991,1991,1990,1990,1991,1992))
> df
from to values product year
1 A B 5 x 1990
2 A B 6 x 1991
3 A B 2 x 1991
4 C A 10 y 1990
5 C A 2 z 1990
6 D B 6 w 1991
7 A D 3 w 1992
All rows containing same values/characters for the columns from
, to
, product
and year
should be aggregated to one row and the values in the values
column should be summed up.
I tried the following code:
aggregate(values~from+to+product+year, df, FUN=sum)
and
ddply(df_id, c("from", "to", "product", "year"), numcolwise(sum))
these codes worked well. However, both changed the order of rows (also columns which is less important) see below:
for aggregate:
from to product year values
1 A B x 1990 5
2 C A y 1990 10
3 C A z 1990 2
4 D B w 1991 6
5 A B x 1991 8
6 A D w 1992 3
and for ddply:
from to product year values
1 C A y 1990 10
2 C A z 1990 2
3 A B x 1990 5
4 A B x 1991 8
5 A D w 1992 3
6 D B w 1991 6
the expected result should look like:
from to values product year
1 A B 5 x 1990
2 A B 8 x 1991
3 C A 10 y 1990
4 C A 2 z 1990
5 D B 6 w 1991
6 A D 3 w 1992
Any ideas how to solve this order problem (at least for rows)? Thanks