0
votes

I have tried to look at manuals for aggregate, apply, etc, but I can't find examples where some columns are skipped by the function being applied, while other columns are copied as-is. Example:

> olddf = data.frame(code=c("one","one","two"), val1=c(1,2,3), val2=c(4,5,6), val3=c(7,8,9))
> olddf
  code val1 val2 val3
1  one    1    4    7
2  one    2    5    8
3  two    3    6    9
> 

How do I aggregate olddf so that I get a new data frame where:

  • code column is kept as-is,
  • val1 is skipped
  • val2 is aggregated normally e.g. sum()
  • and a new column is created based on one of the earlier aggregates... e.g. new column = sum(val3)/sum(val2)?

I basically want:

> newdf
  code val2 newcol
1  one    9 1.6666
2  two    6 1.5000

I would like to do this in one step rather than defining a separate function to deal with each column / aggregate. Is this possible?

2

2 Answers

2
votes

Try data.table

library(data.table)
setDT(olddf)[, .(val2 = sum(val2),
                 newcol = sum(val3)/sum(val2)), by = code]
#    code val2   newcol
# 1:  one    9 1.666667
# 2:  two    6 1.500000
0
votes

You can do this in two steps, the key is using the aggregate function. In the by() argument, you want to specify how you want to aggregate. In this case, aggregate according to the value in "Code". Then, add third column just by specifying the function of interest.

aggregate <- aggregate(olddf[, c("val2", "val3")], by = list(olddf$code), FUN = sum)
aggregate$newcol <- aggregate$val3/aggregate$val2