data.table sum by group and return row with max value

Question

I have a data.table in this fashion:

dd <- data.table(f = c("a", "a", "a", "b", "b"), g = c(1,2,3,4,5))
dd

I need to sum the values of g by factor f, then return a single-row data.table that holds the maximum summed value of g together with its factor level, i.e.:

   f g
1: b 9

My closest attempt so far is

tmp3 <- dd[, sum(g), by = f][, max(V1)]
tmp3

Which results in:

> tmp3
[1] 9

EDIT: Ideally I'm looking for a purely data.table solution. I'm surprised that, with all the fast split-apply-combine wizardry and the ability to subset your data in the form of `example[i = subset, ]`, I haven't found a straightforward way to subset on a single-value condition.

nrussell · Accepted Answer · 2015-03-23T13:45:14

Here's one way to do it:

library(data.table)
dd <- data.table(
  f = c("a", "a", "a", "b", "b"), 
  g = c(1,2,3,4,5))
##
> dd[,list(g = sum(g)),by=f][which.max(g),]
   f g
1: b 9
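
To spell out what the chained call is doing, here is a minimal sketch that splits it into its two steps and adds two variants. The intermediate names (sums, top, top_all, top_sorted) are only for illustration and are not part of the original answer. The tie-keeping variant assumes you would want every group that shares the maximum, which the question does not specify.

```r
library(data.table)

dd <- data.table(
  f = c("a", "a", "a", "b", "b"),
  g = c(1, 2, 3, 4, 5))

# Step 1: aggregate by group. Naming the result in j (g = sum(g))
# avoids the default column name V1.
sums <- dd[, list(g = sum(g)), by = f]

# Step 2: which.max(g) in i gives the row index of the first maximum,
# so the whole row (including f) is returned.
top <- sums[which.max(g)]

# Variant (assumption: tied groups should all be kept).
top_all <- sums[g == max(g)]

# Variant: sort by g in descending order and take the first row.
top_sorted <- sums[order(-g)][1]
```

The original attempt returned a bare number because max(V1) was evaluated in j, which yields a vector; putting the condition in i instead keeps the result a data.table with both columns.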
