3
votes

I have a data.table in this fashion:

dd <- data.table(f = c("a", "a", "a", "b", "b"), g = c(1,2,3,4,5))
dd

I need to sum the values of g within each level of f, then return a single-row data.table that holds the largest of those sums together with its f value, i.e.:

   f g
1: b 9

My closest attempt so far is

tmp3 <- dd[, sum(g), by = f][, max(V1)]
tmp3

Which results in:

> tmp3
[1] 9

EDIT: Ideally I'm looking for a pure data.table solution. Given how fast data.table's split-apply-combine is, and that rows can be subset with `example[i = subset, ]`, I'm surprised I haven't found a straightforward way to subset on a single-value condition.


2 Answers

6
votes

Here's one way to do it:

library(data.table)
dd <- data.table(
  f = c("a", "a", "a", "b", "b"), 
  g = c(1,2,3,4,5))
## sum g by f, then keep the row with the largest sum
> dd[, list(g = sum(g)), by = f][which.max(g), ]
   f g
1: b 9
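
Note that `which.max()` returns only the index of the first maximum. If several groups could tie for the largest sum and you want all of them, a logical filter in `i` is one option. A minimal sketch, assuming the same `dd` as above; the names `totals`, `top_first`, and `top_all` are only for illustration:

```r
library(data.table)

dd <- data.table(
  f = c("a", "a", "a", "b", "b"),
  g = c(1, 2, 3, 4, 5))

# Step 1: sum g within each level of f, naming the result column g
totals <- dd[, .(g = sum(g)), by = f]

# Step 2a: which.max() gives the row index of the first maximum only
top_first <- totals[which.max(g)]

# Step 2b: a logical condition in i keeps every row tied for the maximum
top_all <- totals[g == max(g)]
```

With this data there is no tie, so both forms return the single row for `b`.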

5
votes

You can use dplyr syntax on a data.table, in this case:

library(dplyr)
dd %>%
  group_by(f) %>%
  summarise(g = sum(g)) %>%
  top_n(1, g)

Source: local data table [1 x 2]

  f g
1 b 9
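
In dplyr 1.0.0 and later, `top_n()` is superseded by `slice_max()`. Here is a sketch of the same pipeline in that form, assuming a recent dplyr is installed. The name `res` is only for illustration, and the final conversion assumes you want a data.table back, since the pipeline may return a tibble instead:

```r
library(data.table)
library(dplyr)

dd <- data.table(
  f = c("a", "a", "a", "b", "b"),
  g = c(1, 2, 3, 4, 5))

res <- dd %>%
  group_by(f) %>%            # one group per level of f
  summarise(g = sum(g)) %>%  # total of g within each group
  slice_max(g, n = 1)        # keep the row(s) with the largest total

# Convert back in case the pipeline returned a tibble
res <- as.data.table(res)
```

By default `slice_max()` keeps ties, so more than one row can come back if several groups share the largest total.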