Find the value of a variable for the row within a group that is the maximum of another variable when grouping

Question

Apologies for the convoluted title. Essentially, I am aggregating some data in R.

df <- data.frame(value1=c(1,2,3,4,5,6,7,8,9,10),
+                  value2=c(2,3,4,5,6,7,8,9,1,2), 
+                  group=c("a","b","a","b","a","b","a","b","a","c")) 
DT <- as.data.table(df)

The resulting data looks like this:

 value1 value2  group
    1   2   a
    2   3   b
    3   4   a
    4   5   b
    5   6   a
    6   7   b
    7   8   a
    8   9   b
    9   1   a
    10  2   c

I would like to find the value1 for each group of the row with the highest value2, using data.table.

For example, the above table would turn into:

new_val group
7      a
8      b
10     c

...because 7 is the value of value1 for the row with the max value2 in group a, and so on and so forth.

I'm having a very difficult time doing this while creating a new datatable with a number of grouping operations.

The code I am using currently looks like:

DT[,list(
    rate_max_acct = max(value1(which.max(value2)))
    ),
  by=group] -> DT2

This isn't working -- what I was trying to do is select the row where value2 is maximum for each froup, and then find the max of value1 for that group, but it doesn't work and throws the following error:

  could not find function "value1"

So evidently it thinks that I'm trying to apply a function, not look for an object.

Any help here would be greatly appreciated, especially if I can do it within this list -- I'm doing a bunch of grouping operations at once and would really like it if I could keep the "xxx = yyy" syntax.

Colonel Beauvel Colonel Beauvel · Accepted Answer · 2015-04-17T21:53:11

You can do:

DT[,list(newval=value1[which.max(value2)]), group]
#   group newval
#1:     a      7
#2:     b      8
#3:     c     10

You do not need so much max(value1[which.max(value2)]) since which.max will give you the index of the row where value2 is maximum, you just need value1[which.max(value2)]

Find the value of a variable for the row within a group that is the maximum of another variable when grouping

2 Answers