2
votes

Apologies for the convoluted title. Essentially, I am aggregating some data in R.

df <- data.frame(value1=c(1,2,3,4,5,6,7,8,9,10),
+                  value2=c(2,3,4,5,6,7,8,9,1,2), 
+                  group=c("a","b","a","b","a","b","a","b","a","c")) 
DT <- as.data.table(df)

The resulting data looks like this:

 value1 value2  group
    1   2   a
    2   3   b
    3   4   a
    4   5   b
    5   6   a
    6   7   b
    7   8   a
    8   9   b
    9   1   a
    10  2   c

I would like to find the value1 for each group of the row with the highest value2, using data.table.

For example, the above table would turn into:

new_val group
7      a
8      b
10     c

...because 7 is the value of value1 for the row with the max value2 in group a, and so on and so forth.

I'm having a very difficult time doing this while creating a new datatable with a number of grouping operations.

The code I am using currently looks like:

DT[,list(
    rate_max_acct = max(value1(which.max(value2)))
    ),
  by=group] -> DT2

This isn't working -- what I was trying to do is select the row where value2 is maximum for each froup, and then find the max of value1 for that group, but it doesn't work and throws the following error:

  could not find function "value1"

So evidently it thinks that I'm trying to apply a function, not look for an object.

Any help here would be greatly appreciated, especially if I can do it within this list -- I'm doing a bunch of grouping operations at once and would really like it if I could keep the "xxx = yyy" syntax.

2
You used ( on variable instead of [ in your value1(.jangorecki

2 Answers

2
votes

You can do:

DT[,list(newval=value1[which.max(value2)]), group]
#   group newval
#1:     a      7
#2:     b      8
#3:     c     10

You do not need so much max(value1[which.max(value2)]) since which.max will give you the index of the row where value2 is maximum, you just need value1[which.max(value2)]

0
votes

Simple extension to previous answer in case you are looking for multiple maxima.

library(data.table) ; setkey(DT, group)
tmp <- DT[, .(value1[which(value2 == max(value2)), max(value2))], by = group]

This will give you a three column data.table with the first column the group identifier, the second column all the variables value1 for which value2 reaches the maximum, and the third column, the maximum value2 per group.