Filter and group by with data.table

Question

I have the following data.table:

DT <- data.table(name = c('ana', 'ana', 'ana', 'ana', 'justin', 'justin', 'justin', 'justin'), age = c(12, 26, 24, 14, 28, 36, 17, 8))

I want to be able to group by name, order the ages, and remove all groups in which the second highest age is below 28. How can I do that with data.table?

The expected output should be:

data.table(name = c('justin', 'justin', 'justin', 'justin'), age = c(28, 36, 17, 8))

Since it will remove the 'ana' group, because the second highest age in that group is less than 28.

Can we see the code you tried and did not work? Also the expected output should be included as well — Sotos
Sorry, just put the output desired. I was trying something like: DT[order(age)[2] >= 28, , by = name], but it clearly doesn't work :( — daniellga

sindri_baldur sindri_baldur · Accepted Answer · 2020-02-11T09:44:41

A fast and concise option (that also reorders the data)

setkey(DT, name, age)
DT[, .SD[age[.N-1] >= 28], by = name]
     name age
1: justin   8
2: justin  17
3: justin  28
4: justin  36

Filter and group by with data.table

2 Answers