1
votes

I have the following data.table:

DT <- data.table(name = c('ana', 'ana', 'ana', 'ana', 'justin', 'justin', 'justin', 'justin'), age = c(12, 26, 24, 14, 28, 36, 17, 8))

I want to be able to group by name, order the ages, and remove all groups in which the second highest age is below 28. How can I do that with data.table?

The expected output should be:

data.table(name = c('justin', 'justin', 'justin', 'justin'), age = c(28, 36, 17, 8))

Since it will remove the 'ana' group, because the second highest age in that group is less than 28.

2
Can we see the code you tried and did not work? Also the expected output should be included as well - Sotos
Sorry, just put the output desired. I was trying something like: DT[order(age)[2] >= 28, , by = name], but it clearly doesn't work :( - daniellga

2 Answers

3
votes

A fast and concise option (that also reorders the data)

setkey(DT, name, age)
DT[, .SD[age[.N-1] >= 28], by = name]
     name age
1: justin   8
2: justin  17
3: justin  28
4: justin  36
2
votes

An option is

DT[!name %chin% DT[order(-age), if (.N > 1L && age[2L] < 28) name, name]$V1]

Or an anti-join:

DT[!DT[order(-age), if (.N > 1L && age[2L] < 28) name, name], on=.(name)]