Count of values within specified range of value in each row using data.table

Question

To come up with a column of counts for each level (or combination of levels) for categorical variables is data.table syntax can be handled with something like:

#setting up the data so it's pasteable
df <- data.table(var1 = c('dog','cat','dog','cat','dog','dog','dog'),
                 var2 = c(1,5,90,95,91,110,8),
                 var3 = c('lamp','lamp','lamp','table','table','table','table'))

#adding a count column for var1
df[, var1count := .N, by = .(var1)]

#adding a count of each combo of var1 and var3
df[, var1and3comb := .N, by = .(var1,var3)]

I am curious as to how I could instead produce a count column that counts the number of records with a value that is within +- 5 from each value of var2.

In my non-functioning attempt at this,

df[, var2withinrange := .N, by = .(between((var2-5),(var2+5),var2))]

I get a column with the total number of records as opposed to the desired result. I'd be hoping for the first row to hold a value of 2, since the 1 and 5 fall into that range. Row 2 should have a value of 3, since the 1, 5, and 8 all fall into that range for the 5, and so on.

Any help on coming up with a solution is much appreciated. Ideally in data.table code!

Jaap Jaap · Accepted Answer · 2018-10-23T12:54:43

A solution with data.table:

df[, var2withinrange := df[.(var2min = var2 - 5, var2plus = var2 + 5)
                           , on = .(var2 >= var2min, var2 <= var2plus)
                           , .N
                           , by = .EACHI][, N]][]

which gives:

> df
   var1 var2  var3 var2withinrange
1:  dog    1  lamp               2
2:  cat    5  lamp               3
3:  dog   90  lamp               3
4:  cat   95 table               3
5:  dog   91 table               3
6:  dog  110 table               1
7:  dog    8 table               2

Count of values within specified range of value in each row using data.table

2 Answers