I have a data.table that looks like the one below. For every market, I would like to calculate the correlation of the return with each signal.

library(data.table)
dt = data.table(mkt = rep(letters[1:3], each = 3), rtn = rnorm(9), signal1 = rnorm(9), signal2 = rnorm(9), signal3 = rnorm(9))
   mkt      rtn    signal1     signal2    signal3
1:   a  0.2488643  0.4110516 -0.04861252 -1.3599824
2:   a  1.3387256 -0.4418436 -0.17055841 -1.2161698
3:   a -1.4058236 -1.2624645 -0.24315048 -1.2722546
4:   b  1.7056606  0.2618591  2.60779232  0.7786226
5:   b  0.7913587 -1.0596116  0.31152541  1.7336651
6:   b -1.8690651  0.1942825  0.95430075 -0.7030462
7:   c -0.4937575 -1.8645226 -0.32312077 -1.7138482
8:   c -0.7153342 -0.5142624 -0.43817789 -1.3637261
9:   c  0.3766730 -0.0954339  0.71159756 -1.2118075

dt[, lapply(.SD, function(x) cor(x, rtn, use = 'c')), .SDcols = 3:5, by = mkt]
Error in is.data.frame(y) : object 'rtn' not found

How can I make the anonymous function in j aware of the rtn column?

1 Answer

I think one way is to include rtn in .SDcols so that the anonymous function can find it, and then remove the resulting rtn column, which holds only 1 because it is the correlation of rtn with itself:

dt[, lapply(.SD, function(x) cor(x, rtn, use = 'c')), .SDcols = c(2, 3:5), by = mkt]

   mkt rtn    signal1    signal2    signal3
1:   a   1  0.6759421 -0.5037837  0.8605805
2:   b   1 -0.8494135  0.6720274  0.7832928
3:   c   1 -0.9425291  0.5683629 -0.9976231

You can then store the result and drop the rtn column by reference:

dt2 <- dt[, lapply(.SD, function(x) cor(x, rtn, use = 'c')), .SDcols = c(2, 3:5), by = mkt]
dt2[, rtn := NULL]
dt2
#   mkt    signal1    signal2    signal3
#1:   a  0.6759421 -0.5037837  0.8605805
#2:   b -0.8494135  0.6720274  0.7832928
#3:   c -0.9425291  0.5683629 -0.9976231
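
Another option, if you would rather not compute and then discard the rtn column, is to reshape to long format first so that rtn can be named directly in j. This is only a sketch under a few assumptions: the seed is arbitrary and added purely for reproducibility, and the names long, res, wide, signal, value and corr are made up for illustration. It relies on melt() and dcast() from data.table and spells out use = "complete.obs" instead of the abbreviation 'c'.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dt <- data.table(mkt = rep(letters[1:3], each = 3), rtn = rnorm(9),
                 signal1 = rnorm(9), signal2 = rnorm(9), signal3 = rnorm(9))

# Long format: one row per (mkt, rtn, signal name, signal value)
long <- melt(dt, id.vars = c("mkt", "rtn"),
             variable.name = "signal", value.name = "value")

# rtn and value are ordinary columns here, so j can refer to both by name
res <- long[, .(corr = cor(value, rtn, use = "complete.obs")),
            by = .(mkt, signal)]

# Back to one row per market and one column per signal
wide <- dcast(res, mkt ~ signal, value.var = "corr")
wide
```

The wide table has the same shape as dt2 above (mkt plus one column per signal), while res keeps the long layout in case you want to filter or plot by signal.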