0
votes

Here is my data.table

structure(list(index = structure(c(17007, 17008, 17009, 17010, 
17011, 17014, 17015, 17016, 17017), tzone = "UTC", tclass = "Date", class = "Date"), 
    close = c(97.339996, 96.669998, 102.949997, 104.339996, 104.209999, 
    106.050003, 104.480003, 105.790001, 105.870003), daily_return = c(-0.688306993560994, 
    6.49632681279251, 1.35016905342893, -0.124589807344833, 1.76566933850562, 
    -1.48043371578217, 1.25382653367649, 0.0756234041438208, 
    1.5207329313101), group = c(0, 0, 3, 3, 3, 3, 3, 3, 3), qq = c(1, 
    0, 6, 5, 4, 3, 2, 1, 0)), class = c("data.table", "data.frame"
), row.names = c(NA, -9L), .internal.selfref = <pointer: 0x558d2914e9a0>, sorted = "index")

and it looks like as follows -

        index  close daily_return group qq
1: 2016-07-25  97.34   -0.6883070     0  1
2: 2016-07-26  96.67    6.4963268     0  0
3: 2016-07-27 102.95    1.3501691     3  6
4: 2016-07-28 104.34   -0.1245898     3  5
5: 2016-07-29 104.21    1.7656693     3  4
6: 2016-08-01 106.05   -1.4804337     3  3
7: 2016-08-02 104.48    1.2538265     3  2
8: 2016-08-03 105.79    0.0756234     3  1
9: 2016-08-04 105.87    1.5207329     3  0

In this data.table, I need to find the return (from column close) by the interval given in qq column for each group. E.g. in row 3, the look forwarding interval is 6 and the return is to be calculated between column 3 and column 9 (3 + 6 lookforward).

When the group changes, the return is the be calculated between the last element of the first group and the first element of the second group. e.g, as the group changes between row 2 and row3, the return is calculated between these two rows.

Following is the desired result -

        index  close daily_return group qq desired_result
1: 2016-07-25  97.34   -0.6883070     0  1          -0.68
2: 2016-07-26  96.67    6.4963268     0  0           6.49
3: 2016-07-27 102.95    1.3501691     3  6           2.83
4: 2016-07-28 104.34   -0.1245898     3  5           1.46
5: 2016-07-29 104.21    1.7656693     3  4           1.59
6: 2016-08-01 106.05   -1.4804337     3  3          -0.16
7: 2016-08-02 104.48    1.2538265     3  2           1.33
8: 2016-08-03 105.79    0.0756234     3  1           0.07
9: 2016-08-04 105.87    1.5207329     3  0             NA

I am sure there must be an easy way to accomplish this using data.table.

I have tried the following and it works only on specific rows as k is constant.

lead(Delt(close, k = 6, type = "arithmetic") * 100, n = 6)

Is there a way to pass the column qq dynamically in the k variable of Delt command? OR, is there any other solution to this problem?

2
Mini suggestion, more theoretic, perhaps take log return, since stock distribution is more lognormal (black-scholes) than normal, ergo with log you make your returns normal.JacobJacox
Understood, since I am using arithmetic for the testing purpose only as it is easy to test. Thanks!Saurabh
Why is the desired result 2.76 for row 3 instead of (105.87-102.95)/102.95=2.84?chinsoon12
I have fixed the desired result column in the question above. Thanks for checking.Saurabh

2 Answers

3
votes

Here is an approach which uses data.table syntax without looping:

idx2 <- close[, .I + ifelse(group == shift(group, -1L), qq, 1L)]
close[, return := (close[idx2] - close) / close * 100][]
        index  close daily_return group qq     return
1: 2016-07-25  97.34   -0.6883070     0  1 -0.6883070
2: 2016-07-26  96.67    6.4963268     0  0  6.4963268
3: 2016-07-27 102.95    1.3501691     3  6  2.8363342
4: 2016-07-28 104.34   -0.1245898     3  5  1.4663667
5: 2016-07-29 104.21    1.7656693     3  4  1.5929412
6: 2016-08-01 106.05   -1.4804337     3  3 -0.1697313
7: 2016-08-02 104.48    1.2538265     3  2  1.3303981
8: 2016-08-03 105.79    0.0756234     3  1  0.0756234
9: 2016-08-04 105.87    1.5207329     3  0         NA

idx2 contains the row indices which are computed from the local index .I plus qq except when a new group starts in the next row.

idx2
[1]  2  3  9  9  9  9  9  9 NA

The formula to compute return is taken from help("Delt", "quantmod"):

Arithmetic differences are used by default: Lag = (x2(t) - x1(t-k))/x1(t-k)

Therefore, it is unclear to me why OP's expected result is different. See also chinsoon's comment.

2
votes

This does what you asked, but the desired result you are showing doesn't quite match the results one gets by following the algorithm you described, as far as I can tell.

Anyway, the algorithm is simple to accomplish in a loop:

# Creates a zero-filled column to hold the results
df$desired_result <- numeric(length(df$close))

# Loops through each row
for(i in seq_along(df$desired_result))
{
  # We don’t want to do anything in the final row
  if(i == length(df$desired_result)) break;

  # if the row’s qq value is 0, get the difference between this row and next row’s close value
  # otherwise use the row’s qq value to look ahead that number of rows
  if(df$qq[i] == 0) df$desired_result[i] <- df$close[i + 1] - df$close[i]
  else df$desired_result[i] <-  df$close[i + df$qq[i]] - df$close[i]
} 

Now you can do:

#> df
#>       index  close daily_return group qq desired_result
#> 1 2016-07-25  97.34   -0.6883070     0  1      -0.670000
#> 2 2016-07-26  96.67    6.4963268     0  0       6.279997
#> 3 2016-07-27 102.95    1.3501691     3  6       2.920006
#> 4 2016-07-28 104.34   -0.1245898     3  5       1.530007
#> 5 2016-07-29 104.21    1.7656693     3  4       1.660004
#> 6 2016-08-01 106.05   -1.4804337     3  3      -0.180000
#> 7 2016-08-02 104.48    1.2538265     3  2       1.390000
#> 8 2016-08-03 105.79    0.0756234     3  1       0.080002
#> 9 2016-08-04 105.87    1.5207329     3  0       0.000000