
Below is some data which I currently calculate in Excel.

col_A   col_B  col_C col_D col_E   col_F    col_G
-1.5%   0.010   1.00    1   1.00     -       -   
-5.4%   0.024   1.00    1   1.00     0.01   -0.00 
-7.9%   0.036   1.00    1   1.00     0.02   -0.00 
-12.7%  0.052   0.99    1   0.99     0.06   -0.01 
-4.6%   0.049   0.98    1   0.98     0.19   -0.01 
-8.3%   0.051   0.95    1   0.95     0.39   -0.03 
-7.3%   0.052   0.88    1   0.88     1.00   -0.07 
-9.2%   0.055   0.69    1   0.69     2.31   -0.21 
-7.9%   0.055   0.38    1   0.38     5.63   -0.44 
-2.2%   0.051   0.29    1   0.29     11.13  -0.24 

I have been trying to perform the calculations in R using data.table. The problem I have is that data.table performs calculations column-wise, but I need them performed row-wise, because each row depends on the results of the previous row. The Excel formulas for the calculated columns are given below, with "T" indicating the current row and "T-1" indicating the previous row:

col_C: col_C.T-1 * (1 + col_G.T)

col_D: max(col_C.T, col_D.T-1)

col_E: col_C.T / col_D.T

col_F: max((1 - (col_C.T-1 / col_D.T-1)) / col_B.T-1, 0.01)

col_G: col_A.T * col_F.T

Any assistance is greatly appreciated.

2
I assume you're looking for data.table::shift to lag/lead entries? – Maurits Evers
I have found some success using data.table::shift. However, the calculation of each column is dependent on the results of the previous row, which is not properly captured. – Sarel Louw
(1) shift allows you to calculate values based on previous-row entries, so I would imagine shift is the way to go here; (2) however, I do struggle to make sense of your "rules", which look cyclic to me. For example, entries in column C are based on column G, which in turn depends on column F, which depends on column C again. – Maurits Evers
Yes, it is the dependencies that are giving me a headache. Note that it jumps from T to T-1, for example when F goes back to C. – Sarel Louw
Yes, I understand that, but your problem statement is not very clear (to me); we still need initial values for certain columns. For instance, are the first-row values for columns C, D and E always 1.0? – Maurits Evers
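
For reference, here is a minimal sketch of what data.table::shift does, using a made-up column x that is not from the question:

library(data.table)

toy <- data.table(x = 1:5)
toy[, x_lag  := shift(x, 1L)]                 # previous-row value; NA in row 1
toy[, x_lead := shift(x, 1L, type = "lead")]  # next-row value; NA in the last row

Note that shift only references values that already exist in the column, which is why it cannot by itself express the recurrence above, where each row's result feeds into the next row's calculation.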

2 Answers


OK, so this is not an answer, but it is too long for a comment.

Please double-check your rules! They are not consistent with the input and expected output.

To demonstrate, let's take rows 2 and 3

col_A   col_B  col_C col_D col_E   col_F    col_G
...
-5.4%   0.024   1.00    1   1.00     0.01   -0.00 
-7.9%   0.036   1.00    1   1.00     0.02   -0.00 

and calculate the col_F value for row i = 3 according to your rules:

col_F[i] = max((1 - col_C[i - 1] / col_D[i - 1]) / col_B[i - 1], 0.01)
         = max((1 - 1 / 1) / 0.024, 0.01)
         = max(0, 0.01)
         = 0.01

So the value in row 3 for col_F should be 0.01 instead of 0.02.
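
The same arithmetic can be verified directly in R:

# reproduce the row-3 check from the posted values
max((1 - 1 / 1) / 0.024, 0.01)
[1] 0.01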

There are two possibilities:

  1. Your rules are incorrect, or
  2. your starting values of entries in col_C, col_D and col_E for row 1 are incorrect.

Either way, at the moment the input data, the expected output, and the rules do not agree.


If there are no other requirements that force the use of data.table, I suggest implementing the row-wise calculations using a matrix:

m <- data.matrix(dt)
m[, 3:7] <- NA  # clear the calculated columns; they will be recomputed below

for (i in seq.int(nrow(m))) {
  if (i == 1L) {
    # starting values for the first row
    m[i, "col_F"] <- 0
    m[i, "col_G"] <- 0
    m[i, "col_C"] <- 1
    m[i, "col_D"] <- 1
  } else {
    # each column depends on the results of the previous row
    m[i, "col_F"] <- max((1 - (m[i-1, "col_C"] / m[i-1, "col_D"])) / m[i-1, "col_B"], 0.01)
    m[i, "col_G"] <- m[i, "col_A"] * m[i, "col_F"]
    m[i, "col_C"] <- m[i-1, "col_C"] * (1 + m[i, "col_G"])
    m[i, "col_D"] <- max(m[i, "col_C"], m[i-1, "col_D"])
  }
  m[i, "col_E"] <- m[i, "col_C"] / m[i, "col_D"]
}

m
       col_A col_B     col_C col_D     col_E       col_F        col_G
 [1,] -0.015 0.010 1.0000000     1 1.0000000  0.00000000  0.000000000
 [2,] -0.054 0.024 0.9994600     1 0.9994600  0.01000000 -0.000540000
 [3,] -0.079 0.036 0.9976835     1 0.9976835  0.02250000 -0.001777500
 [4,] -0.127 0.052 0.9895302     1 0.9895302  0.06434834 -0.008172239
 [5,] -0.046 0.049 0.9803653     1 0.9803653  0.20134322 -0.009261788
 [6,] -0.083 0.051 0.9477596     1 0.9477596  0.40070748 -0.033258721
 [7,] -0.073 0.052 0.8768905     1 0.8768905  1.02432085 -0.074775422
 [8,] -0.092 0.055 0.6858958     1 0.6858958  2.36749020 -0.217809099
 [9,] -0.079 0.055 0.3764416     1 0.3764416  5.71098585 -0.451167882
[10,] -0.022 0.051 0.2825483     1 0.2825483 11.33742486 -0.249423347

The deviations in the last 4 rows of col_F from OP's expected result might be due to the limited precision of the posted values of col_A and col_B.
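
If the result is needed as a data.table again, the matrix can be converted back; a minimal sketch (the name res is arbitrary):

res <- as.data.table(m)  # the matrix method keeps the column names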

Data

library(data.table)

dt <- fread("col_A   col_B  col_C col_D col_E   col_F    col_G
-1.5%   0.010   1.00    1   1.00     -       -   
-5.4%   0.024   1.00    1   1.00     0.01   -0.00 
-7.9%   0.036   1.00    1   1.00     0.02   -0.00 
-12.7%  0.052   0.99    1   0.99     0.06   -0.01 
-4.6%   0.049   0.98    1   0.98     0.19   -0.01 
-8.3%   0.051   0.95    1   0.95     0.39   -0.03 
-7.3%   0.052   0.88    1   0.88     1.00   -0.07 
-9.2%   0.055   0.69    1   0.69     2.31   -0.21 
-7.9%   0.055   0.38    1   0.38     5.63   -0.44 
-2.2%   0.051   0.29    1   0.29     11.13  -0.24 ", na.strings = "-")
# convert percent string to numeric
dt[, col_A := readr::parse_number(col_A) / 100]
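
If readr is not available, the percent conversion can also be done in base R; an equivalent sketch:

# strip the percent sign and divide by 100
dt[, col_A := as.numeric(sub("%", "", col_A)) / 100]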