Currently I have the following data.table:
item  city    dummyvar
A     Austin  1
A     Austin  1
A     Austin  100
B     Austin  2
B     Austin  2
B     Austin  200
A     NY      1
A     NY      1
A     NY      100
B     NY      2
B     NY      2
B     NY      200
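For anyone who wants a reproducible copy, the example table can be rebuilt like this (a minimal sketch; the column types are assumed from the rows shown above):

library(data.table)

# rebuild the example table shown above
my.data.table <- data.table(
  item     = rep(c("A", "B"), each = 3, times = 2),
  city     = rep(c("Austin", "NY"), each = 6),
  dummyvar = rep(c(1, 1, 100, 2, 2, 200), times = 2)
)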
and I have a user-defined function called ImbalancePoints, which is applied to dummyvar and returns the rows where it detects an abrupt change in dummyvar. The way I am doing this is as follows:
my.data.table[,
  .(imb.points = list(unique(try(ImbalancePoints(dummyvar), silent = TRUE)))),
  by = .(city, item)
]
And for the NY case, let's say that I get a data.table object like the following:
item  city  imb.points
A     NY    3,449
where the column imb.points
is a column with nested lists as its elements, and for this example the numbers 3 and 449 denote the rows where there is an abrupt change for the case of city = NY
and item = A
. However the problem that I am facing is that I have approx. 3000 different items for 12 different cities, and it is taking a long time to calculate this. I was wondering if you could give me an idea of how to vectorize/speed up this calculation since the last time that I attempted this it took almost 2 hours and it didn't finish.
I don't know if it's of any help, but I am also attaching the ImbalancePoints function:
library(pracma)

# Returns, for each row, the sign of the most recent non-zero change
# in eval.column (0 until the first change occurs).
ImbalancePr <- function(eval.column) {
  n <- length(eval.column)
  imbalance <- rep(0, n)
  elem_diff <- diff(eval.column)
  for (i in 2:n) {
    # take the sign of the latest change, or carry the previous
    # value forward when the series is flat
    imbalance[i] <- sign(elem_diff[i - 1]) * (elem_diff[i - 1] != 0) +
      imbalance[i - 1] * (elem_diff[i - 1] == 0)
  }
  return(imbalance)
}
ImbalancePoints <- function(eval.column, w0 = 100, bkw_T = 10, bkw_b = 10) {
  bv_t <- ImbalancePr(eval.column)
  # start at the first row where the cumulative imbalance is non-zero,
  # capped at w0
  w0 <- min(min(which(cumsum(bv_t) != 0)), w0)
  Tstar <- w0
  E0t <- Tstar
  repeat {
    Tlast <- sum(Tstar)
    nbt <- min(bkw_b, Tlast - 1)
    # exponential moving average of the imbalance up to Tlast
    P <- pracma::movavg(bv_t[1:Tlast], n = nbt, type = "e")
    P <- tail(P, 1)
    bv_t_expected <- E0t * abs(P)
    bv_t_cumsum <- abs(cumsum(bv_t[-(1:Tlast)]))
    if (max(bv_t_cumsum) < bv_t_expected) {
      break
    } else {
      # first row past Tlast where the cumulative imbalance
      # reaches its expected value
      Tnew <- min(which(bv_t_cumsum >= bv_t_expected))
    }
    Tlast <- Tlast + Tnew
    if (Tlast > length(eval.column)) {
      break
    } else {
      Tstar <- c(Tstar, Tnew)
      if (length(Tstar) <= 2) {
        E0t <- mean(Tstar)
      } else {
        # exponential moving average of the run lengths collected so far
        nt <- min(bkw_T, length(Tstar) - 1)
        E0t <- tail(pracma::movavg(Tstar, n = nt, type = "e"), 1)
      }
    }
  }
  return(sort(unique(Tstar)))
}
EDIT: Thanks to Paul's insight, my problem reduces to vectorizing the repeat loop inside the ImbalancePoints function. However, I am not very proficient at coding and I don't see a straightforward way to do it. If someone could give me a suggestion, or if you know of an auxiliary function/library, I would appreciate it.
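The repeat loop itself is hard to vectorize outright, since each Tnew depends on the Tstar values accumulated so far; but the roughly 36,000 group-level calls are independent, so they can at least run in parallel. A sketch using base R's parallel package (mc.cores = 4 is an arbitrary assumption, and on Windows mclapply falls back to a single core):

library(parallel)

# one dummyvar vector per (city, item) group
groups <- split(my.data.table$dummyvar,
                list(my.data.table$city, my.data.table$item),
                drop = TRUE)

# the groups are independent, so they can be processed on separate cores
res <- mclapply(groups,
                function(v) unique(try(ImbalancePoints(v), silent = TRUE)),
                mc.cores = 4)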
You call ImbalancePoints for each group in the data. This works out to be at least 3000 times the number of cities, so that equals 36,000 calls. Each of these calls in turn calls ImbalancePr, which loops n times through the column. This then works out to 36,000 * 36,000 = 1,296,000,000 loops. No wonder. Your repeat loop will make this worse. – Paul van Oppen