Description of the data
My data.frame represents the salaries of people living in different cities (city) in different countries (country). City names, country names and salaries are integers. In my data.frame, the variable country is ordered, the variable city is ordered within each country, and the variable salary is ordered within each city (and country). There are two additional columns, arg1 and arg2, which contain floats/doubles.
Goal
For each country and each city, I want to consider a window of salaries of size WindowSize and calculate D = sum(arg1)/sum(arg2) over this window. Then the window slides by WindowStep and D is recalculated, and so on. For example, let's consider WindowSize = 1000 and WindowStep = 10. Within each country and within each city, I would like to get D for the range of salaries between 0 and 1000, then for the range between 10 and 1010, then for the range between 20 and 1020, etc.
At the end, the output should be a data.frame associating a D statistic with each window. If a given window has no entry (for example, nobody has a salary between 20 and 1020 in country 1, city 3), then the D statistic should be NA.
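One possible way to approach this is sketched here, under assumptions the description above does not settle: windows are half-open, i.e. `start <= salary < start + WindowSize`; window starts run from 0 up to the largest salary in the table; and `window_D` and `sliding_D` are hypothetical helper names, not existing functions. The sketch relies on prefix sums (`cumsum`) and `findInterval`, so each window costs two lookups rather than a rescan of the rows.

```r
# Hypothetical helper: D for every window of ONE (country, city) group.
# Assumption: a window covers start <= salary < start + size.
window_D <- function(salary, arg1, arg2, starts, size) {
  ord <- order(salary)               # no reordering if salary is already sorted
  salary <- salary[ord]
  c1 <- c(0, cumsum(arg1[ord]))      # c1[k + 1] = sum of the first k arg1 values
  c2 <- c(0, cumsum(arg2[ord]))
  # With left.open = TRUE, findInterval() counts the salaries strictly below x.
  lo <- findInterval(starts, salary, left.open = TRUE)
  hi <- findInterval(starts + size, salary, left.open = TRUE)
  D <- (c1[hi + 1] - c1[lo + 1]) / (c2[hi + 1] - c2[lo + 1])
  D[hi == lo] <- NA_real_            # empty window: no salary falls inside
  D
}

# Hypothetical wrapper: apply window_D() to each (country, city) group.
# Assumption: window starts are 0, step, 2 * step, ... up to max(dt$salary).
sliding_D <- function(dt, size, step) {
  starts <- seq(0, max(dt$salary), by = step)
  groups <- split(dt, list(dt$country, dt$city), drop = TRUE)
  out <- lapply(groups, function(g) {
    data.frame(country = g$country[1], city = g$city[1],
               start = starts, end = starts + size,
               D = window_D(g$salary, g$arg1, g$arg2, starts, size))
  })
  res <- do.call(rbind, out)
  res <- res[order(res$country, res$city, res$start), ]
  rownames(res) <- NULL
  res
}
```

With any data.frame `dt` that has the five columns described above, `sliding_D(dt, size = 1000, step = 10)` would return one row per group and window. Sums obtained as differences of prefix sums can differ from directly computed sums in the last few digits, which is the trade-off of this approach.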
Note on performance
I will have to run this algorithm about 10000 times on pretty big tables (that have nothing to do with countries, cities and salaries; I don't yet have a good estimate of the size of these tables), so performance is of concern.
Example data
set.seed(84)
country = rep(1:3, c(30, 22, 51))
city = c(rep(1:5, c(5,5,5,5,10)), rep(1:5, c(1,1,10,8,2)), rep(c(1,3,4,5), c(20, 7, 3, 21)))
tt = paste0(city, country)
salary = c()
for (i in unique(tt)) salary = append(salary, sort(round(runif(sum(tt==i), 0,100000))))
arg1 = rnorm(length(country), 1, 1)
arg2 = rnorm(length(country), 1, 1)
dt = data.frame(country = country, city = city, salary = salary, arg1 = arg1, arg2 = arg2)
head(dt)
  country city salary       arg1        arg2
1       1    1  22791 -1.4606212  1.07084528
2       1    1  34598  0.9244679  1.19519158
3       1    1  76411  0.8288587  0.86737330
4       1    1  76790  1.3013056  0.07380115
5       1    1  87297 -1.4021137  1.62395596
6       1    2  12581  1.3062181 -1.03360620
With this example, if WindowSize = 70000 and WindowStep = 30000, the first two values of D are -0.236604 and 0.439462, which are the results of sum(dt$arg1[1:2])/sum(dt$arg2[1:2]) (window 0 to 70000) and sum(dt$arg1[2:5])/sum(dt$arg2[2:5]) (window 30000 to 100000), respectively.
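These two values can be checked by hand from the rows printed above. The snippet below is only a brute-force check, assuming a window includes its lower bound and excludes its upper bound (the question does not specify this); `D_window` is a hypothetical name, and the numbers are copied from the first five rows of the output.

```r
# First five rows of the printed output (country 1, city 1)
salary <- c(22791, 34598, 76411, 76790, 87297)
arg1 <- c(-1.4606212, 0.9244679, 0.8288587, 1.3013056, -1.4021137)
arg2 <- c(1.07084528, 1.19519158, 0.86737330, 0.07380115, 1.62395596)

# Hypothetical helper: D for the window lo <= salary < hi, NA if it is empty
D_window <- function(lo, hi) {
  inside <- salary >= lo & salary < hi
  if (!any(inside)) return(NA_real_)
  sum(arg1[inside]) / sum(arg2[inside])
}

D_window(0, 70000)       # uses rows 1:2, about -0.2366
D_window(30000, 100000)  # uses rows 2:5, about 0.4395
```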