I am writing a custom aggregation function with data.table (v1.9.6) and am struggling to pass function arguments to it. There have been similar questions on this, but none deals with multiple (variable) inputs, and none seems to have a conclusive answer, only "little hacks":
- pass variables and names to data.table function
- eval and quote in data.table
- How can one work fully generically in data.table in R with column names in variables
I would like to take a data.table, sum and order specified variables, and then create new variables on top of the result (two steps). The crucial thing is that everything should be parameterized, i.e. the variables to sum, the variables to group by, and the variables to order by, and each of these can be one or more variables. A small example:
library(data.table)

dt <- data.table(a = rep(letters[1:4], 5),
                 b = rep(letters[5:8], 5),
                 c = rep(letters[3:6], 5),
                 x = sample(1:100, 20),
                 y = sample(1:100, 20),
                 z = sample(1:100, 20))
# Step 1: sum x and y within each (a, b) group, then order by a, b
temp <-
  dt[, .(x_sum = sum(x, na.rm = TRUE),
         y_sum = sum(y, na.rm = TRUE)),
     by = .(a, b)][order(a, b)]

# Step 2: absolute and relative change versus the previous row
# (note that := adds the columns to temp by reference as well)
temp2 <-
  temp[, `:=` (x_sum_del = (x_sum - shift(x = x_sum, n = 1, type = "lag")),
               y_sum_del = (y_sum - shift(x = y_sum, n = 1, type = "lag")),
               x_sum_del_rel = ((x_sum - shift(x = x_sum, n = 1, type = "lag")) /
                                  (shift(x = x_sum, n = 1, type = "lag"))),
               y_sum_del_rel = ((y_sum - shift(x = y_sum, n = 1, type = "lag")) /
                                  (shift(x = y_sum, n = 1, type = "lag")))
  )
  ]
How can I programmatically pass the following function arguments (i.e. not single inputs but vectors/lists of inputs)?
- x and y --> var_list
- new names of x and y (e.g. x_sum, y_sum) --> var_name_list
- group by arguments a, b --> by_var_list
- order by arguments a, b --> order_var_list
- temp2 should work on all of the pre-defined parameters. I was also thinking about using an apply function, but again struggled to pass a list of variables.
I have played around with variations of get(), as.name(), eval(), and quote(), but as soon as I pass more than one variable they no longer work. I hope the question is clear; otherwise I am happy to adjust it where you deem necessary. A function call would look as follows:
fn_agg(dt, var_list, var_name_list, by_var_list, order_var_list)
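To make the intended behaviour concrete, here is a minimal sketch of what such a function could look like. It assumes every argument is a character vector of column names, that var_name_list has the same length and order as var_list, and that .SDcols, setnames(), setorderv() and set() are acceptable tools for this. The "_del" and "_del_rel" suffixes simply follow the example above. This is one possible direction under those assumptions, not necessarily the idiomatic data.table way.

```r
library(data.table)

# Sketch only: all arguments except dt are assumed to be character vectors
# of column names; var_name_list must match var_list in length and order.
fn_agg <- function(dt, var_list, var_name_list, by_var_list, order_var_list) {
  # Step 1: sum every column in var_list within each by_var_list group
  out <- dt[, lapply(.SD, sum, na.rm = TRUE),
            by = by_var_list, .SDcols = var_list]
  setnames(out, var_list, var_name_list)  # e.g. x -> x_sum
  setorderv(out, order_var_list)          # order by one or more columns

  # Step 2a: absolute change versus the previous row
  for (v in var_name_list) {
    lagged <- shift(out[[v]], n = 1, type = "lag")
    set(out, j = paste0(v, "_del"), value = out[[v]] - lagged)
  }
  # Step 2b: relative change versus the previous row
  for (v in var_name_list) {
    lagged <- shift(out[[v]], n = 1, type = "lag")
    set(out, j = paste0(v, "_del_rel"), value = (out[[v]] - lagged) / lagged)
  }
  out
}

# Usage with the argument names from the list above
var_list       <- c("x", "y")
var_name_list  <- c("x_sum", "y_sum")
by_var_list    <- c("a", "b")
order_var_list <- c("a", "b")

dt <- data.table(a = rep(letters[1:4], 5),
                 b = rep(letters[5:8], 5),
                 x = 1:20,
                 y = 20:1)
res <- fn_agg(dt, var_list, var_name_list, by_var_list, order_var_list)
```

Because the column names are plain strings here, nothing needs get(), eval() or quote(); whether that holds for more complex expressions is a separate matter.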