0
votes

I have a data.table like so:

library(data.table)
dt = data.table(id_1 = rep(1:3, 5), id_2 = sort(rep(c('A', 'B', 'C'), 5)), value_1 = rnorm(15, 1, 1), value_2 = rpois(15, 1))

I would like to write a function that groups the table by columns given in one parameter and applies an operation (say, sum) to several other columns given in a second parameter. Finally, I'd like to pass the names of the new columns as a third parameter. My problem: I don't know how to name the result columns from a character vector when I am not using assignment by reference (:=).

The following two approaches both do exactly what I want; I just don't like how they do it:

Approach 1: assign by reference, then keep only one record per group (dropping the original columns).

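dt_aggregator_1 <- function(data,
                            group_cols = c('id_1', 'id_2'),
                            new_names = c('sum_value_1', 'sum_value_2'),
                            value_cols = c('value_1', 'value_2')){
  # copy() is needed here: a plain `data_out = data` would let := modify the caller's table
  data_out = copy(data)
  # add the per-group sums as new columns, by reference
  data_out[, (new_names) := lapply(.SD, sum), by = group_cols, .SDcols = value_cols]
  # every row in a group holds the same sum, so max() just picks one record per group
  data_out[, lapply(.SD, max), by = group_cols, .SDcols = new_names]
}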

Approach 2: rename the columns after grouping. I assume this is the much better approach.

dt_aggregator_2 <- function(data,
                            group_cols = c('id_1', 'id_2'),
                            new_names = c('sum_value_1', 'sum_value_2'),
                            value_cols = c('value_1', 'value_2')){
  data_out = data[,lapply(.SD, function(x){sum(x)}),by = group_cols, .SDcols = value_cols]
  setnames(data_out, value_cols, new_names)
  data_out[]
}

My question: in approach 2, can I somehow set the names while performing the grouping operation, so that it takes one line of code instead of two? :)

2
Actually, I am starting to like the second approach quite a bit, but I would still like to know how to do it in one line :) - ira

2 Answers

1
votes

You can try the dplyr library:

library(dplyr)

dt1 <- dt %>% group_by(id_1,id_2) %>%
  summarise(
    sum_value_1 = sum(value_1),
    sum_value_2 = sum(value_2)
  )

dt1
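
The snippet above hard-codes the column names, while the question asks for them as function parameters. A minimal sketch of a parameterised variant follows, assuming a dplyr version that provides across() (1.0.0 or later). The function name dplyr_aggregator and the toy data frame are hypothetical and used only for illustration; the renaming is done with rename() after summarising, so this is not a single-step solution.

```r
library(dplyr)

# Hypothetical helper: group by `group_cols`, sum `value_cols`,
# then rename the sums to `new_names` (same length and order as `value_cols`).
dplyr_aggregator <- function(data, group_cols, value_cols, new_names) {
  stopifnot(length(value_cols) == length(new_names))
  data %>%
    group_by(across(all_of(group_cols))) %>%
    summarise(across(all_of(value_cols), sum), .groups = "drop") %>%
    # rename() takes new = old pairs; a named character vector works via all_of()
    rename(all_of(setNames(value_cols, new_names)))
}

# Toy data with the same shape as the question's table (plain data.frame, assumed)
df <- data.frame(id_1 = rep(1:3, 5),
                 id_2 = sort(rep(c("A", "B", "C"), 5)),
                 value_1 = rnorm(15, 1, 1),
                 value_2 = rpois(15, 1))

res <- dplyr_aggregator(df,
                        group_cols = c("id_1", "id_2"),
                        value_cols = c("value_1", "value_2"),
                        new_names = c("sum_value_1", "sum_value_2"))
names(res)
```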
1
votes

You can call setNames in the same line, which turns this into a one-liner.

dt_aggregator_2 <- function(data,
                            group_cols = c('id_1', 'id_2'),
                            new_names = c('sum_value_1', 'sum_value_2'),
                            value_cols = c('value_1', 'value_2')){

  # use the `data` argument, not the global `dt`
  data[, setNames(lapply(.SD, sum), new_names), by = group_cols, .SDcols = value_cols]

}
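
To check that the names really are set during grouping, here is a small self-contained run of the same idea. The helper name aggregate_named, the fixed seed, and the length check are assumptions added for illustration; they are not part of the original answer.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the example is reproducible
dt <- data.table(id_1 = rep(1:3, 5),
                 id_2 = sort(rep(c("A", "B", "C"), 5)),
                 value_1 = rnorm(15, 1, 1),
                 value_2 = rpois(15, 1))

# Hypothetical helper: setNames() names the list returned by lapply(),
# and data.table uses those names for the aggregated columns.
aggregate_named <- function(data, group_cols, value_cols, new_names) {
  stopifnot(length(value_cols) == length(new_names))
  data[, setNames(lapply(.SD, sum), new_names), by = group_cols, .SDcols = value_cols]
}

res <- aggregate_named(dt,
                       group_cols = c("id_1", "id_2"),
                       value_cols = c("value_1", "value_2"),
                       new_names = c("sum_value_1", "sum_value_2"))
names(res)  # grouping columns first, then the new names
```

Because no := is involved, the input table is left unchanged and the result is a new data.table with one row per group.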