0
votes

I have this data.table with different column types.

I do not know the column names before hand and I would like to generate aggregations only for columns of certain type (say, numeric). How to achieve this with data.table?

For example, consider the below code:

dt <- data.table(ch=c('a','b','c'),num1=c(1,3,6), num2=1:9)

Need to create a function that accepts the above data.table and automatically performs calculations on the numeric fields grouped by the character filed (say sum on num1 and mean on num2, by ch). How to achieve this dynamically?

We can find out the numeric columns using sapply(dt, is.numeric) but it gives column names as strings - not sure how to plug it with data.table. Help is appreciated. Below code gives the idea of what is required - but does not work

DoSomething <- function(dt)
{
    numCols <- names(dt)[sapply(dt, is.numeric)]
    chrCols <- names(dt)[sapply(dt, is.character)]
    dt[,list(sum(numCols[1]), mean(numCols[2])), by=(chrCols), with=F]
}
2

2 Answers

2
votes

You can achieve it using .SDcols argument. See example.

require(data.table)

dt <- data.table(ch=c('a','b','c'), num1=c(1,3,6), num2=1:9)

DoSomething <- function(dt) {
    numCols <- names(dt)[sapply(dt, is.numeric)]
    chrCols <- names(dt)[sapply(dt, is.character)]
    dt[, list(sum(.SD[[1]]), mean(.SD[[2]])), by = chrCols, .SDcols = numCols]
}

DoSomething(dt)
1
votes

@djhurio gives a nice solution to your problem.

.SD and .SDcols in data.table gives what you want.

In case you perform same calculation between different columns, you can try the following code.

require(data.table)

dt <- data.table(ch=c('a','b','c'), num1=c(1,3,6), num2=1:9)

DTfunction <- function(dt){
    numCols <- names(dt)[sapply(dt, is.numeric)]
    chrCols <- names(dt)[sapply(dt, is.character)]
    dt <- dt[, lapply(.SD, mean), by = (chrCols), .SDcols = (numCols)]
}

cute code. Isn't it? :)