1
votes

I have a large list of column names (variables) of an R data.table and I want to create a column containing the product of these columns.

Example:

col_names <- c("season_1","season_2","season_3")
DT_example <- data.table(id=1:4,
                 season_1=c(1,1,0,0),
                 season_2=c(0,1,1,1),
                 season_3=c(1,0,1,0),
                 product=1)

data.table:

   id season_1 season_2 season_3 product
1:  1        1        0        1       1
2:  2        1        1        1       1
3:  3        0        1        1       1
4:  4        0        1        0       1

The solution I have is using a "for" loop but it is not very efficient:

for(inc in col_names){
  nm1 <- as.symbol(inc)
  DT_example[,product:= product * eval(nm1)]
}

result:

   id season_1 season_2 season_3 product
1:  1        1        0        1       0
2:  2        1        1        0       0
3:  3        1        1        1       1
4:  4        0        1        0       0

Is there a faster way to do this using data.table native syntax?

3
DT_example[, product := Reduce("*", .SD), .SDcols = col_names] - Roland
Example data shown and code to reproduce it don't match. - sindri_baldur

3 Answers

3
votes

Here are four options. The first one is by far the most efficient but assumes we are dealing with only zeros and ones.

DT_example[, product := do.call(pmin, .SD), .SDcols = patterns("season")]

DT_example[, product := Reduce(`*`, .SD), .SDcols = patterns("season")]

DT_example[, product := apply(.SD, 1, prod), .SDcols = patterns("season")]

DT_example[, product := melt(.SD, id.vars = "id")[, prod(value), by = id]$V1]

# > DT_example
#    id season_1 season_2 season_3 product
# 1:  1        1        0        1       0
# 2:  2        1        1        1       1
# 3:  3        0        1        1       0
# 4:  4        0        1        0       0

Data:

DT_example <- data.table(
  id=1:4,
  season_1=c(1,1,0,0),
  season_2=c(0,1,1,1),
  season_3=c(1,1,1,0),
  product=1
)
0
votes

I think you could use "apply" and "prod" functions:

DT_example$product = apply(DT_example[,2:4], 1, prod)

This is applying the "prod" function (multiplies every element of what ir receives), to every line (defined by the "1" argument, as "2" would be column), of "DT_example[,2:4]".

0
votes

We can use prod grouped by sequence of rows after selecting the columns in .SDcols. With prod, there is na.rm option as well to remove NA elements if needed.

DT_example[,  Product := prod(.SD, na.rm = TRUE), by = 1:nrow(DT_example),
     .SDcols = patterns("season")]

-output

DT_example
#   id season_1 season_2 season_3 product Product
#1:  1        1        0        1       1       0
#2:  2        1        1        1       1       1
#3:  3        0        1        1       1       0
#4:  4        0        1        0       1       0