Subscript out of bounds when using conditional by in data.table

Question

I want to list unique IDs within groups, where the grouping variable can be selected by the user. The following works:

if(useGroupVar1){

  dt[,unique(id),.(group1a,group1b,group1c)]

} else {

  dt[,unique(id),group2]

}

The expressions I'm using in my code to filter rows are actually fairly long so I want to avoid duplicating code. I came up with this "solution", which doesn't actually work:

dt[,unique(id),if(useGroupVar1){.(group1a,group1b,group1c)}else{group2}]

If the condition leads to using group2 alone, it works (though the column is called if), but trying to get it to use .(group1a,group1b,group1c) results in

Error in eval(expr, envir, enclos) : could not find function "."

Now, I read that .() is an alias for list(), so using the latter gets me this:

Error in bysubl[[jj + 1L]] : subscript out of bounds

Is there a way to implement a conditional by without duplicating entire expressions?

I would do exactly this: by = if (useGroupVar1) paste0('group1', c('a','b','c')) else 'group2') — MichaelChirico
Did that and got this very descriptive error! Error in `[.data.table`(tabla, if (identical(codificacion[[1]][i]$codCIE, : 'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=eval(if (!porEESS) { c("cod_dpto", "cod_prov", "cod_dist")} else { cod_2000}) should work. This is for efficiency so data.table can detect which columns are needed. — overdisperse
Or just add a line above your [ call: group_var = if (...) else . Then dt[ , , by = group_var]. — MichaelChirico

talat · Accepted Answer · 2017-02-13T15:12:43

Just personal preference, but I don't like pasting strings in a by= statement of a data.table (not very readable to me).

Instead, I would use a user-selected variable (var) and create a list of grouping variables. Then, you can easily select the variables like so:

groupVars <- list(
  GroupVar1 = c("group1a","group1b","group1c"),
  GroupVar2 = c("groupXYZ", "groupABC"),
  GroupVarX = "group2"
)

# the user selects, for example, "GroupVar2"
var <- "GroupVar2"

dt[, unique(id), by = groupVars[[var]]]
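For reference, here is a minimal end-to-end sketch of the lookup idea. The sample data, the `byCols` helper variable and the mapping from `useGroupVar1` to `var` are assumptions made up for illustration; only the column names come from the question. The columns are resolved into a plain variable before the `[` call, as suggested in the comments, because some data.table versions reject a character-valued expression inside by= with the "'by' appears to evaluate to column names" error quoted above.

```r
library(data.table)

# Hypothetical sample data; column names follow the question.
dt <- data.table(
  id      = c(1, 1, 2, 3, 3, 4),
  group1a = c("x", "x", "x", "y", "y", "y"),
  group1b = c("p", "p", "q", "q", "q", "q"),
  group1c = c("m", "m", "m", "m", "n", "n"),
  group2  = c("a", "a", "b", "b", "b", "b")
)

# Named sets of grouping columns, as in the answer.
groupVars <- list(
  GroupVar1 = c("group1a", "group1b", "group1c"),
  GroupVarX = "group2"
)

# Assumed mapping from the question's flag to a set name.
useGroupVar1 <- TRUE
var <- if (useGroupVar1) "GroupVar1" else "GroupVarX"

# Resolve the column names first, then hand the plain variable to by=.
# byCols must not share its name with a column of dt.
byCols <- groupVars[[var]]
res1 <- dt[, unique(id), by = byCols]
print(res1)
```

The long filtering expression then appears only once; switching the grouping only changes which character vector ends up in `byCols`.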

As a side note:

You can easily extend this kind of variable selection to situations where a user may select multiple sets of grouping variables. In such cases, you could do it as follows:

Assume that the user-selected variable is now:

var <- c("GroupVar1", "GroupVarX") # two groups selected

Then, the by= statement becomes:

dt[, unique(id), by = unlist(groupVars[var], use.names=FALSE)]
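A small self-contained sketch of the multi-set case follows. The sample data and the `byCols` variable are illustrative assumptions, and the `unique()` call is an added guard (not part of the answer) for the case where two selected sets share a column. As before, the combined column vector is computed ahead of the `[` call rather than inline in by=.

```r
library(data.table)

# Hypothetical sample data; column names follow the question.
dt <- data.table(
  id      = c(1, 1, 2, 3, 3, 4),
  group1a = c("x", "x", "x", "y", "y", "y"),
  group1b = c("p", "p", "q", "q", "q", "q"),
  group1c = c("m", "m", "m", "m", "n", "n"),
  group2  = c("a", "a", "b", "b", "b", "b")
)

groupVars <- list(
  GroupVar1 = c("group1a", "group1b", "group1c"),
  GroupVarX = "group2"
)

# Two sets selected by the user.
var <- c("GroupVar1", "GroupVarX")

# Flatten the selected sets into one character vector of column names.
# unique() is an extra guard against a column listed in more than one set.
byCols <- unique(unlist(groupVars[var], use.names = FALSE))

res2 <- dt[, unique(id), by = byCols]
print(res2)
```

Here `byCols` ends up holding all four grouping columns, so the result has one row per distinct id within each combination of those columns.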
