
Given are a group indicator variable and some values within groups:

group = rep(c(1,2), each = 3)
val   = letters[1:6]

cbind(group, val)

     group val
[1,] "1"   "a"
[2,] "1"   "b"
[3,] "1"   "c"
[4,] "2"   "d"
[5,] "2"   "e"
[6,] "2"   "f"

I am looking for a matrix that gives all unique combinations obtained by picking exactly one element from each group. That is, only one element per group is allowed to be "active" in each combination.

The desired output is a matrix where each column represents one of the possible combinations. The first four columns of the result matrix may look like this:

     [,1] [,2] [,3] [,4]
[1,]    1    0    0    1
[2,]    0    1    0    0
[3,]    0    0    1    0
[4,]    1    1    1    0
[5,]    0    0    0    1
[6,]    0    0    0    0

where the rows correspond to the rows of the input matrix above. The first column tells you that a is active in group 1 and d is active in group 2. The second column tells you that b is active in group 1 and d is active in group 2. The third column tells you that c is active in group 1 and d is active in group 2, and so on. Hence, the sum of each column will always equal the number of groups, because exactly one element per group is active.
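The constraint described above can be checked directly. The following is a minimal sketch, assuming the four example columns shown above are typed in by hand as a hypothetical matrix `m`; it only illustrates the two conditions (column sums equal the number of groups, and each group contributes exactly one active element per column).

```r
group = rep(c(1, 2), each = 3)

# The four example columns from above, entered by hand (illustrative only)
m = cbind(c(1, 0, 0, 1, 0, 0),
          c(0, 1, 0, 1, 0, 0),
          c(0, 0, 1, 1, 0, 0),
          c(1, 0, 0, 0, 1, 0))

# Every column should sum to the number of groups
col_ok = all(colSums(m) == length(unique(group)))

# rowsum() adds up rows within each group; every entry should be exactly 1
group_ok = all(rowsum(m, group) == 1)

col_ok && group_ok
```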

I'm a bit puzzled as to how to obtain the desired output matrix in an organized fashion. I've been thinking of enumerating all possible combinations and restricting to feasible ones (where the sum of the resulting vector elements within groups is exactly equal to one for all groups), but this may cause memory problems for large data sets and I am unsure whether there is a more elegant and efficient approach.
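To make the memory concern concrete, here is a rough sketch comparing the number of candidate 0/1 vectors that brute-force enumeration would generate with the number of feasible combinations, for the example data only. The variable names are illustrative.

```r
group = rep(c(1, 2), each = 3)

# Brute force: every 0/1 vector over all elements
n_candidates = 2^length(group)

# Feasible: one independent choice per group, so the group sizes multiply
n_feasible = prod(table(group))

c(candidates = n_candidates, feasible = n_feasible)
```

For the example this is 64 candidates against 9 feasible combinations, and the gap widens quickly as elements are added.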

Edit: The solution should generalize to an arbitrary number of groups and an arbitrary number of elements (>1) within groups.

I want to make sure I have a handle on the actual use case - are there just 2 groups or does this need to scale up to n groups? Does each group have the same number of values, or could there be different numbers of values per group? - Gregor Thomas
Very good point, this should generalize to an arbitrary number of groups and elements within groups. Sorry for the misleading example! - mrz1702
One other comment - don't use cbind automatically. When given plain vectors it builds a matrix, which forces everything to a single type. Notice how your group numbers are now quoted and of class character. In many cases like this one, data.frame is preferable: input = data.frame(group, val) - Gregor Thomas

1 Answer


I think this should scale as needed:

group = rep(c(1,2), each = 3)
val   = letters[1:6]
input = data.frame(group, val)

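# One row per combination, one column per group
combos = do.call(expand.grid, split(input$val, input$group))

# Rows follow the input rows; columns are the combinations
combo_matrix = matrix(0, nrow = nrow(input), ncol = nrow(combos))
for(i in seq_along(combos)) {
  # Mark the element chosen from group i as active in every combination
  combo_matrix[cbind(match(combos[[i]], input$val), seq_len(ncol(combo_matrix)))] = 1
}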

combo_matrix
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,]    1    0    0    1    0    0    1    0    0
# [2,]    0    1    0    0    1    0    0    1    0
# [3,]    0    0    1    0    0    1    0    0    1
# [4,]    1    1    1    0    0    0    0    0    0
# [5,]    0    0    0    1    1    1    0    0    0
# [6,]    0    0    0    0    0    0    1    1    1

It does assume that val values are not repeated in the input data frame, because match() only returns the position of the first match.
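If values may repeat, one possible workaround is to build the combinations from row numbers instead of from the values themselves. This is only a sketch under that assumption, not part of the approach above; the example data with a duplicated "a" and the name `rows_by_group` are made up for illustration.

```r
group = c(1, 1, 2, 2, 2)
val   = c("a", "b", "a", "c", "d")   # "a" appears in both groups
input = data.frame(group, val)

# Split row numbers (not values) by group, so duplicates stay distinguishable
rows_by_group = split(seq_len(nrow(input)), input$group)

# One row per combination; each column holds the active row index for one group
combos = as.matrix(expand.grid(rows_by_group))
n_combos = nrow(combos)

combo_matrix = matrix(0, nrow = nrow(input), ncol = n_combos)

# Pair every active row index with the column of the combination it belongs to
idx = cbind(as.vector(combos), rep(seq_len(n_combos), times = ncol(combos)))
combo_matrix[idx] = 1

combo_matrix
```

Because the indices come straight from expand.grid, no call to match() is needed, and the fill is a single vectorised assignment rather than a loop over groups.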