
My goal is to generate a list of unique combinations when some of the items being combined have duplicate values, so that different selections can produce the same combination. In other words, I am trying to obtain all combinations without replacement of non-distinct items. The solution needs to be general, i.e. it should work for any set of N elements containing M distinct values. For example, it should work with N = 4, M = 2 for (Var1 = Var2, Var3 = Var4), (Var1 = Var2 = Var3, Var4), and so on. As a simple example of what I am trying to do, take three variables: X, Y, Z.

Classic Combinations are:

X    Y    Z
Y    Z
X    Z
Z
X    Y 
Y  
X

If we let X = Y, then we have:

X    X    Z
X    Z
X    Z
Z
X    X
X
X

Thus, we have two combinations that are not "unique": (X) and (X Z).

So, the list that I would want is:

X    X    Z
X    Z
Z
X    X
X
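The X = Y example above can be reproduced with a brute-force sketch: enumerate every selection with combn(), write each one in a canonical (sorted) form, and drop repeats. The helper name unique_combos is hypothetical and only for illustration; this is a small-N check of the desired output, not an efficient solution.

```r
# Hypothetical helper: all distinct non-empty combinations of a vector
# that may contain repeated labels.
unique_combos <- function(items) {
  items <- sort(items)  # sorted input makes every selection come out sorted
  out <- character(0)
  for (k in seq_along(items)) {
    # combn() treats positions as distinct, so repeated labels give repeated strings
    out <- c(out, combn(items, k, FUN = paste, collapse = " "))
  }
  unique(out)
}

unique_combos(c("X", "X", "Z"))
# "X" "Z" "X X" "X Z" "X X Z"
```

Because it enumerates all 2^N - 1 selections before deduplicating, this is only practical for small N.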

Edit: Added case for when N=4 as recommended by @Sam Thomas

If we expand this to N=4, we have: W,X,Y,Z

W    X    Y    Z
X    Y    Z
W    Y    Z
Y    Z
W    X    Z
X    Z
W    Z
Z
W    X    Y
X    Y
W    Y
Y
W    X
X
W

Here, we can have M=2 distinct elements in any of these forms: (W=X, Y=Z), (X=Z, W=Y), (X=Y, W=Z), (W = X = Y, Z), (W = Z = Y, X), (W = Z = X, Y), or (X = Y = Z, W).

In the case of (W=X, Y=Z), we have:

W    W    Y    Y
W    Y    Y
W    Y    Y
Y    Y
W    W    Y
W    Y
W    Y
Y
W    W    Y
W    Y
W    Y
Y
W    W
W
W

The output should be:

W    W    Y    Y
W    Y    Y
Y    Y
W    W    Y
W    Y
Y
W    W
W

In the case of (W = X = Y, Z), the matrix would initially look like:

W    W    W    Z
W    W    Z
W    W    Z
W    Z
W    W    Z
W    Z
W    Z
Z
W    W    W
W    W
W    W
W
W    W
W
W

The desired output would be:

W    W    W    Z
W    W    Z
W    Z
Z
W    W    W
W    W
W

End Edit

Using R, I already have a way to generate a list of all possible combinations in binary matrix form:

comb.mat = function(n){
     c = rep(list(1:0), n)
     expand.grid(c)
}

comb.mat(3)

This gives:

  Var1 Var2 Var3
1    1    1    1
2    0    1    1
3    1    0    1
4    0    0    1
5    1    1    0
6    0    1    0
7    1    0    0
8    0    0    0

If we take Var1 = Var2, this structure has redundancies: rows 2 and 3 represent the same combination, as do rows 6 and 7. Thus, the redundancy-free version would be:

  Var1 Var2 Var3
1    1    1    1
2    0    1    1
4    0    0    1
5    1    1    0
6    0    1    0
8    0    0    0
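One possible way to get from the full binary matrix to the redundancy-free one is to note that, within a set of equal variables, only the number of selected members matters. The sketch below builds a key from those counts and keeps the first row for each key. The helper name drop_equivalent and its groups argument are assumptions made for illustration.

```r
comb.mat <- function(n) expand.grid(rep(list(1:0), n))

# Hypothetical helper: `groups` is a list of column-name vectors, each naming
# variables that are assumed to hold the same value.
drop_equivalent <- function(m, groups) {
  # number of selected members in each group, one vector per group
  counts <- lapply(groups, function(g) rowSums(m[g]))
  # columns that belong to no group are compared as they are
  rest <- as.list(m[setdiff(names(m), unlist(groups))])
  # rows with the same key describe the same combination
  key <- do.call(paste, c(unname(counts), unname(rest)))
  m[!duplicated(key), , drop = FALSE]
}

drop_equivalent(comb.mat(3), list(c("Var1", "Var2")))
```

With Var1 = Var2 this keeps rows 1, 2, 4, 5, 6 and 8. Since the key only uses counts, the same call should also handle larger groups, e.g. list(c("Var1", "Var2", "Var3")) on comb.mat(4).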

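To add "variable" values similar to the initial structure, I use:

m = comb.mat(3)
nvars = ncol(m)

# label the selected entries of each column with the last nvars letters
# (X, Y, Z when nvars = 3; W, X, Y, Z when nvars = 4)
for(i in 1:nvars){
  m[m[,i]==1,i] = LETTERS[26-nvars+i]
}

To modify it so that Var1 = Var2, I just use:

  # Var2 is column 2; relabel its "Y" entries as "X"
  m[m[,2]=="Y",2] = "X"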

Any suggestions on how I could move from the initial matrix to the latter matrix?

Especially if we have more variables that are equal to each other?

E.g. comb.mat(4) with (Var1 = Var2, Var3 = Var4) or (Var1 = Var2 = Var3, Var4).

Comments:

I think - see ?combn - Alex W
combn does not give the right structure, e.g. combn(c("X","Y","Z"), 2) => [["X", "X", "Y"], ["Y", "Z", "Z"]]. Notice that X is repeated even though it is supplied only once. Similarly, Z is repeated. - coatless
Might help to show the result you are looking for in your updated example with comb.mat(4) - Whitebeard
Done! Thanks for the suggestion @SamThomas - coatless

1 Answer


This has all of the combinations, I believe.

m <- comb.mat(3)

res <- lapply(split(m, m$Var3), function(x, vars=c("Var1", "Var2")) {
   x[Reduce(`==`, x[vars]) | cumsum(Reduce(xor, x[vars])) == 1, ]
})

do.call(rbind, res)
    Var1 Var2 Var3
0.5    1    1    0
0.6    0    1    0
0.8    0    0    0
1.1    1    1    1
1.2    0    1    1
1.4    0    0    1

Edit: I think this works for multiple sets of equivalent variables. I couldn't figure out a method without a for loop, though I'm sure there's a way with Reduce somehow.

I think this gives the right set of combinations, but if not, let me know, as it's late in the day and I'm a bit tired.

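remove_dups <- function(m, vars) {
  # vars: list of character vectors, each naming a pair of equivalent columns
  for (k in seq_along(vars)) {
    # split on the other columns, so rows in a chunk differ only in this pair
    res <- lapply(split(m, m[, !names(m) %in% vars[[k]]]), function(x, vn=vars[[k]]) {
      # keep rows where the pair agrees, plus the first row in the chunk where it differs
      x[Reduce(`==`, x[vn]) | cumsum(Reduce(xor, x[vn])) == 1, ]
    })
    m <- do.call(rbind, res)
  }
  m
}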

 m <- comb.mat(4)
 remove_dups(m, list(vars=c("Var1", "Var2"), vars=c("Var3", "Var4")))

           Var1 Var2 Var3 Var4
0.0.0.0.16    0    0    0    0
0.0.1.0.12    0    0    1    0
0.0.1.1.4     0    0    1    1
0.1.0.0.14    0    1    0    0
0.1.1.0.10    0    1    1    0
0.1.1.1.2     0    1    1    1
1.1.0.0.13    1    1    0    0
1.1.1.0.9     1    1    1    0
1.1.1.1.1     1    1    1    1
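For sets of three or more equal variables, such as (Var1 = Var2 = Var3, Var4), note that Reduce(`==`, ...) over three columns evaluates (a == b) == c, which is not an all-equal test, so the pairwise filter above may not carry over directly. A different approach, sketched here as an alternative rather than as an extension of the code above, is to skip the binary matrix and build the combinations from the multiplicity of each distinct value. The helper name multiset_combos and its mult argument are hypothetical.

```r
# Hypothetical helper: `mult` is a named vector giving how many copies of
# each distinct value are available, e.g. c(W = 3, Z = 1).
multiset_combos <- function(mult) {
  # one column per distinct value, holding how many copies are taken (0..k)
  counts <- expand.grid(lapply(mult, function(k) 0:k))
  # drop the empty selection
  counts <- counts[rowSums(counts) > 0, , drop = FALSE]
  # turn each row of counts back into a label string such as "W W Z"
  labels <- apply(counts, 1, function(r) paste(rep(names(mult), r), collapse = " "))
  counts$combo <- unname(labels)
  rownames(counts) <- NULL
  counts
}

multiset_combos(c(W = 3, Z = 1))
```

This yields prod(mult + 1) - 1 rows, i.e. 7 for (W = X = Y, Z) and 8 for (W = X, Y = Z), and never generates the duplicates in the first place.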