My goal is to generate a unique list of combinations when part of the set being operated upon has duplicate values, so that some combinations coincide. In other words, the problem I am trying to solve is obtaining all combinations without replacement on non-distinct items. The solution needs to be general, i.e. it should work for any set of N elements with M distinct values. For example, it should work with N = 4, M = 2 with (Var1 = Var2, Var3 = Var4) or (Var1 = Var2 = Var3, Var4), etc.
As a simple example of what I am trying to do, take three variables: X, Y, Z.
The classic combinations are:
X Y Z
Y Z
X Z
Z
X Y
Y
X
If we let X = Y, then we have:
X X Z
X Z
X Z
Z
X X
X
X
Thus, we have two combinations that are not "unique": (X) and (X Z).
So, the list that I would want is:
X X Z
X Z
Z
X X
X
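To make the target concrete, here is a brute-force sketch in R that reproduces the list above by enumerating every subset with combn, sorting the labels within each subset, and dropping repeats. The helper name uniq.comb and its argument vals are my own placeholders, and it still enumerates all 2^N - 1 subsets, so it illustrates the desired output rather than an efficient method.

```r
# vals is a hypothetical character vector of items, possibly with repeats
uniq.comb = function(vals){
  # for every subset size k, build each subset as a sorted, space-separated string
  out = unlist(lapply(seq_along(vals), function(k)
    combn(vals, k, FUN = function(s) paste(sort(s), collapse = " "))))
  # subsets that differ only in which duplicate was picked collapse to one string
  unique(out)
}

uniq.comb(c("X", "X", "Z"))
```

For c("X", "X", "Z") this returns the five strings listed above, though not in the same order.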
Edit: Added case for when N=4 as recommended by @Sam Thomas
If we expand this to N=4, we have: W,X,Y,Z
W X Y Z
X Y Z
W Y Z
Y Z
W X Z
X Z
W Z
Z
W X Y
X Y
W Y
Y
W X
X
W
Here, we can have M=2 distinct values in any of the following forms: (W=X, Y=Z), (X=Z, W=Y), (X=Y, W=Z), (W = X = Y, Z), (W = Z = Y, X), (W = Z = X, Y), or (X = Y = Z, W).
In the case of (W=X, Y=Z), we have:
W W Y Y
W Y Y
W Y Y
Y Y
W W Y
W Y
W Y
Y
W W Y
W Y
W Y
Y
W W
W
W
The output should be:
W W Y Y
W Y Y
Y Y
W W Y
W Y
Y
W W
W
In the case of (W = X = Y, Z), the list would initially look like:
W W W Z
W W Z
W W Z
W Z
W W Z
W Z
W Z
Z
W W W
W W
W W
W
W W
W
W
The desired output would be:
W W W Z
W W Z
W Z
Z
W W W
W W
W
End Edit
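Both desired N=4 lists can also be described by how many copies of each distinct value are taken, which suggests building them directly instead of filtering. The following is only a sketch under that assumption: multiset.comb and its argument mult (a named vector giving how often each distinct value occurs) are hypothetical names of mine, not an existing function.

```r
# mult is a hypothetical named vector: names are the distinct values,
# entries are how many times each value occurs in the original set
multiset.comb = function(mult){
  # one column per distinct value, holding how many copies are taken
  counts = expand.grid(lapply(mult, function(k) k:0))
  # drop the row that selects nothing
  counts = counts[rowSums(counts) > 0, , drop = FALSE]
  # turn each row of counts into a string such as "W W Y"
  apply(counts, 1, function(r) paste(rep(names(mult), r), collapse = " "))
}

multiset.comb(c(W = 2, Y = 2))   # the (W=X, Y=Z) case
multiset.comb(c(W = 3, Z = 1))   # the (W = X = Y, Z) case
```

This produces prod(mult + 1) - 1 rows, i.e. 8 and 7 for the two cases above, without ever generating the duplicates.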
Using R, I already have a way to generate a list of all possible combinations in binary matrix form:
comb.mat = function(n){
  # one 1/0 column per variable; the list is not named `c` so it does not mask base::c
  bits = rep(list(1:0), n)
  expand.grid(bits)
}
comb.mat(3)
This gives:
  Var1 Var2 Var3
1    1    1    1
2    0    1    1
3    1    0    1
4    0    0    1
5    1    1    0
6    0    1    0
7    1    0    0
8    0    0    0
If we let Var1 = Var2, this structure has redundancies: rows 2 and 3 represent the same combination, as do rows 6 and 7. Thus, the redundancy-free version would be:
  Var1 Var2 Var3
1    1    1    1
2    0    1    1
4    0    0    1
5    1    1    0
6    0    1    0
8    0    0    0
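As a rough sketch of the filtering I have in mind (not a confirmed solution), two rows are redundant exactly when they select the same number of columns from each group of equal variables. The helper dedup.comb and its groups argument are hypothetical names: groups[j] is a label for column j, and columns sharing a label are assumed to hold equal values.

```r
comb.mat = function(n){
  expand.grid(rep(list(1:0), n))
}

# groups[j] is a hypothetical label for column j; equal labels mean equal variables
dedup.comb = function(m, groups){
  # for every row, count how many columns of each group are selected
  counts = sapply(unique(groups), function(g)
    rowSums(m[, groups == g, drop = FALSE]))
  # keep only the first row for each distinct vector of counts
  m[!duplicated(counts), , drop = FALSE]
}

dedup.comb(comb.mat(3), c(1, 1, 2))   # Var1 = Var2
```

With groups = c(1, 1, 2) this keeps rows 1, 2, 4, 5, 6 and 8. It still builds all 2^n rows first, so it may not scale to large n.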
To add "variable" values similar to the initial structure, I use:
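m = comb.mat(3)
nvars = ncol(m)
for(i in 1:nvars){
  # label column i with the i-th of the last nvars letters (X, Y, Z for nvars = 3)
  m[m[,i]==1,i] = LETTERS[26-nvars+i]
}
To modify it so that Var1 = Var2, I just use:
m[m[,2]=="Y",2] = "X"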
Any suggestions on how I could move from the initial matrix to the latter matrix, especially if more variables are paired?
E.g. comb.mat(4) with (Var1 = Var2, Var3 = Var4) or (Var1 = Var2 = Var3, Var4).
?combn – Alex W
combn does not give the right structure, e.g. combn(c("X","Y","Z"), 2) => [["X", "X", "Y"], ["Y", "Z", "Z"]]. Notice that X is repeated even though it is supplied only once. Similarly, Z is repeated. – coatless
comb.mat(4) – Whitebeard