1
votes

I have the following example dataframe in R:

Var1 = c("Shirt1", "Shirt2", "Shirt3", "Shirt4", "Pants1", "Pants2", "Pants3", "Jacket1", "Jacket2", "Jacket3")
Var2 = c(1,4,3,4,6,5,2,3,6,2)
Bin = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C")
MyData = data.frame(Var1, Var2, Bin)

Which looks like this:

    Var1     Var2 Bin
1   Shirt1   1    A
2   Shirt2   4    A
3   Shirt3   3    A
4   Shirt4   4    A
5   Pants1   6    B
6   Pants2   5    B
7   Pants3   2    B
8  Jacket1   3    C
9  Jacket2   6    C
10 Jacket3   2    C

So there are 3 bins and within each bin there are a variable number of Var1 entries. Note there are many more rows (and bins) in this data frame, but simplifying here. A combination consists of one entry of Var1 from each bin. I would like to find all possible combinations of Var1 across all bins and then sum Var2 for the combination. The output would look like something to this effect:

Bin A   Bin B   Bin C     Sum
Shirt1  Pants1  Jacket1   10
Shirt2  Pants1  Jacket1   13
Shirt2  Pants2  Jacket1   12
Shirt3  Pants1  Jacket2   15
Etc.

While I've seen somewhat similar threads, this problem seems to be unique enough, at least based on my level of proficiency in R.

Any suggestions are greatly appreciated, thank you!

2

2 Answers

1
votes

We can create a dataframe with all possible combinations

temp <- do.call(expand.grid, split(MyData$Var1, MyData$Bin))
temp 
#        A      B       C
#1  Shirt1 Pants1 Jacket1
#2  Shirt2 Pants1 Jacket1
#3  Shirt3 Pants1 Jacket1
#4  Shirt4 Pants1 Jacket1
#5  Shirt1 Pants2 Jacket1
#6  Shirt2 Pants2 Jacket1
#....

and then for each row subset relevant data and sum Var2 values.

temp$Sum <- apply(temp, 1, function(x) sum(MyData$Var2[MyData$Var1 %in% x]))
temp
#        A      B       C Sum
#1  Shirt1 Pants1 Jacket1  10
#2  Shirt2 Pants1 Jacket1  13
#3  Shirt3 Pants1 Jacket1  12
#4  Shirt4 Pants1 Jacket1  13
#5  Shirt1 Pants2 Jacket1   9
#6  Shirt2 Pants2 Jacket1  12
#....
0
votes

You could use rowsum as follows:

a <- expand.grid(split(MyData$Var1, MyData$Bin))

cbind(a, Sum = rowsum(MyData$Var2[unlist(a)], c(row(a))))

        A      B       C Sum
1  Shirt1 Pants1 Jacket1  10
2  Shirt2 Pants1 Jacket1  13
3  Shirt3 Pants1 Jacket1  12
4  Shirt4 Pants1 Jacket1  13
5  Shirt1 Pants2 Jacket1   9
....