2
votes

I am looking for some way of iterating over all possible combinations of columns and rows in a numerical dataframe. So it could possibly look like this (just a few of the many possible combinations there could be):

  • 1st iteration: Column A + Row 1
  • 2nd iteration: Column B + Row 1
  • 3rd iteration: Column A + Column B + Row 1
  • 4th iteration: Column A + Column B + Row 1 + Row 2
  • and so on and on...

For each combination of columns and rows a simple mathematical calculation shall be carried out and its result shall be stored to a dataframe result. This way I want to eventually find the combination of columns and rows that yields the highest/lowest calculation result.

So my code looks like this (with respect to the calculation):

calc = sum(sum(colSums(data)) + sum(rowSums(data)) / (nrow(data) * ncol(data)))

So my questions are:

  1. How do I create this iterating process in R code, i.e. the process of trying all possible combinations? I thought of using two nested for() loops, but I am not sure whether this will work (in particular, how do I address the columns/rows without knowing their names or their number?)
  2. How can I finally store all the results to a single dataframe result? result should contain the calculation result and the respective combination of columns and rows.

Do you have any ideas how I could solve this?

Here is some data to play around with:

data = structure(list(GDP = c(18.2, 8.5, 54.1, 1.4, 2.1, 83.6, 17), 
    Population = c(1.22, 0.06, 0, 0.54, 2.34, 0.74, 1.03), Birth.rate = c(11.56, 
    146.75, 167.23, 7, 7, 7, 10.07), Income = c(54, 94, 37, 95, 
    98, 31, 78), Savings = c(56.73, 56.49, 42.81, 70.98, 88.24, 
    35.16, 46.18)), .Names = c("GDP", "Population", "Birth.rate", 
    "Income", "Savings"), class = "data.frame", row.names = c(NA, 
    -7L))
1
This is not very clear to me. What do you mean by combinations of columns and rows? Could you provide examples? - nicola
I do not see the point of what you described. Since you look for the max and all numbers are positive (is that always the case?), you can just sum the entire data with sum(data). For the min, you just need to select one row and one column, and do this for all possible combinations: min(rowSums(expand.grid(colSums(data), rowSums(data)))) - Colonel Beauvel
@Colonel Beauvel: I eventually need it for a kmeans clustering; the "calc" code I provided is just a dummy. For kmeans clustering the choice of columns and rows is very important. Does that make it clearer? - Jonathan Rhein
@nicola: I edited my question. Is it clearer now? - Jonathan Rhein
I'm not sure you are aware of the complexity of what you are trying to do. Anyway, I'll answer my own question. The number of combos is (2^ncol(data)-1)*(2^nrow(data)-1). If data has something like 30 rows, that number becomes huge. Anyway, to get started, try unlist(lapply(seq_along(data),function(x) combn(data,x,simplify=FALSE)),recursive=FALSE). It will give you all combos of just the columns. For each element of the list, you then have to choose the rows. - nicola
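
The count given in the last comment can be checked against the sample data. The following is a minimal sketch, assuming `data` is the data frame from the question (rebuilt here with `data.frame()` so the block runs on its own). It enumerates column names rather than the columns themselves, which is a small variation on the `combn()` call suggested in the comment.

```r
# Sample data from the question
data <- data.frame(
  GDP        = c(18.2, 8.5, 54.1, 1.4, 2.1, 83.6, 17),
  Population = c(1.22, 0.06, 0, 0.54, 2.34, 0.74, 1.03),
  Birth.rate = c(11.56, 146.75, 167.23, 7, 7, 7, 10.07),
  Income     = c(54, 94, 37, 95, 98, 31, 78),
  Savings    = c(56.73, 56.49, 42.81, 70.98, 88.24, 35.16, 46.18)
)

# Non-empty column subsets times non-empty row subsets
n_combos <- (2^ncol(data) - 1) * (2^nrow(data) - 1)
n_combos  # 3937 for 5 columns and 7 rows

# All non-empty subsets of the column names, as a flat list
col_combos <- unlist(
  lapply(seq_along(data), function(k) combn(names(data), k, simplify = FALSE)),
  recursive = FALSE
)
length(col_combos)  # 31, i.e. 2^5 - 1

# The same formula with 30 rows instead of 7: roughly 3.3e10 combinations
(2^5 - 1) * (2^30 - 1)
```

With 5 columns and 7 rows a brute-force search is still cheap, but the row term doubles with every additional row, so exhaustive enumeration stops being practical well before 30 rows.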

1 Answer

1
votes

I am not fully following what we are trying to achieve, but maybe this is a start:

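library(data.table)

# Column and row indices of the data frame from the question
cc <- seq_len(ncol(data))
rr <- seq_len(nrow(data))

# For every non-empty column subset iN and every non-empty row subset jN,
# add up the selected rows and the selected columns
# (cells lying in both a selected row and a selected column are counted twice)
rbindlist(
  lapply(cc, function(i){
    ccN <- combn(cc, i)  # all column subsets of size i, one per matrix column
    rbindlist(
      apply(ccN, 2, function(iN){
        rbindlist(
          lapply(rr, function(j){
            rrN <- combn(rr, j)  # all row subsets of size j
            rbindlist(
              apply(rrN, 2, function(jN){
                data.frame(
                  Sum = sum(c(
                    unlist(data[jN, ]),
                    unlist(data[, iN]))),
                  Desc = paste(c("rows",jN,"cols",iN), collapse = ",")
                )
              })
            )
          })
        )
      })
    )
  })
)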


#          Sum                              Desc
#   1:  326.61                     rows,1,cols,1
#   2:  490.70                     rows,2,cols,1
#   3:  486.04                     rows,3,cols,1
#   4:  359.82                     rows,4,cols,1
#   5:  382.58                     rows,5,cols,1
#  ---                                          
#3933: 2687.14   rows,1,2,3,5,6,7,cols,1,2,3,4,5
#3934: 2560.92   rows,1,2,4,5,6,7,cols,1,2,3,4,5
#3935: 2556.26   rows,1,3,4,5,6,7,cols,1,2,3,4,5
#3936: 2720.35   rows,2,3,4,5,6,7,cols,1,2,3,4,5
#3937: 2862.06 rows,1,2,3,4,5,6,7,cols,1,2,3,4,5
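
To get from such a table to the best combination, and to plug in a different calculation, the search could be organised roughly as below. This is only a sketch under stated assumptions: `calc()` is a hypothetical placeholder (here the mean of the selected cells), `all_subsets()` is a helper introduced for illustration, and, unlike the code above, the calculation is applied to the sub-table `data[rows, cols]` rather than to the selected rows plus the selected columns. Whether that matches the intended kmeans use case is an assumption.

```r
# Sample data from the question
data <- data.frame(
  GDP        = c(18.2, 8.5, 54.1, 1.4, 2.1, 83.6, 17),
  Population = c(1.22, 0.06, 0, 0.54, 2.34, 0.74, 1.03),
  Birth.rate = c(11.56, 146.75, 167.23, 7, 7, 7, 10.07),
  Income     = c(54, 94, 37, 95, 98, 31, 78),
  Savings    = c(56.73, 56.49, 42.81, 70.98, 88.24, 35.16, 46.18)
)

# Hypothetical placeholder for the real calculation: it receives the
# sub-data-frame picked out by one row/column combination
calc <- function(sub) sum(sub) / (nrow(sub) * ncol(sub))

# Illustrative helper: all non-empty subsets of an index vector, as a flat list
all_subsets <- function(idx) {
  unlist(
    lapply(seq_along(idx), function(k) combn(idx, k, simplify = FALSE)),
    recursive = FALSE
  )
}

row_sets <- all_subsets(seq_len(nrow(data)))
col_sets <- all_subsets(seq_len(ncol(data)))

# One line per (row subset, column subset) pair
grid <- expand.grid(ri = seq_along(row_sets), ci = seq_along(col_sets))

result <- data.frame(
  rows  = vapply(row_sets[grid$ri], paste, character(1), collapse = ","),
  cols  = vapply(col_sets[grid$ci],
                 function(j) paste(names(data)[j], collapse = ","),
                 character(1)),
  value = mapply(
    function(ri, ci) calc(data[row_sets[[ri]], col_sets[[ci]], drop = FALSE]),
    grid$ri, grid$ci
  ),
  stringsAsFactors = FALSE
)

# Combinations with the highest and lowest calculation result
best  <- result[which.max(result$value), ]
worst <- result[which.min(result$value), ]
best
worst
```

Because rows and columns are addressed by position (`seq_len(nrow(data))`, `seq_len(ncol(data))`), nothing here depends on knowing the column names or the dimensions in advance. With the placeholder mean, the extremes are single cells, so a real criterion would need to replace `calc()` for the search to be informative.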