1
votes

TL;DR: I am trying to find the "cheapest" set of items in a collection that satisfy certain linear constraints. However, every element can be part of multiple "categories" and I also want to have a mix of those unique categories and I'm not quite sure if this can be implemented in a LP way or not and in case how to approach it.

Example - Part 1 Let's say I have 7 items that have different costs and different values associated to them.

library(tidyverse)
library(lpSolve)

# Fake data
kd = tibble(
  Item = 1:7,
  Cost = c(1, 1, 1, 1, 2, 3, 4),
  Value =c(1, 1, 3, 4, 6, 3, 2),
  Type = c("A", "A", "A", "B", "C", "D", "E")
)

I want to pick 3 of those elements so that Cost is minimized and their Value is >= 5. I can easily do this with lp with the following code:

# Objective function
knapsack.obj = kd$Cost

# Constraints
knapsack.con = matrix(
  c(
    rep(1, nrow(kd)), 
    kd$Value 
  ),
  nrow = 2, byrow = TRUE
)
knapsack.dir = c("==", ">=")
knapsack.rhs = c(3, 5)

# Solve
knapsackSolution = lp("min", knapsack.obj, knapsack.con, knapsack.dir, knapsack.rhs, all.bin = TRUE) 

# Results
kd[knapsackSolution$solution == 1, ]

As expected this returns Item 1, 2 and 3 that have a combined Value=5 and are obviously minimizing the price.

Example - Part 2

The extra complication I don't quite know how to solve now is adding code for making sure the items picked they come from at least 2 unique categories. Now the solution I'd expect is Item 1, 2 and 4 (or 1, 3 and 4) that have still a combined cost of 3 and a value of 6 (or 8) that is >= 5 but are not all "A" elements but contain also Item 4 that is a "B" element.

Any idea on how to implement this in a LP framework?

3

3 Answers

1
votes

Mathematical Model

If we introduce a zero-one (data) matrix

Category[i,j] = 1  if item i has type j
                0  otherwise

and a binary variable:

y[j] = 1 if an item with type j is selected
       0 otherwise

we can develop a simple mathematical model:

enter image description here

The blue colored symbols represent data while the red ones are decision variables.

Note that the variable y[j] can be relaxed to be continuous between 0 and 1.

The advantage of first writing down a mathematical model is that it is easier to reason about than a bunch of R code (at least for me).

Implementation

I use OMPR here for two reasons:

  • Direct way to implement the model in an equation based fashion. We stay closer to the mathematical model.
  • Access to better solvers than LpSolve.

Here is the R code:

library(tidyverse)
library(ROI)
library(ROI.plugin.symphony)
library(ompr)
library(ompr.roi)

# Fake data
kd = tibble(
  Item = 1:7,
  Cost = c(1, 1, 1, 1, 2, 3, 4),
  Value =c(1, 1, 3, 4, 6, 3, 2),
  Type = c("A", "A", "A", "B", "C", "D", "E")
)

Types <- c("A","B","C","D","E")
Category <- 1*outer(kd$Type,Types,FUN="==")
Type <- 1:length(Types)

numItems <- 3
MinValue <- 5
MinItems <- 2

m <- MIPModel() %>%
  add_variable(x[i], i=kd$Item, type="binary") %>%
  add_variable(y[j], j=Type, type="binary") %>%
  add_constraint(sum_expr(x[i], i=kd$Item) == numItems) %>% 
  add_constraint(sum_expr(kd$Value[i]*x[i], i=kd$Item) >= MinValue) %>% 
  add_constraint(y[j] <= sum_expr(Category[i,j]*x[i], i=kd$Item), j=Type) %>% 
  add_constraint(sum_expr(y[j], j=Type) >= MinItems) %>% 
  set_objective(sum_expr(kd$Cost[i]*x[i], i=kd$Item),"min") %>% 
  solve_model(with_ROI(solver = "symphony", verbosity=1))

cat("Status:",solver_status(m),"\n")
cat("Objective:",objective_value(m),"\n")
m$solution

Probably the most complex part here is the calculation of the Category matrix.

Solution

The solution looks like:

Status: optimal 
Objective: 3 
x[1] x[2] x[3] x[4] x[5] x[6] x[7] y[1] y[2] y[3] y[4] y[5] 
   1    1    0    1    0    0    0    1    1    0    0    0          
0
votes

Actually, we do not have to force the solution to have k-1 or fewer elements from each group. We can instead force each group to have at most g_i-1 elements where g_i is the number of elements in each group.

Here is the implementation:

library(purrr)
library(lpSolve)
library(fastmatch)

# Fake data
  kd = tibble(
  Item = 1:7,
  Cost = c(1, 1, 1, 1, 2, 3, 4),
  Value =c(1, 1, 3, 4, 6, 3, 2),
  Type = c("A", "A", "A", "B", "C", "D", "E")
)

# number of elements to choose
k = 3

  type_match <- fmatch(kd$Type, unique(kd$Type))
  unique_cat <- unique(type_match)

  add_con <- map(unique_cat,function(x) {
    type_match[type_match != x] = 0
    type_match[type_match > 0] = 1
    return(type_match)}) %>% 
    do.call(rbind,.)

  knapsack.obj = kd$Cost
  knapsack.con = 
    rbind(
      rep(1, nrow(kd)), 
      kd$Value,
      add_con
    )
rhs_add <- apply(add_con, 1, function(x) ifelse(sum(x)>1,sum(x) - 1,1))

  knapsack.dir = c("==", ">=", rep("<=",length(rhs_add)))
  knapsack.rhs = c(k, 5, rhs_add)

  knapsackSolution = lp("min", 
                        knapsack.obj, 
                        knapsack.con, 
                        knapsack.dir, 
                        knapsack.rhs, 
                        all.bin = TRUE) 
  knapsackSolution$solution
>   knapsackSolution$solution
[1] 1 1 0 1 0 0 0
0
votes

Since we know that the solution must have k = 3 elements restrict each group to have k-1 or fewer elements forcing at least 2 groups to be used.

incid <- +outer(unique(kd$Type), kd$Type, "==")
ntypes <- nrow(incid)

knapsack.con = rbind(
    rep(1, nrow(kd)), 
    kd$Value,
    incid)

k <- 3
knapsack.dir = c("==", ">=", rep("<=", ntypes))
knapsack.rhs = c(k, 5, rep(k-1, ntypes))
res <- lp("min", knapsack.obj, knapsack.con, knapsack.dir, knapsack.rhs, all.bin = TRUE) 

res$status
## [1] 0

res$solution
## [1] 1 1 0 1 0 0 0

Simplification

As we discussed in the comments, for this particular data we can omit the last 4 constraints since they are always satsified as there is only one element in each of the last 4 groups.

res2 <- lp("min", knapsack.obj, knapsack.con[1:3, ], knapsack.dir[1:3], 
  disknapsack.rhs[1:3], all.bin = TRUE) 

res2$status
## [1] 0

res2$solution
## [1] 1 1 0 1 0 0 0

Generalizing

As we discussed in the comments, to generalize let us suppose we want at least 3 different categories in the solution rather than 2. In this particular data we could simply require the solution have no more than 1 of each category but in general that will not work so let us take all combinations of groups 2 at a time and produce the constraints shown below. 5 is the total number of categories in the problem and 2 is one less than the number of categories required to be in the solution.

combos <- combn(5, 2, function(x) colSums(incid[x, ]))

For each of those constraints, i.e. each row in combos, we require that it be less or equal to 2 in order to exclude any solution having only 1 or 2 categories. We then construct the LP in a similar manner as before adding the remaining constraints.