3
votes

I want to solve a mixed integer linear program with the following objective function:

J = maximize (f1(x) + f2(x)) subject to constraint: cost(x) <= threshold

where x is the set of selected variables, f1 and f2 are two scoring functions and cost is the cost function.

f2 is a function based on similarity between the selected variables. I don't know how to formulate this function in pulp.

This is my minimal working example in which function f2 is the similarity between two ingredients and I want to add similarity[i][j] to the objective function if j is already in selected variables, but don't know how to do it.

import numpy as np
import pulp
threshold = 200
model = pulp.LpProblem('selection', pulp.LpMaximize)
similarity = np.array([[1., 0.08333333, 0.1, 0., 0., 0.0625],
                       [0.08333333, 1., 0.33333333,
                           0., 0.11111111, 0.07692308],
                       [0.1, 0.33333333, 1., 0.2, 0., 0.09090909],
                       [0., 0., 0.2, 1., 0., 0.],
                       [0., 0.11111111, 0., 0., 1., 0.27272727],
                       [0.0625, 0.07692308, 0.09090909, 0., 0.27272727, 1.]])
ingredients = ['var_%d' % i for i in range(6)]
scores = np.random.randint(1, 3, size=len(ingredients))
costs = np.random.randint(20, 60, len(ingredients))
scores = dict(zip(ingredients, scores))
costs = dict(zip(ingredients, costs))
x = pulp.LpVariable.dict(
    'x_%s', ingredients, lowBound=0, upBound=1, cat=pulp.LpInteger)
model += sum([scores[i] * x[i] for i in ingredients])
model += sum([costs[i] * x[i] for i in ingredients]) <= threshold
solver = pulp.solvers.PULP_CBC_CMD()
model.solve(solver)

This code basically considers only static costs (encoded in costs variable). How can I dynamically add similarity costs that are the similarity variable?

1
What does the similarity array representdassouki
It is basically a matrix that returns the similarity between the elements. That is the (i,j) th element of this matrix is the similarity between ingredient i and ingredient j. @dassoukiCentAu

1 Answers

3
votes

I believe what you want to do is add an interaction term that essentially says that when both ingredients i and j are selected, there is an extra cost associated with the existence of both i and j, which is described in the similarity matrix. I am going to assume (as it is in your case) that similarity is a symmetric matrix, since the ordering of i and j does not matter (it only matters if both are selected or not).

A naive formulation would be to add the term selected[i, j] * x[i] * x[j] to the objective. This would render the problem non-linear, and although its structure is not prohibitively difficult, there is a common modelling trick to keep the model linear. Here it is.

We define a new set of variables y_{ij} that equal 1 only if both i and j participate in the solution. Note that we can define them so that i>j or j<i since we do not really care about the ordering. The we impose the restrictions:

y_{ij} <= x_i
y_{ij} <= x_j
y_{ij} >= x_i + x_j - 1

This set of restrictions guarantee that y_{ij} equals 1 only when both x_i and x_j equal 1, which is what we want.

An implementation on your code:

import numpy as np
import pulp
from itertools import product
threshold = 200
model = pulp.LpProblem('selection', pulp.LpMaximize)
similarity = np.array([[1., 0.08333333, 0.1, 0., 0., 0.0625],
                       [0.08333333, 1., 0.33333333,
                           0., 0.11111111, 0.07692308],
                       [0.1, 0.33333333, 1., 0.2, 0., 0.09090909],
                       [0., 0., 0.2, 1., 0., 0.],
                       [0., 0.11111111, 0., 0., 1., 0.27272727],
                       [0.0625, 0.07692308, 0.09090909, 0., 0.27272727, 1.]])
ingredients = ['var_%d' % i for i in range(6)]

ingredient_pairs = ['var_{}_{}'.format(
    ingredients.index(var[0]), ingredients.index(var[1])) 
    for var in product(ingredients, ingredients) 
    if ingredients.index(var[0]) > ingredients.index(var[1])]  
# Flatten the similarity array
indices = np.triu_indices_from(similarity)
similarity = similarity[indices]

scores = np.random.randint(1, 3, size=len(ingredients))
costs = np.random.randint(20, 60, len(ingredients))
scores = dict(zip(ingredients, scores))
costs = dict(zip(ingredients, costs))
similarity = dict(zip(ingredient_pairs, similarity))
x = pulp.LpVariable.dict(
    'x_%s', ingredients, lowBound=0, upBound=1, cat=pulp.LpInteger)
y = pulp.LpVariable.dict(
    'y_%s', ingredient_pairs, lowBound=0, upBound=1, cat=pulp.LpInteger)
model += sum([scores[i] * x[i] for i in ingredients]) + sum([
    similarity[i] * y[i] for i in ingredient_pairs])
model += sum([costs[i] * x[i] for i in ingredients]) <= threshold
for pair in ingredient_pairs:
    indexes = pair.split('_')[1:]
    for index in indexes:
        # y_{ij} <= x_i and y_{ij} <= x_j Q
        model += y[pair] <= x['var_{}'.format(index)]
    # y_{ij} >= x_i + x_j - 1
    model += y[pair] >= sum(x['var_{}'.format(i)] for i in indexes) - 1
solver = pulp.solvers.PULP_CBC_CMD()
model.solve(solver)
model.writeLP('similarity.lp')
print 'Objective: {}'.format(pulp.value(model.objective))
for v in model.variables():
    if v.varValue > 10e-4:
        print v.name, v.varValue

I hope this helps.