0 votes

I have an optimization problem where I want to use the scipy.optimize.minimize package, namely one of the methods that do not support any constraints or bounds, like 'nelder-mead', 'powell', 'cg', 'bfgs', 'newton-cg', 'dogleg', 'trust-ncg' - see: minimize exception about bounds and constraints.

I would like to introduce bounds myself by adding a wrapper around my cost function:
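A minimal sketch of what I have in mind (the cost function, the penalty helper and the starting point are just placeholders for my real problem):

from scipy.optimize import minimize

def costfn(parameters):
    # placeholder for my real cost function
    return (parameters[0] - 3.0) ** 2

def penaltyfn(parameters):
    # placeholder; one of the penalty variants discussed below goes here
    return 0.0

def wrapped_costfn(parameters):
    # this wrapper is what the optimizer actually calls
    return costfn(parameters) + penaltyfn(parameters)

result = minimize(wrapped_costfn, x0=[0.0], method='nelder-mead')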

This wrapper gets called by the optimizer and passes the parameters on to the cost function I would like to optimize. The cost function then returns a value to the wrapper.

If the optimizer calls the wrapper with a parameter that is outside the allowed range, the wrapper adds an extra 'penalty' to the result of the cost function and passes the sum of the two back to the optimizer.

My question is:

Will the optimization work properly (that is: will it find the absolute minimum of my cost function with a higher probability) if I use a binary penalty function like this one (only shown for the upper limit of the bound):

if parameter > upperBound:
    penalty = 1.e6
else:
    penalty = 0.0

Or is it better to use a smoother penalty function (to approximate something like a Lipschitz-continuous function), e.g. a shifted square function:

if parameter > upperBound:
    penalty = VERY_HIGH_NUMBER * (parameter - upperBound)**2
else:
    penalty = 0.0

If it is better to use a smoother penalty function, are there any 'best practices' for this kind of function (like using a sigmoid function or a square function, etc.)?

Why the algorithm (subset) selection like this? Why use bfgs (no bounds) but not lbfgs, which supports bounds (in a more sound way than you propose)? Why not distinguish between gradient-free (nelder-mead, powell) and gradient-based methods? A binary penalty will invalidate the assumptions of all gradient-based solvers (in general: twice differentiable). The smooth approach is better (but VERY_HIGH_NUMBER might introduce numerical problems), but the details depend on the chosen solver, and the absolute minimum is only guaranteed with a convex optimization surface (while not invalidating differentiability). - sascha
Thanks for your comment. You are right that one could distinguish the different methods further. My observation was that some of the unbounded solvers work really well on my cost function, so I would like to know what a general way of adding bounds to them would look like. - user7468395

1 Answer

1 vote

I'd normally transform the parameter space such that -Inf corresponds to the lowest allowed value and +Inf to the highest. For example, a sigmoid function can be used if the parameter is bounded on both the top and the bottom, or the "log of 1+exp" (not sure of the canonical name for this) if you just want something to be positive. Naive versions would be:

from math import exp, log1p

def sigfn(x):
  # maps (-inf, +inf) to (0, 1)
  return 1 / (1 + exp(-x))

def logexpfn(x):
  # maps (-inf, +inf) to (0, +inf)
  return log1p(exp(x))

but you might need to be careful to handle large values specially, as values above ~700 cause exp to overflow. So the functions can be expanded to:

def sigfn(x):
  if x < -36:
    # the 1 is lost next to exp(-x) anyway, so return the limit exp(x) instead
    return exp(x)
  return 1 / (1 + exp(-x))

def logexpfn(x):
  if x > 36:
    # the 1 no longer makes a difference, so log1p(exp(x)) is just x
    return x
  return log1p(exp(x))

I use 36 as this is the point at which we're losing precision, i.e. exp(37) * float_info.epsilon > 1.
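Just to illustrate that boundary (nothing more than a precision check):

import sys
from math import exp

print(exp(36) * sys.float_info.epsilon)  # ~0.96, the 1 still (just) matters
print(exp(37) * sys.float_info.epsilon)  # ~2.6, the 1 next to exp(x) is lost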

If you just want an upper limit, you could use these to do something like:

def optfn(param):
  # map the unconstrained param to a value that is always below upperValue,
  # then carry on with the original cost calculation
  param = upperValue - logexpfn(-param)

or whatever makes sense for your code. This obviously makes things a bit more awkward than using the parameters directly, but it's a very general technique and widely applicable.
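For completeness, a rough end-to-end sketch of how this could be wired into scipy.optimize.minimize (the cost function, the bound and the starting point here are made up purely for illustration):

from math import exp, log1p
from scipy.optimize import minimize

upperValue = 2.0  # illustrative upper bound

def logexpfn(x):
  # same "log of 1+exp" as above, including the large-value guard
  if x > 36:
    return x
  return log1p(exp(x))

def costfn(x):
  # illustrative cost whose unconstrained minimum (x = 3) lies above the bound
  return (x - 3.0) ** 2

def transform(p):
  # maps an unconstrained p in (-inf, +inf) to a value strictly below upperValue
  return upperValue - logexpfn(-p)

def optfn(p):
  # the optimizer works on the unconstrained p; the cost sees the bounded value
  return costfn(transform(p[0]))

res = minimize(optfn, x0=[0.0], method='nelder-mead')
print(transform(res.x[0]))  # approaches upperValue from below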