2
votes

I have a pretty complicated model with many parameters that I need to solve. Even though the model is complicated, the functional form at each step is not irregular.

I'm seeing some strange behavior with start values. If I start at the standard start values (all 0s), the solver converges with "Locally optimal solution found", 0 CG iterations, in 673s.

If I start at values that I know are close to the solutions, the solver converges with "Primal feasible solution estimate cannot be improved.", 493 CG iterations, in 1718s.

Note that in both cases, the final values are the same (or very similar).

2 questions:

  • What exactly is the number of conjugate gradient iterations, as in, when does the solver need to compute conjugate gradient steps? In one case I see 0 CG iterations, and in the other case 493 CG iterations. What does that imply? (Note that I do know what the CG method is; I'm just not sure why there is such a huge difference here, with 0 in one case.)
  • What are all the possible explanations that 'better' initial values can slow optimization convergence significantly?

3 Answers

5
votes

From your first question, we learn that you're employing a "smart" solver, i.e. one that dynamically adjusts the algorithm to converge as well as possible. The conjugate gradient method is a good way to cover "long-range" distance toward the optimum, but it is slow to converge when you're close to a shallow optimum.

As with all "smart" code, there are situations where the heuristic fails, and you've encountered one. I assume that your optimum is rather shallow, so that the objective function (i.e. the actual criterion you're trying to optimize) varies little if your parameters change a bit. There is then no way for the solver to know that the parameters are already very close to the optimum; for all it knows, it could be very far from the solution in an area where the objective function is simply flat. After some initial tests, it thus defaults to the conjugate gradient method, which is a slow but safe way to approach the optimum. However, since after a lot of searching it doesn't actually get very far, all it can tell you is: if you were lucky, you started close to the optimum; if you were unlucky, your solution is far, far away from the optimal solution.
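To make the CG-iteration count concrete: Newton-type solvers compute each step by solving a linear system involving the (approximate) Hessian. Zero CG iterations typically means the step came from a direct factorization; a nonzero count means an iterative CG solve was used, and the count reflects how hard that system was. A toy numerical sketch (NumPy/SciPy, not the asker's solver; `H` is a made-up stand-in for a Hessian):

```python
import numpy as np
from scipy.sparse.linalg import cg

# Toy Newton step: solve H @ step = -g two ways.
rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)   # symmetric positive definite "Hessian"
g = rng.standard_normal(n)    # "gradient"

# Direct factorization (analogous to a report of 0 CG iterations):
step_direct = np.linalg.solve(H, -g)

# Iterative CG solve, counting iterations via a callback:
iters = []
step_cg, info = cg(H, -g, callback=lambda xk: iters.append(1))
print(info == 0, len(iters))
print(np.allclose(step_direct, step_cg, atol=1e-4))
```

Both routes reach the same step; the difference is purely in how the linear algebra is done, which is why the final solutions in the question agree even though the iteration counts differ so much.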

If you know that your initial guess will be pretty good, you may thus want to check whether your solver allows specifying which algorithms should/should not be used.
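As an illustrative sketch of pinning the algorithm rather than letting a heuristic choose (using SciPy here, not Knitro, whose option names differ; the start value is made up):

```python
from scipy.optimize import minimize, rosen, rosen_der

# Start close to the Rosenbrock optimum at (1, 1) and compare a fixed
# CG algorithm against a fixed quasi-Newton (BFGS) algorithm.
x0 = [0.9, 0.81]

res_cg = minimize(rosen, x0, jac=rosen_der, method="CG")
res_bfgs = minimize(rosen, x0, jac=rosen_der, method="BFGS")

print(res_cg.nit, res_bfgs.nit)  # iteration counts of the two methods
```

With a good initial guess, a quasi-Newton method is often the better fixed choice; the point is simply that the choice can be taken away from the solver's heuristic.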

5
votes

Conjugate gradient is a gradient-type optimization algorithm (a refinement of steepest descent), and gradient methods can be prone to slow convergence in some cases, even if you are close to the optimum.

On Wikipedia, you can find figures that illustrate this behavior; I adapted one of them here:

[Figure: contour plot illustrating optimization convergence problems]

What you see are the iso-cost (or iso-objective) lines on the contour plot. Let's imagine we start from point 1, which is quite a good start value. The red lines show the path taken to reach the optimum. We see that it zig-zags towards the optimum, which takes a lot of function evaluations and therefore time.

If we compare that to the performance when we select point A as the starting value, we get faster convergence (or at least, that's what I expect in this case). Let's assume it takes just one iteration in that case.

Now take a look at point 5: it is clearly close to the optimum, but it takes a lot of iterations to get there. As you are approaching through a narrow valley, the algorithm will jump from one side to the other, making only very small progress towards the optimum on the way. When you approach from the wider side of the valley, the gradient points more directly towards the optimum, which gives faster convergence.
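The narrow-valley zig-zag is easy to reproduce with a few lines of exact-line-search steepest descent on an ill-conditioned quadratic (a toy sketch, not the asker's model; the matrix and start points are made up):

```python
import numpy as np

# f(x) = 0.5 * x^T H x with H = diag(1, 25): a long, narrow valley
# along the first axis.
H = np.diag([1.0, 25.0])

def steepest_descent(x, tol=1e-8, max_iter=10_000):
    """Gradient descent with the exact line-search step for a quadratic."""
    for k in range(max_iter):
        g = H @ x                        # gradient of 0.5 * x^T H x
        if np.linalg.norm(g) < tol:
            return x, k
        alpha = (g @ g) / (g @ (H @ g))  # exact minimizing step length
        x = x - alpha * g
    return x, max_iter

# Off-axis start: the iterates zig-zag across the valley walls.
_, iters_valley = steepest_descent(np.array([5.0, 1.0]))
# Start on the wide axis: the first step goes straight to the optimum.
_, iters_axis = steepest_descent(np.array([5.0, 0.0]))
print(iters_valley, iters_axis)
```

Both starts are the same distance along the valley, yet the off-axis one needs orders of magnitude more iterations, which is exactly the point-5-versus-point-A contrast described above.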

In your case, it might be that your initial value is somewhat like point 5 above, while the generic starting value is comparable to point A. That is under the assumption that your starting value converges to the true value, which might not be the case: if your starting value is close but there is a peak between it and the global optimum, you will not converge to the right value, as in the following figure.

[Figure: contour plot with a peak separating the starting point from the global optimum]
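A one-dimensional toy version of that trap (the function and start values are made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# A double-well objective: global minimum near x = -1, local minimum
# near x = +1, with a peak in between.
def f(x):
    x = x[0]
    return (x**2 - 1)**2 + 0.3 * x

res_bad = minimize(f, [0.8])    # starts on the wrong side of the peak
res_good = minimize(f, [-0.5])  # starts in the global basin
print(res_bad.x, res_good.x)    # two different "solutions"
```

The start at 0.8 looks close to a solution but is separated from the global optimum by the peak, so a local solver happily converges to the wrong well.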

When Knitro switches to CG or to one of its other algorithms is something that should either be mentioned in the documentation or is known only to the developers of Knitro.

-1
votes

Well, this is because the start values are just an approximation of the solution of the problem. Since an iterative solver tries to converge towards the final values, the nearer the start values are to the answer, the fewer iterations you should need to converge. The convergence threshold also matters.

This is the classic informed vs. uninformed problem: if you start with arbitrary values, the result is harder to find than if you start with well-informed values.