I have a fairly complicated model with many parameters that I need to solve. Even though the model is complicated, the functional form at each step is well-behaved.
I'm seeing some strange behavior with starting values. If I start from the default initial values (all 0s), the solver converges with "Locally optimal solution found" after 0 CG iterations, in 673 s.
If I start from values that I know are close to the solution, the solver terminates with "Primal feasible solution estimate cannot be improved." after 493 CG iterations, in 1718 s.
Note that in both cases, the final values are the same (or very similar).
2 questions:
- What exactly does the number of conjugate gradient iterations count, i.e. at what point does the solver need to run CG? One run reports 0 CG iterations, the other 493. What does that difference imply? (I do know what the CG method is; I just don't understand the huge gap here, with 0 in one case.)
- What are the possible explanations for 'better' initial values slowing convergence down so significantly?
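For context on what I've tried: the solver in my setup isn't scipy, but the same CG-iteration counter shows up in generic trust-region codes, where CG is used to (approximately) solve the quadratic subproblem at each outer step. Below is a minimal, purely illustrative sketch using scipy's `trust-constr` method (the function, dimensions, and starting points are all made up and are not my actual model) that reports the `cg_niter` counter for a far and a near starting point:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Illustrative only: a small smooth test problem (Rosenbrock), not my model.
# trust-constr solves a trust-region subproblem at each outer iteration and
# reports the total number of CG iterations spent on those subproblems.
res_far = minimize(rosen, np.zeros(5), jac=rosen_der, method="trust-constr")
res_near = minimize(rosen, np.full(5, 0.99), jac=rosen_der, method="trust-constr")

# cg_niter: cumulative CG iterations across all outer iterations.
print("far start:  cg_niter =", res_far.cg_niter)
print("near start: cg_niter =", res_near.cg_niter)
```

The point of the sketch is only that `cg_niter` is a property of the path the solver takes through the subproblems, not of the final answer, which is why two runs ending at the same point can report very different counts.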

