1 vote

I am new to programming tools and Python, and have little knowledge of numerical computation. I tried to understand the source code of curve_fit, but it is too hard for me.

If we fit a non-linear model directly, I guess that by calculating the Hessian matrix we could find the critical points for all parameters, regardless of whether they are local or global, minima or maxima.

However, if the function that contains the model cannot be differentiated analytically, how does curve_fit actually compute the answer?

I imagine the algorithm searches for the local minimum of chi-square within some interval for each parameter, over all data points. If so, how big is the default interval, what is the spacing between trial parameter values, and what is the maximum number of iterations? And how does this iteration work across multiple parameters: are they varied together as a set, or individually?

I want to understand the algorithm so that I can write a good function to use with curve_fit. The function I am trying to fit contains many dirty fixes, and different small changes produce different errors, so I can't post my code here because I don't know what the problem is.

Also, about the sigma input: if I have no clue about the y-errors, what happens to the result if the default sigma is too big compared to the data values? And what happens if the flexibility of the model hardly affects the result, or if sigma is far smaller than the distance from the data to the fitted function?

Did you read the curve_fit docs? Specifically under method? – kabanus
I did miss that. I have read about the LM method, though. I will see if I can figure it out. Thank you. – Zhidong Li
Follow the link to least_squares. The Levenberg-Marquardt algorithm is commonly used, but there are other options. – kabanus
The default non-linear curve-fitting algorithm that scipy uses is LM. – High Performance Rangsiman

1 Answer

6 votes

To start understanding how scipy curve_fit tries to solve a problem, start by reading about non-linear optimization methods such as the Levenberg-Marquardt (LM) method (see https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm). This is the method curve_fit uses when bounds on parameter values are not specified, and it will give most of the answers you seek.
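As a concrete example of what curve_fit is doing, here is a toy fit; the model, data, and starting values are invented for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

# Toy model for illustration: y = a * exp(-b * x)
def model(x, a, b):
    return a * np.exp(-b * x)

# Synthetic noisy data (made up for this sketch)
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 50)
y = model(x, 2.5, 1.3) + 0.05 * rng.standard_normal(x.size)

# With no bounds given, curve_fit defaults to method='lm'
# (MINPACK's Levenberg-Marquardt via leastsq); with bounds
# it switches to the 'trf' method instead.
popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0])
print(popt)  # close to the true values [2.5, 1.3]
```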

This method (and the other solvers in scipy.optimize) starts with initial values and tries to refine those values in order to minimize the resulting residual array (in the least-squares sense: to be clear, the method expects a residual array and does the squaring and summing itself). To decide which direction to move the parameter values, it uses the first derivative of the residual with respect to the variables, that is, the Jacobian. It is very common not to have an analytic form for this, in which case numerical derivatives are calculated by taking very small steps (often around machine precision) in the parameter values; a sketch of that idea follows.
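Here is a minimal sketch of such a forward-difference Jacobian. This shows the general idea, not MINPACK's exact internal routine, and numerical_jacobian is just a name made up for illustration:

```python
import numpy as np

def numerical_jacobian(residual, params, eps=1.49e-8):
    # Forward-difference Jacobian of a residual vector: a rough
    # sketch of the idea, not MINPACK's exact fdjac2 routine.
    # eps ~ sqrt(machine epsilon) is the usual step scale.
    params = np.asarray(params, dtype=float)
    r0 = residual(params)
    jac = np.empty((r0.size, params.size))
    for j in range(params.size):
        step = eps * max(abs(params[j]), 1.0)  # scale step to the parameter
        p = params.copy()
        p[j] += step
        jac[:, j] = (residual(p) - r0) / step
    return jac
```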

With these derivatives and the last step size, the LM method decides how big a step to take to find the minimal residual. It is a common (and often a good) approximation to think of the residual as a quadratic function of each parameter. If the current value is far from the bottom, taking a step that is linear in the derivative (i.e., just following the local slope) works well. But near the final solution, it helps to use the second derivative (a.k.a. the Hessian, curvature, or in some sense "history"). LM combines these two approaches to try to find the solution quickly. At each step it first calculates the derivatives (and, after the first step, second-derivative information), then calculates the step to take for all parameters. It repeats this process until the solution (chi-square) no longer improves by more than a tolerance. In scipy's leastsq (used by curve_fit by default), the step size used to calculate numerical derivatives and the tolerances for stopping the fit can be specified, as shown below.
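If you do want to adjust those knobs, curve_fit forwards extra keyword arguments to leastsq when method='lm'. A sketch, reusing the model and data from the first example (the tolerance values here are arbitrary):

```python
from scipy.optimize import curve_fit

# model, x, y as defined in the first sketch above.
# These keyword arguments are passed through to leastsq:
# - epsfcn sets the finite-difference step scale (roughly
#   sqrt(epsfcn) * parameter value),
# - ftol and xtol are the relative tolerances on the sum of
#   squares and on the parameters that stop the iteration,
# - maxfev caps the number of function evaluations.
popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0], method='lm',
                       ftol=1e-10, xtol=1e-10, epsfcn=1e-8, maxfev=2000)
```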

Typically, when using curve_fit or leastsq or least_squares, you do not need to worry about any of these details of how the solution is found. One exception: if you do have analytic derivatives, supplying them can improve both speed and accuracy. Your "model function" should just take the parameters passed in and calculate the predicted model for the data. Changing the values passed in will confuse the algorithm: it is running your function with values it knows in order to find the solution.
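For example, for the toy exponential model above you could supply the analytic Jacobian through curve_fit's jac argument, which skips the finite-difference estimate (the derivatives here are specific to that made-up model):

```python
import numpy as np
from scipy.optimize import curve_fit

# model, x, y as defined in the first sketch above.
def model_jac(x, a, b):
    # Analytic partial derivatives of a*exp(-b*x) with respect to
    # a and b, returned as an (npoints, nparams) array.
    da = np.exp(-b * x)
    db = -a * x * np.exp(-b * x)
    return np.column_stack([da, db])

popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0], jac=model_jac)
```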

Other questions:
If you don't know the standard errors in your data (y), don't worry about it too much. The absolute value of the reported chi-square may not be close to ndata-nvarys (as expected for a good fit), but its value relative to other fits should be fine.
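A quick way to convince yourself of this, reusing the toy fit above: a uniform sigma, whatever its size, leaves the best-fit parameters unchanged, and with curve_fit's default absolute_sigma=False the reported covariance comes out the same as well:

```python
import numpy as np
from scipy.optimize import curve_fit

# model, x, y as in the first sketch. With sigma=None, curve_fit
# effectively weights every point equally; a uniform sigma rescales
# chi-square but cancels out of both the best fit and the
# rescaled covariance.
popt1, pcov1 = curve_fit(model, x, y, p0=[1.0, 1.0])
popt2, pcov2 = curve_fit(model, x, y, p0=[1.0, 1.0],
                         sigma=100.0 * np.ones_like(y))
print(np.allclose(popt1, popt2))  # True: same best-fit values
print(np.allclose(pcov1, pcov2))  # True: same reported uncertainties
```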

One of the features of the LM method (at least in the MINPACK -> leastsq -> curve_fit implementation) is that it reports the final covariance matrix, which can be used to determine uncertainties in and correlations between the fitted parameters. Normally the reported values are those that increase chi-square by 1 (i.e., 1-sigma standard errors). Because not having good uncertainties on the data is so common, scipy's curve_fit scales this covariance to give values as if chi-square did equal ndata-nvarys. That is, the standard errors in the parameters will be pretty good if you believe the fit is good (and sqrt(chi-square/(ndata-nvarys)) can be used to estimate the error in the data).
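Reusing the toy fit above, a sketch of pulling out the parameter uncertainties and estimating the data error this way:

```python
import numpy as np

# model, x, y, popt, pcov as in the sketches above.
perr = np.sqrt(np.diag(pcov))  # 1-sigma parameter standard errors

# Estimate the per-point data error from the residuals, assuming
# the fit is good and the errors are roughly uniform:
resid = y - model(x, *popt)
ndata, nvarys = len(y), len(popt)
chi_square = np.sum(resid**2)
sigma_est = np.sqrt(chi_square / (ndata - nvarys))
```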