I have written some code which performs a Monte Carlo simulation and produces curves of signal intensity versus time. The shape of such a curve depends on various parameters, two of which my collaborator wants to determine by the 'real life version' of the experiment I am simulating.
We are ready to compare her experimental data with my simulated curves, but I am stuck: I have not yet been able to perform any fit (so far I have replaced the experimental data with simulated noisy data for testing). I have tried scipy.optimize.leastsq, which exits with code 2 (according to the documentation this means the fit was successful), but it mostly returns values close to my initial guess, no matter how near to or far from the true values that guess was. When it does report different values, the resulting curve still differs significantly from the true one.
Another observation is that the message returned alongside the exit code (mesg, not infodict['nfev'], which holds the number of function evaluations) invariably reads
The relative error between two consecutive iterates is at most 0.000000
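For reference, this is roughly the kind of call I am making, reduced to a self-contained sketch. The exponential-decay model here is only a hypothetical stand-in for the Monte Carlo simulation, and all names (model, residuals, etc.) are illustrative, not my actual code:

```python
import numpy as np
from scipy.optimize import leastsq

# Hypothetical stand-in for the simulated curve: a smooth exponential decay.
def model(params, t):
    amplitude, tau = params
    return amplitude * np.exp(-t / tau)

t = np.linspace(0, 10, 100)
true_params = (2.0, 3.0)
rng = np.random.default_rng(0)
data = model(true_params, t) + rng.normal(scale=0.01, size=t.size)

def residuals(params):
    return model(params, t) - data

guess = (1.0, 1.0)
fit, cov, infodict, mesg, ier = leastsq(residuals, guess, full_output=1)
print(ier)               # exit code; 1-4 indicate "success"
print(mesg)              # the "relative error ..." text comes from here
print(infodict['nfev'])  # number of function evaluations (an integer)
```

On a smooth toy model like this the fit converges to the true values; the trouble only starts when the residuals come from the stochastic simulation.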
While testing with my simulated noisy data I have tried the two parameters' true values being of the same order of magnitude (thinking that otherwise the step size in use might only sensibly affect one of them) as well as of very different orders of magnitude, and I have varied the step size (the epsfcn parameter), but to no avail.
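To illustrate what I mean by varying the step size: when the residuals themselves are noisy (a fresh random draw on every evaluation, as in a stochastic simulation), my understanding is that the default forward-difference step is tiny, so the simulation noise can swamp the estimated Jacobian. This sketch, with an assumed noise model and an illustrative epsfcn value, shows the kind of variation I tried:

```python
import numpy as np
from scipy.optimize import leastsq

t = np.linspace(0, 10, 200)
true_params = (2.0, 3.0)

def noisy_model(params, rng):
    # Stand-in for a stochastic simulation: the curve itself is noisy,
    # so every evaluation of the residuals jitters a little.
    amplitude, tau = params
    tau = max(abs(tau), 1e-12)  # guard: the optimiser may probe negative values
    return amplitude * np.exp(-t / tau) + rng.normal(scale=0.02, size=t.size)

data = noisy_model(true_params, np.random.default_rng(0))

def residuals(params):
    return noisy_model(params, np.random.default_rng()) - data

guess = (1.0, 1.0)
# With the default epsfcn the finite-difference step is of the order of
# sqrt(machine epsilon); making it much larger than the noise-induced
# jitter is one of the variations I experimented with:
fit, ier = leastsq(residuals, guess, epsfcn=0.1)
print(fit, ier)
```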
Does anyone have an idea what I might be doing wrong, or what fitting function I could use instead of leastsq? If so: thanks a lot in advance!
EDIT
As suggested by Russ I will now provide some detail on how the simulation is performed: we are looking at small molecules binding to large molecules. This happens with a probability that depends on their mutual affinity (the affinity is one of the values to be extracted from the experimental data). Once binding has occurred, we also simulate how long it takes until the complex falls apart again (the dissociation time constant is the second parameter we are interested in). There are a number of other parameters, but they only enter when the expected signal intensity is calculated, so they do not affect the simulation itself.
We start with a given number of small molecules and simulate the state of each one over a number of time steps. At each time step we use the affinity value to determine whether a molecule, if currently unbound, binds to a large molecule. If it is already bound, we use the dissociation time constant and the amount of time for which it has been bound to determine whether it dissociates in this step.
In both cases the relevant parameter (affinity or dissociation time constant) is used to calculate a probability, which is then compared to a random number between 0 and 1; this comparison decides whether the state of the small molecule (bound/unbound) changes.
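The loop above can be sketched roughly as follows. All names and the exact probability expressions are illustrative assumptions, not my real model: binding is a per-step Bernoulli trial with a probability proportional to the affinity, and a bound molecule dissociates with a probability that grows with the time it has already spent bound:

```python
import numpy as np

def simulate_bound_fraction(affinity, dissociation_tau, n_molecules=1000,
                            n_steps=500, dt=1.0, seed=0):
    """Toy two-state Monte Carlo along the lines described above."""
    rng = np.random.default_rng(seed)
    bound_time = np.full(n_molecules, -1.0)  # -1 marks an unbound molecule
    curve = np.empty(n_steps)
    p_bind = min(affinity * dt, 1.0)  # illustrative binding probability
    for step in range(n_steps):
        unbound = bound_time < 0
        # Unbound molecules bind with probability p_bind.
        binds = unbound & (rng.random(n_molecules) < p_bind)
        bound_time[binds] = 0.0
        # Already-bound molecules dissociate with a probability that
        # grows with the time spent bound (illustrative choice).
        bound = ~unbound
        p_diss = 1.0 - np.exp(-bound_time / dissociation_tau)
        dissociates = bound & (rng.random(n_molecules) < p_diss)
        bound_time[dissociates] = -1.0
        bound_time[bound & ~dissociates] += dt
        curve[step] = np.count_nonzero(bound_time >= 0) / n_molecules
    return curve
```

The signal-intensity curve in my real code is then derived from a bound-fraction trace like this one, using the additional parameters mentioned above.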
EDIT 2
There is no well-defined function that determines the shape of the resulting curve, and even though the shape is clearly reproducible, there is an element of randomness to each individual data point. I have therefore now tried optimize.fmin instead of leastsq, but it does not converge and simply exits after the maximum number of iterations has been performed.
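One workaround I am considering (an assumption on my part, not something I have confirmed fixes it) is to freeze the random seed inside the objective, so that repeated evaluations at the same parameters return the same value and the simplex search sees a deterministic surface. A minimal sketch, again with a hypothetical noisy exponential standing in for the simulation:

```python
import numpy as np
from scipy.optimize import fmin

t = np.linspace(0, 10, 200)
true_params = (2.0, 3.0)

def simulated_curve(params, seed):
    # Hypothetical stand-in for the Monte Carlo simulation. The fixed
    # seed makes repeated evaluations at the same parameters identical,
    # so the objective surface is deterministic for the simplex search.
    amplitude, tau = params
    rng = np.random.default_rng(seed)
    return amplitude * np.exp(-t / tau) + rng.normal(scale=0.02, size=t.size)

data = simulated_curve(true_params, seed=1)

def sum_of_squares(params):
    return np.sum((simulated_curve(params, seed=0) - data) ** 2)

fit = fmin(sum_of_squares, x0=(1.0, 1.0), xtol=1e-4, ftol=1e-4, disp=False)
print(fit)
```

With the seed frozen this converges near the true values on the toy model; whether the same trick is acceptable for the real simulation is exactly what I am unsure about.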
EDIT 3
As suggested by Andrea I have uploaded a sample plot. I don't think providing sample data would help much; it is just one y-value (signal intensity) per x-value (time)...