Least squares function and 4 parameter logistics function not working

Question

Relatively new to python, mainly using it for plotting things. I am currently attempting to determine a best fit line using the 4 parameter logistic (4PL) equation and curve fit from scipy. There are one or two sites showing how 4PL works, but could not get them to work for my data. Example, but similar 4PL data below:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import scipy.optimize as optimization

xdata = [2.3, 2.3, 2, 2, 1.7, 1.7, 1, 1, 0.000001, 0.000001, -1, -1]
ydata = [0.32, 0.3, 0.55, 0.60, 0.88, 0.92, 1.27, 1.21, 1.15, 1.12, 1.1, 1.1]
def fourPL(x, A, B, C, D):
    return ((A-D)/(1.0+((x/C)**(B))) + D)
guess = [0, -0.5, 0.5, 1]
params, params_covariance = optimization.curve_fit(fourPL, xdata, ydata, 
guess)
params

Gives warning (also an exponent warning in test data, but not real):

OptimizeWarning: Covariance of the parameters could not be estimated
category=OptimizeWarning)

And the params returns my initial guess. I have tried various initial guesses.

The best fit line is drawn when plotting, but is not a curve and does not go below x = 0 (I cannot find a reason negatives would mess with the 4PL model). 4PL fit plotted

I'm not sure if I am doing something incorrect with the equation, or how the curve fit function works, or both. I have a similar issue using least squares instead of curve fit. I've tried a bunch of variations based off similar equations for fit etc. but have been stuck for awhile, any help in pointing me in the right direction would be much appreciated.

Hi and welcome! Please provide a minimal, complete, and verifyable example (i.e. complete code with toy data) to increase the chance of getting a useful answer. People want to simply copy&paste your code to reproduce the error and fiddle with possible solutions. Also have a look here. This answer I posted yesterday may give you a few ideas, although the logistic function should be well amenable for curve_fit. — MB-F
I see from the plot that you have several data points for each X value on the plot. It might simplify your initial work to average those values, giving you a smaller data set to work with as you begin. Once you have a good working solution using the smaller data set, then switch back to using the larger full data set. This might be useful as a quick and easy test that is simple to perform. — James Phillips
Thanks for the suggestions, edited to have a simple, copiable example. Also played around with the example data, and am seeming to get stuck in the same places. as before. — wp395

sascha sascha · Accepted Answer · 2017-10-12T19:06:11

I'm surprised you did not get any warnings or did not share them with us. I can't analyze this task for you by scientific means, just some remarks about technical stuff:

Observation

When running your code, you should some warnings like:

RuntimeWarning: invalid value encountered in power
  return ((A-D)/(1.0+((x/C)**(B))) + D)

Don't ignore this!

Debugging

Add some prints to your function fourPL, probably all the different components of your function and look what's happening.

Example:

def fourPL(x, A, B, C, D):
    print('1: ', (A-D))
    print('2: ', (x/C))
    print('3: ', (1.0+((x/C)**(B))))

    return ((A-D)/(1.0+((x/C)**(B))) + D)

...
params, params_covariance = optimization.curve_fit(fourPL, xdata, ydata, guess, maxfev=1)
# maxfev=1 -> let's just check 1 or few it's

Output:

1:  -1.0
2:  [  4.60000000e+00   4.60000000e+00   4.00000000e+00   4.00000000e+00
   3.40000000e+00   3.40000000e+00   2.00000000e+00   2.00000000e+00
   2.00000000e-06   2.00000000e-06  -2.00000000e+00  -2.00000000e+00]
RuntimeWarning: invalid value encountered in power
  print('3: ', (1.0+((x/C)**(B))))
3:  [   1.4662524     1.4662524     1.5           1.5           1.54232614
    1.54232614    1.70710678    1.70710678  708.10678119  708.10678119
           nan           nan]

That's enough to stop. nans and infs are bad!

Theory

Now it's time for theory and i won't do that. But usually you now should think about the underlying theory and why these problems occur.

Is there something you missed in regards to the assumptions?

Repair (without checking theory)

Without checking out the theory and just looking over some example found within 30 secs: hmm are negative x-values a problem?

Let's shift x (by the minimum; hardcoded 1 here):

xdata = np.array([2.3, 2.3, 2, 2, 1.7, 1.7, 1, 1, 0.000001, 0.000001, -1, -1]) + 1

Complete code:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import scipy.optimize as optimization

xdata = np.array([2.3, 2.3, 2, 2, 1.7, 1.7, 1, 1, 0.000001, 0.000001, -1, -1]) + 1
ydata = np.array([0.32, 0.3, 0.55, 0.60, 0.88, 0.92, 1.27, 1.21, 1.15, 1.12, 1.1, 1.1])

def fourPL(x, A, B, C, D):
    return ((A-D)/(1.0+((x/C)**(B))) + D)

guess = [0, -0.5, 0.5, 1]
params, params_covariance = optimization.curve_fit(fourPL, xdata, ydata, guess)#, maxfev=1)

x_min, x_max = np.amin(xdata), np.amax(xdata)
xs = np.linspace(x_min, x_max, 1000)
plt.scatter(xdata, ydata)
plt.plot(xs, fourPL(xs, *params))
plt.show()

Output:

RuntimeWarning: divide by zero encountered in power
  return ((A-D)/(1.0+((x/C)**(B))) + D)

Looks good, but it's time for another theory session: what did our linear-shift do to our results? I'm ignoring this again.

So just one warning and a nice-looking output.

If you want to remove that last warning, add some small epsilon to not have 0's in xdata:

xdata = np.array([2.3, 2.3, 2, 2, 1.7, 1.7, 1, 1, 0.000001, 0.000001, -1, -1]) + 1 + 1e-10

which will achieve the same, without any warning.