2
votes

I am trying to fit a trapezoid to a set of time series using the curve_fit library from scipy.optimize. The function that I'm using to generate a trapezoid is the following:

def trapezoid(x, a, b, c, tau1, tau2):
    y = np.zeros(len(x))
    c = -np.abs(c)
    a = np.abs(a)
    y[:int(tau1)] =  a*x[:int(tau1)] + b
    y[int(tau1):int(tau2)] =  a*tau1 + b 
    y[int(tau2):] = c*(x[int(tau2):]-tau2) + (a*tau1 + b)
    return y

Where a and c are the slopes, and tau1 and tau2 mark the beginning and the end of the flat phase.

And in order to fit I just use:

popt, pcov = curve_fit(trapezoid, xdata, ydata, method = 'lm')

For most of the cases it works just fine, such as in the following:
Good fit

However, I'm also getting some cases on which it just fails to fit the data, where it looks like it should be doing ok:

Bad fit

The problem with these cases is that it sets a tau2 (end of the flat phase) smaller than tau1 (beginning of it).

Could anyone suggest a way to solve this issue? Whether by imposing a constraint or in some other way?

Example array for which the fit does not work:

array([1.2 , 1.21, 1.2 , 1.19, 1.21, 1.22, 2.47, 2.53, 2.49, 2.39, 2.28, 2.16, 2.07, 1.99, 1.91, 1.83, 1.74, 1.65, 1.57, 1.5 , 1.45, 1.41, 1.38, 1.35, 1.33, 1.29, 1.24, 1.19, 1.14, 1.11, 1.07, 1.04, 1. , 0.95, 0.91, 0.87, 0.84, 0.8 , 0.77, 0.74, 0.72, 0.7 , 0.68, 0.66, 0.63, 0.61, 0.59, 0.57, 0.55, 0.52, 0.5 , 0.48, 0.45, 0.43, 0.41, 0.39, 0.38, 0.37, 0.37, 0.36, 0.35, 0.34, 0.34, 0.33])

Which yields: tau1: 8.45, tau2:5.99

2
set tau2 to the maximum off tau1 and tau2 and you are done - Glostas
It seems that the optimizer got stuck in a local minimum due to bad initial solution. You did not provide a p0 argument to curve_fit, so your trapezoid parameters are all initialized to 1. You might want to find a way to properly initialize your parameters, otherwise you better use a global optimizer such as SciPy's differential_evolution. - Kefeng91
@Glostas, the problem is that when it fails to fit due to this reason it assings a tau1 and tau2 that are simply wrong, imposing the assignment you mention does not solve the problem. The ideal case would be to be able to add a constraint imposing tau2>tau1, but apparently it is not possible using this library. - yatu
Instead of defining tau2 as the index of the flat phase, it might be better to define it as its length with tau1 still marking its beginning. - Kefeng91
@Mr.T No it is not an assumption, just updated with an example - yatu

2 Answers

1
votes

You might find lmfit (http://lmfit.github.io/lmfit-py/) useful for this problem. Lmfit provides a slightly higher level interface to curve fitting, still based on the scipyoptimizers, but with some better abstractions and features.

In particular for your question, lmfit parameters are Python objects that can have bounds, be fixed, or be written as simple algebraic constraints in terms of other variables. This can support imposing tau2 > tau1. The idea is essentially to set tau2=tau1+taudiff and place a lower bound of 0 on taudiff. While you could rewrite your function to do that in the code, with lmfit you don't have to do that and can put that logic in the Parameters instead.

Converting your script to use lmfit would give something like this:

from lmfit import Model

# use your same model function
def trapezoid(x, a, b, c, tau1, tau2):
    y = np.zeros(len(x))
    c = -np.abs(c)
    a = np.abs(a)
    y[:int(tau1)] =  a*x[:int(tau1)] + b
    y[int(tau1):int(tau2)] =  a*tau1 + b 
    y[int(tau2):] = c*(x[int(tau2):]-tau2) + (a*tau1 + b)
    return y

# turn model function into lmfit Model
tmod = Model(trapezoid)

# create Parameters for this model: they will be *named* according
# to the signature of the model function, and be used as keys in
# an ordered-directory-derived object.  Here you can also give
# initial values
params = tmod.make_params(a=1, b=2, c=0.5, tau1=5, tau2=-1)

# now you can set bounds or constraints.  

# 1st, add a new variable "taudiff"
params.add('taudiff', value=0.1, min=0, vary=True)

# constraint tau2 to be taudiff+tau1 -- this is no longer a "free variable:
params['tau2'].expr = "taudiff + tau1"

# now do fit to data:
result = tmod.fit(ydata, params, x=xdata)

# print report of fit
print(result.fit_report())

# get best fit params:
for parname, param in result.params:
    print(parname, param.value, param.stderr, param.expr)

# get best fit array for plotting
pylab.plot(xdata, ydata)
pylab.plot(xdata, result.best_fit)

Hope that helps.

1
votes

enter image description hereJust setting t1,t2 to the minimum and maximum value does work

def trapezoid(x, a, b, c, tau1, tau2):
    y = np.zeros(len(x))
    c = -np.abs(c)
    a = np.abs(a)
    (tau1,tau2) = (min(tau1,tau2),max(tau1,tau2))

    y[:int(tau1)] =  a*x[:int(tau1)] + b
    y[int(tau1):int(tau2)] =  a*tau1 + b 
    y[int(tau2):] = c*(x[int(tau2):]-tau2) + (a*tau1 + b)



x_data = np.arange(len(A))
popt, pcov = curve_fit(trapezoid, x_data, A, method = 'lm')
print popt
fit = trapezoid(x_data,*popt)

leads to: