numpy - Fitting data using UnivariateSpline in scipy python

Question

I have a experimental data to which I am trying to fit a curve using UnivariateSpline function in scipy. The data looks like:

 x         y
13    2.404070
12    1.588134
11    1.760112
10    1.771360
09    1.860087
08    1.955789
07    1.910408
06    1.655911
05    1.778952
04    2.624719
03    1.698099
02    3.022607
01    3.303135

Here is what I am doing:

import matplotlib.pyplot as plt
from scipy import interpolate
yinterp = interpolate.UnivariateSpline(x, y, s = 5e8)(x) 
plt.plot(x, y, 'bo', label = 'Original')
plt.plot(x, yinterp, 'r', label = 'Interpolated')
plt.show()

That's how it looks:

Curve fit

I was wondering if anyone has thought on other curve fitting options which scipy might have? I am relatively new to scipy.

Thanks!

Do you have apriori knowledge about data you are working with? May be theoretical representation? Or can you get more data? 50 or 100 points? — twil
@twil: Nope. The data is from experiment involving human decisions. That's all i know. I am trying to fit a curve with aim to extrapolate to further values of x. I tried cubic spline and polyfit, but they are also not good. Am I doing something wrong with the choice of smoothing function above in UnivariateSpline? — Prakhar Mehrotra
You are doing ok but have to little data. I'd say values at 3 and 13 are somewhat not "normal". If you remove them you'll get a... better curve? But without any knowledge or assumption about process it's not fair:) — twil

Craig J Copi Craig J Copi · Accepted Answer · 2013-07-28T23:38:07

There are a few issues.

The first issue is the order of the x values. From the documentation for scipy.interpolate.UnivariateSpline we find

x : (N,) array_like
    1-D array of independent input data. MUST BE INCREASING.

Stress added by me. For the data you have given the x is in the reversed order. To debug this it is useful to use a "normal" spline to make sure everything makes sense.

The second issue, and the one more directly relevant for your issue, relates to the s parameter. What does it do? Again from the documentation we find

s : float or None, optional
    Positive smoothing factor used to choose the number of knots.  Number
    of knots will be increased until the smoothing condition is satisfied:

    sum((w[i]*(y[i]-s(x[i])))**2,axis=0) <= s

    If None (default), s=len(w) which should be a good value if 1/w[i] is
    an estimate of the standard deviation of y[i].  If 0, spline will
    interpolate through all data points.

So s determines how close the interpolated curve must come to the data points, in the least squares sense. If we set the value very large then the spline does not need to come near the data points.

As a complete example consider the following

import scipy.interpolate as inter
import numpy as np
import pylab as plt

x = np.array([13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
y = np.array([2.404070, 1.588134, 1.760112, 1.771360, 1.860087,
          1.955789, 1.910408, 1.655911, 1.778952, 2.624719,
          1.698099, 3.022607, 3.303135])
xx = np.arange(1,13.01,0.1)
s1 = inter.InterpolatedUnivariateSpline (x, y)
s1rev = inter.InterpolatedUnivariateSpline (x[::-1], y[::-1])
# Use a smallish value for s
s2 = inter.UnivariateSpline (x[::-1], y[::-1], s=0.1)
s2crazy = inter.UnivariateSpline (x[::-1], y[::-1], s=5e8)
plt.plot (x, y, 'bo', label='Data')
plt.plot (xx, s1(xx), 'k-', label='Spline, wrong order')
plt.plot (xx, s1rev(xx), 'k--', label='Spline, correct order')
plt.plot (xx, s2(xx), 'r-', label='Spline, fit')
# Uncomment to get the poor fit.
#plt.plot (xx, s2crazy(xx), 'r--', label='Spline, fit, s=5e8')
plt.minorticks_on()
plt.legend()
plt.xlabel('x')
plt.ylabel('y')
plt.show()

Result from example code

numpy - Fitting data using UnivariateSpline in scipy python

1 Answers