2
votes

I have lognormal distributed data in x0 and y0 arrays:

x0.ravel() = array([19.8815 , 19.0141 , 18.1857 , 17.3943 , 16.6382 , 15.9158 ,
   15.2254 , 14.5657 , 13.9352 , 13.3325 , 12.7564 , 12.2056 ,
   11.679  , 11.1755 , 10.6941 , 10.2338 ,  9.79353,  9.37249,
    8.96979,  8.58462,  8.21619,  7.86376,  7.52662,  7.20409,
    6.89552,  6.6003 ,  6.31784,  6.04757,  5.78897,  5.54151,
    5.30472,  5.07812,  4.86127,  4.65375,  4.45514,  4.26506,
    4.08314,  3.90903,  3.74238,  3.58288,  3.4302 ,  3.28407,
    3.14419,  3.01029,  2.88212,  2.75943,  2.64198,  2.52955,
    2.42192,  2.31889,  2.22026,  2.12583,  2.03543,  1.94889,
    1.86604,  1.78671,  1.71077,  1.63807,  1.56845,  1.50181,
    1.43801,  1.37691,  1.31842,  1.26242,  1.2088 ,  1.15746,
    1.10832,  1.06126,  1.01619])

y0.ravel() =array([1.01567e+03, 8.18397e+02, 7.31992e+02, 1.11397e+03, 2.39987e+03,
   2.73762e+03, 4.65722e+03, 7.06308e+03, 9.67945e+03, 1.38983e+04,
   1.98178e+04, 1.97461e+04, 3.28070e+04, 4.48814e+04, 5.80853e+04,
   7.35511e+04, 8.94090e+04, 1.08274e+05, 1.28276e+05, 1.50281e+05,
   1.69258e+05, 1.91944e+05, 2.16416e+05, 2.37259e+05, 2.57426e+05,
   2.74818e+05, 2.90343e+05, 3.01369e+05, 3.09232e+05, 3.13713e+05,
   3.17225e+05, 3.19177e+05, 3.17471e+05, 3.14415e+05, 3.08396e+05,
   2.95692e+05, 2.76097e+05, 2.52075e+05, 2.29330e+05, 1.97843e+05,
   1.74262e+05, 1.46360e+05, 1.20599e+05, 9.82223e+04, 7.80995e+04,
   6.34618e+04, 4.77460e+04, 3.88737e+04, 3.23715e+04, 2.58129e+04,
   2.15724e+04, 1.58737e+04, 1.13006e+04, 7.64983e+03, 4.64590e+03,
   3.31463e+03, 2.40929e+03, 3.02183e+03, 1.47422e+03, 1.06046e+03,
   1.34875e+03, 8.26674e+02, 9.53167e+02, 6.47428e+02, 9.83651e+02,
   8.93673e+02, 1.23637e+03, 0.00000e+00, 8.36573e+01])

I want to use curve_fit to get an function, that fits my data points, to gain the mu (and then the exp(mu) for the median) and the sigma of this distribution.

import numpy as np
from scipy.optimize import *

def f(x, mu, sigma) :
   return 1/(np.sqrt(2*np.pi)*sigma*x)*np.exp(-((np.log(x)- 
   mu)**2)/(2*sigma**2))


params, extras = curve_fit(f, x0.ravel(), y0.ravel())

print "mu=%g, sigma=%g" % (params[0], params[1])

plt.plot(x0, y0, "o")
plt.plot(x0, f(x0 ,params[0], params[1])) 
plt.legend(["data", "fit"], loc="best")
plt.show()

The result is the following:

mu=1.47897, sigma=0.0315236

Curve_fit

Obviously the function does not fit the data by any means.

When i multiply the fitting function by, lets say 1.3*10^(5) in the code:

plt.plot(x0, 1.3*10**5*f(x0 ,params[0], params[1]))

This is the result:

manually changed fitting curve

The calculated mu value, which is the mean value of the related normal distribution seems right, because when im using:

 np.mean(np.log(x))

i get 1.4968838412183132, which is quite similar to the mu i obtain from curve_fit.

Calculating the median with

np.exp(np.mean(np.log(x))

gives 4.4677451525990675, which seems to be ok.

But unless i see the fitting function going threw my datapoints, i do not really trust these numbers. My problem is obviously, that the fitting function has no information of the (big) y0 values. How can i change that? Any help apreciated!

1
I don't understand your data. You say you have lognormal distributed data in x0 and y0, than you want to fit these data to a lognormal pdf. I think you might have mixed something up.datasailor
Yes, i'm definitly mixing something up. When i plot x0 versus y0, the graph (as shown in pictures above) clearly shows an lognormal behaviour. Now i want to fit a function through this points, to gain the mu and the sigma.Carolyn

1 Answers

5
votes

The problem is, that your data are not(!) showing a lognormal pdf, since they are not normalized properly. Note, that the integral over a pdf has to be 1. If you numerically integrate your data and nomalize by that, e.g.

y1 = y0/np.trapz(x0, y0)

your approach works fine.

params, extras = curve_fit(f, x0, y1)

plt.plot(x0, y1, "o")
plt.plot(x0, f(x0 ,params[0], params[1])) 
plt.legend(["data", "fit"], loc="best")
plt.show()

enter image description here

and

print("mu=%g, sigma=%g" % (params[0], params[1]))

resulting in

mu=1.80045, sigma=0.372185