I have lognormal distributed data in x0 and y0 arrays:
x0.ravel() = array([19.8815 , 19.0141 , 18.1857 , 17.3943 , 16.6382 , 15.9158 ,
15.2254 , 14.5657 , 13.9352 , 13.3325 , 12.7564 , 12.2056 ,
11.679 , 11.1755 , 10.6941 , 10.2338 , 9.79353, 9.37249,
8.96979, 8.58462, 8.21619, 7.86376, 7.52662, 7.20409,
6.89552, 6.6003 , 6.31784, 6.04757, 5.78897, 5.54151,
5.30472, 5.07812, 4.86127, 4.65375, 4.45514, 4.26506,
4.08314, 3.90903, 3.74238, 3.58288, 3.4302 , 3.28407,
3.14419, 3.01029, 2.88212, 2.75943, 2.64198, 2.52955,
2.42192, 2.31889, 2.22026, 2.12583, 2.03543, 1.94889,
1.86604, 1.78671, 1.71077, 1.63807, 1.56845, 1.50181,
1.43801, 1.37691, 1.31842, 1.26242, 1.2088 , 1.15746,
1.10832, 1.06126, 1.01619])
y0.ravel() =array([1.01567e+03, 8.18397e+02, 7.31992e+02, 1.11397e+03, 2.39987e+03,
2.73762e+03, 4.65722e+03, 7.06308e+03, 9.67945e+03, 1.38983e+04,
1.98178e+04, 1.97461e+04, 3.28070e+04, 4.48814e+04, 5.80853e+04,
7.35511e+04, 8.94090e+04, 1.08274e+05, 1.28276e+05, 1.50281e+05,
1.69258e+05, 1.91944e+05, 2.16416e+05, 2.37259e+05, 2.57426e+05,
2.74818e+05, 2.90343e+05, 3.01369e+05, 3.09232e+05, 3.13713e+05,
3.17225e+05, 3.19177e+05, 3.17471e+05, 3.14415e+05, 3.08396e+05,
2.95692e+05, 2.76097e+05, 2.52075e+05, 2.29330e+05, 1.97843e+05,
1.74262e+05, 1.46360e+05, 1.20599e+05, 9.82223e+04, 7.80995e+04,
6.34618e+04, 4.77460e+04, 3.88737e+04, 3.23715e+04, 2.58129e+04,
2.15724e+04, 1.58737e+04, 1.13006e+04, 7.64983e+03, 4.64590e+03,
3.31463e+03, 2.40929e+03, 3.02183e+03, 1.47422e+03, 1.06046e+03,
1.34875e+03, 8.26674e+02, 9.53167e+02, 6.47428e+02, 9.83651e+02,
8.93673e+02, 1.23637e+03, 0.00000e+00, 8.36573e+01])
I want to use curve_fit to get an function, that fits my data points, to gain the mu (and then the exp(mu) for the median) and the sigma of this distribution.
import numpy as np
from scipy.optimize import *
def f(x, mu, sigma) :
return 1/(np.sqrt(2*np.pi)*sigma*x)*np.exp(-((np.log(x)-
mu)**2)/(2*sigma**2))
params, extras = curve_fit(f, x0.ravel(), y0.ravel())
print "mu=%g, sigma=%g" % (params[0], params[1])
plt.plot(x0, y0, "o")
plt.plot(x0, f(x0 ,params[0], params[1]))
plt.legend(["data", "fit"], loc="best")
plt.show()
The result is the following:
mu=1.47897, sigma=0.0315236
Obviously the function does not fit the data by any means.
When i multiply the fitting function by, lets say 1.3*10^(5) in the code:
plt.plot(x0, 1.3*10**5*f(x0 ,params[0], params[1]))
This is the result:
manually changed fitting curve
The calculated mu value, which is the mean value of the related normal distribution seems right, because when im using:
np.mean(np.log(x))
i get 1.4968838412183132, which is quite similar to the mu i obtain from curve_fit.
Calculating the median with
np.exp(np.mean(np.log(x))
gives 4.4677451525990675, which seems to be ok.
But unless i see the fitting function going threw my datapoints, i do not really trust these numbers. My problem is obviously, that the fitting function has no information of the (big) y0 values. How can i change that? Any help apreciated!