10 votes

I've looked online and have yet to find an answer or a way to figure out the following.

I'm translating some MATLAB code to Python. In MATLAB I compute a kernel density estimate with:

[p,x] = ksdensity(data)

where p is the probability density at the points x of the distribution.

SciPy has a function for this, but it only seems to return p.

Is there a way to find the probability at values of x?

Thanks!


3 Answers

7 votes

That form of the ksdensity call picks the x grid for you. scipy.stats.gaussian_kde() instead returns a callable object that can be evaluated at any x of your choosing. An equivalent grid would be np.linspace(data.min(), data.max(), 100).

import numpy as np
from scipy import stats

data = ...  # your 1-D array of samples
kde = stats.gaussian_kde(data)                 # build the estimator
x = np.linspace(data.min(), data.max(), 100)   # same kind of grid ksdensity uses
p = kde(x)                                     # density evaluated at each x
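
If you need the density at particular x values rather than a grid, the same object can be evaluated directly. A minimal sketch, assuming the data and kde from above:

points = np.array([0.0, 1.5, 3.0])   # any x values of interest
p_at_points = kde(points)            # density estimate at exactly those points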
5 votes

Another option is the kernel density estimator in the scikit-learn package, sklearn.neighbors.KernelDensity.

Here is a small example, similar to the ksdensity example in the MATLAB documentation, using a Gaussian kernel:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KernelDensity

np.random.seed(12345)
# similar to the MATLAB ksdensity example: x = [randn(30,1); 5+randn(30,1)];
Vecvalues = np.concatenate((np.random.normal(0, 1, 30), np.random.normal(5, 1, 30)))[:, None]
Vecpoints = np.linspace(-8, 12, 100)[:, None]
kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(Vecvalues)
logkde = kde.score_samples(Vecpoints)   # score_samples returns the log of the density
plt.plot(Vecpoints, np.exp(logkde))     # exponentiate to get the density itself
plt.show()

The plot this produces shows the bimodal density estimate, with peaks near 0 and 5.
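
If you only need the density at a few specific points rather than a full grid, you can score just those points. A minimal sketch, reusing the kde object fit above:

pts = np.array([0.0, 2.5, 5.0])[:, None]   # column vector, as scikit-learn expects
p = np.exp(kde.score_samples(pts))         # density at exactly those points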

0 votes

Matlab is orders of magnitude faster than KernelDensity when it comes to finding the optimal bandwidth. Any idea of how to make KernelDensity faster? – Yuca Jul 16 '18 at 20:58

Hi, Yuca. MATLAB uses Scott's rule to estimate the bandwidth, which is fast but assumes the data come from a normal distribution. For more information, please see this post.
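
If you want to mimic that behavior with scikit-learn rather than running a full bandwidth search, one option is to compute a rule-of-thumb bandwidth yourself and pass it in. A minimal sketch, assuming 1-D data and a Gaussian kernel (the helper name scott_bandwidth is just for illustration):

import numpy as np
from sklearn.neighbors import KernelDensity

def scott_bandwidth(x):
    # Scott's rule of thumb for 1-D data: h = sigma * n**(-1/5),
    # derived under the assumption of roughly normal data
    x = np.asarray(x).ravel()
    return x.std(ddof=1) * len(x) ** (-1.0 / 5.0)

data = np.random.normal(0, 1, 200)   # example data
h = scott_bandwidth(data)
kde = KernelDensity(kernel='gaussian', bandwidth=h).fit(data[:, None])

Recent scikit-learn releases may also accept bandwidth='scott' directly; check the documentation for the version you have installed.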