4
votes

I'm familiar with the usual kernel density estimation (KDE), where a single bandwidth value is used to generate a (usually Gaussian) kernel function centered on each sample point. The final KDE is then obtained by summing these functions over all the sample points.

Say I have N sample points (assume in 1 dimension) each one with an error estimate, e.g.:

sample_points = [0.5, 0.7, 0.3, 1.2, 0.01, 3.6, 0.4]
errors = [0.02, 0.03, 0.05, 0.01, 0.03, 0.01, 0.07]

What I'm after is a way to obtain the KDE for this sample, using the error associated with each point as the bandwidth of its kernel.

I could do this by manually building the Gaussian kernel for each point separately and then combining all the functions, but that's no easy task.

Is there an already implemented function that does this? I've looked around but the kernel density estimator functions I found (scipy.stats.gaussian_kde, sklearn.neighbors.KernelDensity) use a fixed bandwidth value for all the points.


1 Answer

3
votes

I recently asked a similar question. There are (as far as I was able to find) no existing implementations of this. Here is what I used, which works for my needs:

import numpy as np

def solve_gaussian(val, data_array, sigma_array):
    # One normalized Gaussian per sample point, each with its own sigma
    # as bandwidth, evaluated at the scalar position `val`.
    return (1. / (np.sqrt(2. * np.pi) * sigma_array)) * np.exp(
        -(val - data_array) ** 2 / (2. * sigma_array ** 2))

def solve_kde(xlist, data_array, sigma_array):
    # Rows: evaluation points; columns: one kernel per sample point.
    return np.array([solve_gaussian(xx, data_array, sigma_array)
                     for xx in xlist])

xlist = np.linspace(0, 1, 101)  # Adjust to cover your data
kde_array = solve_kde(xlist, data_array, sigma_array)
kde_vector = np.sum(kde_array, axis=1) / len(data_array)  # Average so the KDE integrates to 1
mode_guess = xlist[np.argmax(kde_vector)]

It's essentially a reimplementation of scipy.stats.gaussian_kde, but with a separate bandwidth for each sample point.
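Applied to the data from the question, the same per-point-bandwidth idea can also be written without the Python loop, using NumPy broadcasting. This is just a sketch; the evaluation grid `xs` is my choice and should be adjusted to your data range:

```python
import numpy as np

sample_points = np.array([0.5, 0.7, 0.3, 1.2, 0.01, 3.6, 0.4])
errors = np.array([0.02, 0.03, 0.05, 0.01, 0.03, 0.01, 0.07])

# Evaluation grid (chosen to span the sample; adjust as needed).
xs = np.linspace(-0.5, 4.0, 901)

# Broadcast to shape (len(xs), len(sample_points)): one normalized
# Gaussian per sample point, with that point's error as its bandwidth.
diff = xs[:, np.newaxis] - sample_points[np.newaxis, :]
kernels = np.exp(-diff**2 / (2 * errors**2)) / (np.sqrt(2 * np.pi) * errors)

# Average the kernels so the resulting density integrates to 1.
kde = kernels.sum(axis=1) / len(sample_points)

print("mode estimate:", xs[np.argmax(kde)])
```

The result is dominated by the points with the smallest errors (1.2 and 3.6 here), since their kernels are the narrowest and tallest.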