Using fminsearch to perform distribution fitting

Question

Suppose I have a set of univariate data held in the array errors.

I would like to fit a PDF to my observed data distribution.

My PDF is defined in a function poissvmwalkpdf, whose definition line looks like this:

function p = poissvmwalkpdf(theta, mu, kappa, xi)

Here, theta is the error (the variable for which values in errors are instances), and mu, kappa, and xi are parameters of the PDF for which I want to find the best fit using maximum-likelihood estimation. This function returns the probability density at a given value of theta.

Given all this, how would I use fminsearch to find the values for mu, kappa, and xi that best fit my observed errors? The fminsearch documentation doesn't make this clear, and none of its examples involve distribution fitting.

Note: The tutorial here clearly describes what distribution fitting is (as distinguished from curve fitting), but the example given does not use fminsearch.

Trick question: why not use what the tutorial uses for distribution fitting?:) — Andras Deak
There is no unique fit for such data. Choices must be made in how a fit is done and generic fitting methods don't take into account that the data is assumed to be sampled from a known distribution. The most common distribution fitting methods are based on MLE (and minimizing the negative log likelihood). You could use fminsearch and you might get something decent, but it probably won't return the statistically most likely parameters. — horchler
@horchler You can interpret my question as, "How would I use fminsearch to derive an MLE fit between my PDF and my data?" You seem to be arguing that fminsearch cannot do that, but it's a generic minimization function, so it can minimize a negative log-likelihood. — dbliss

rozsasarpi · Accepted Answer · 2016-03-12T09:09:51

Here is a minimal example of using fminsearch to obtain maximum likelihood estimates (as requested in the comments):

function mle_fit_minimal

n       = 100;
% for reproducibility
rng(333)
% generate dummy data
errors  = normrnd(0,1,n,1);

par0    = [1, 1];
[par_hat, nll] = fminsearch(@nloglike, par0)

% custom pdf (here a normal density stands in for the PDF being fitted)
    function p = my_pdf(data, par)
        mu      = par(1);
        sigma   = par(2);
        p       = normpdf(data, mu, sigma);
    end

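% negative log-likelihood function -- fminsearch passes all parameters in a
% single vector argument (here called par); errors is visible here because
% this is a nested function sharing the parent function's workspace.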
    function nll = nloglike(par)
        nll     = -sum(log(my_pdf(errors, par)));
    end
end

Once the negative log-likelihood is written as a function of a single parameter vector, the fit is just an ordinary minimization problem.
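The same idea can also be written as a script with anonymous functions, which may be easier to adapt to a PDF that takes its parameters as separate arguments, as poissvmwalkpdf does. The following is only a sketch under some assumptions: the normal density is written out by hand as a stand-in PDF, sigma is optimized on a log scale because fminsearch is unconstrained (a choice made here, not something the answer above does), and the commented lines at the end are a hypothetical adaptation in which mu0, kappa0, and xi0 are placeholder starting values and poissvmwalkpdf is assumed to accept a vector theta.

```matlab
% Sketch: maximum-likelihood fit with fminsearch, written as a script.
rng(333);                          % for reproducibility
n      = 100;
errors = randn(n, 1);              % dummy data: standard normal draws

% Stand-in PDF (normal density written out by hand); vectorized over x.
my_pdf = @(x, mu, sigma) exp(-0.5*((x - mu)./sigma).^2) ./ (sigma*sqrt(2*pi));

% fminsearch is unconstrained, so optimize log(sigma) to keep sigma > 0.
% The anonymous function captures errors at the moment it is created.
nloglike = @(par) -sum(log(my_pdf(errors, par(1), exp(par(2)))));

par0 = [0, 0];                     % starting guess: mu = 0, log(sigma) = 0
[par_hat, nll] = fminsearch(nloglike, par0);

mu_hat    = par_hat(1);
sigma_hat = exp(par_hat(2));       % back-transform to the original scale

% For the normal case the MLE has a closed form, which gives a quick check:
% mean(errors) and std(errors, 1) (normalized by n rather than n-1).
fprintf('mu_hat    = %.4f (closed form %.4f)\n', mu_hat, mean(errors));
fprintf('sigma_hat = %.4f (closed form %.4f)\n', sigma_hat, std(errors, 1));

% Hypothetical adaptation to the three-parameter PDF from the question,
% assuming poissvmwalkpdf accepts a vector theta and returns a vector p
% (mu0, kappa0, xi0 are placeholder starting values):
% nloglike = @(par) -sum(log(poissvmwalkpdf(errors, par(1), par(2), par(3))));
% par_hat  = fminsearch(nloglike, [mu0, kappa0, xi0]);
```

Comparing against the closed-form estimates is a useful sanity check before switching to a PDF with no closed-form MLE. If any parameter of the real PDF must stay positive, the same log transform could be applied to it. Since fminsearch only finds a local minimum, trying several starting points is a reasonable precaution.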
