0
votes

As we know, the quantile function is the inverse of the cumulative distribution function.

Then, for an existing empirical distribution (a data vector), how can the results of the cumulative distribution function and the quantile function be made to match exactly?

Here is an example given in MATLAB.

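a = [150   154   151   153   124];
% count occurrences of each unique value (x_val comes back sorted ascending)
[x_count, x_val] = hist(a, unique(a));
% compute the empirical cumulative probabilities
p = cumsum(x_count)/sum(x_count);
% try to invert the CDF with quantile
x_out = quantile(a, p)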

For the empirical cumulative distribution function, the cumulative probabilities and x values should correspond as follows:

x = 124   150   151   153   154
p = 0.2000    0.4000    0.6000    0.8000    1.0000

But when I use p and quantile to compute x_out, the result differs from x:

x_out =

  137.0000  150.5000  152.0000  153.5000  154.0000

1
The documentation for the R function is much better: stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html - Will Cornwell
Please add an explanation of how you got to your desired output (i.e. why you think your MATLAB code is wrong). - Dan
@Dan, I expect the two functions to map exactly the same x-y pairs. - ouxiaogu
@ouxiaogu but why do you expect that? You have not explicitly defined your algorithm for finding quantiles. MATLAB has defined theirs and even uses your exact use case as the example in the docs. See my answer for more. - Dan
From a mathematical perspective it is undefined, because the function you are inverting is a step function. - Daniel

1 Answer

1
votes

From the docs:

For a data vector of five elements such as {6, 3, 2, 10, 1}, the sorted elements {1, 2, 3, 6, 10} respectively correspond to the 0.1, 0.3, 0.5, 0.7, 0.9 quantiles.

So if you want to get back exactly the numbers you put in for x, and x has 5 elements, then p needs to be p = [0.1, 0.3, 0.5, 0.7, 0.9]. The complete algorithm is explicitly defined in the documentation.
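As a quick check of that statement, here is a minimal sketch using the vector from the question. It assumes the quantile function from the Statistics Toolbox with its default behaviour; the variable names p_mid and x_back are only illustrative.

```matlab
a  = [150 154 151 153 124];
n  = numel(a);
xs = sort(a);                   % 124 150 151 153 154

% midpoint probabilities (i - 0.5)/n, i.e. 0.1 0.3 0.5 0.7 0.9 for n = 5
p_mid = ((1:n) - 0.5) / n;

% with these probabilities quantile should hand back the sorted data
x_back = quantile(a, p_mid);

% largest deviation from the sorted input; expected to be zero
% up to floating-point rounding
disp(max(abs(x_back(:) - xs(:))));
```

The same pattern should work for any vector length, since p_mid is built from n rather than hard-coded.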

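You have assumed that to get x back, p should have been [0.2, 0.4, 0.6, 0.8, 1]. But then why not p = [0, 0.2, 0.4, 0.6, 0.8]? MATLAB's algorithm seems to just average the two conventions: the i-th of n sorted values is assigned probability (i - 0.5)/n, the midpoint of (i - 1)/n and i/n, with linear interpolation in between.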

Note that R defines nine different algorithms for quantiles, so your assumptions need to be stated clearly.
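If the convention you actually want is the step-function inverse (the smallest x whose cumulative probability is at least p, so that p = i/n returns the i-th sorted value), one option is to index the sorted data directly instead of calling quantile. The following is only a sketch under that assumed definition; the tolerance tol is an illustrative guard against floating-point rounding, not part of any documented algorithm.

```matlab
a  = [150 154 151 153 124];
xs = sort(a);                   % 124 150 151 153 154
n  = numel(xs);

% cumulative probabilities as assumed in the question
p = (1:n) / n;                  % 0.2 0.4 0.6 0.8 1.0

% assumed inverse-ECDF convention: index of the smallest sorted value
% whose cumulative probability is >= p, clamped to the range 1..n
tol = 1e-12;                    % illustrative rounding guard
idx = min(max(ceil(p * n - tol), 1), n);

x_out = xs(idx)                 % 124 150 151 153 154
```

With this definition the x-p pairs from the cumulative sums map back exactly, at the cost of no interpolation between data points.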