
I have some data in Matlab that I'd like to plot as a boxplot. Since the standard boxplot wasn't customizable enough, I decided to use the boxPlot function from the IoSR Statistics Toolbox. I want the whiskers to show the 0.025- and 0.975-quantiles, while the box itself shows the interquartile range and the median (which is the default anyway). A typical data set is, for example:

p = [60.1 93 135.2 69 107.1 98.4 118.9 83.9 67 74.5 102.5 120.8 103.7 114.3 102.4 139.9 110.4 119.3 105.1 79.8 222.7 185.3 76.4 100.2 61.2 131.6 87.2 96 113.3 52.9 78.5 163.3 65.4 64.4];

I calculated the 0.025- and 0.975-quantiles both with the standard Matlab function prctile() and with the toolbox's own quantile() function (which the boxplot method should be using), and got exactly the same results:

prctile(p,2.5,2) -> ans = 55.4200
prctile(p,97.5,2) -> ans = 209.6100
iosr.statistics.quantile(p,0.025,2,'R-5') -> ans = 55.4200
iosr.statistics.quantile(p,0.975,2,'R-5') -> ans = 209.6100

However, when I create a boxplot with the code below, the whiskers are too short. The upper whisker ends at 185.3 instead of 209.61, and the lower whisker ends at 60.1 instead of 55.42. The whiskers end exactly at the second-largest and second-smallest values of the data set, while the manually computed quantiles seem to use some sort of interpolation scheme to get their values.

What am I getting wrong here? How do I get the boxplot to show the same quantiles as the ones I calculated manually?

iosr.statistics.boxPlot(["data"],p'...
    ,'limit',[2.5, 97.5],'boxColor',[0.5 0.5 0.5],'lineColor','k'...
    ,'medianColor',[0.2 0.2 0.2],'method','R-5','showOutliers',false...
    ,'xspacing','equal')



1 Answer


Indeed, the prctile function uses linear interpolation between adjacent sorted data points to calculate the percentile values.
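To see where 55.42 and 209.61 come from: prctile (and, per the question's results, the 'R-5' method) treats the k-th sorted value as the (k - 0.5)/n quantile and interpolates linearly in between, clamping to the minimum/maximum outside that range. The sketch below reproduces that rule in base Matlab with interp1; it is an illustration of the rule, not the code of either library function.

```matlab
% Example data from the question
p = [60.1 93 135.2 69 107.1 98.4 118.9 83.9 67 74.5 102.5 120.8 103.7 114.3 ...
     102.4 139.9 110.4 119.3 105.1 79.8 222.7 185.3 76.4 100.2 61.2 131.6 ...
     87.2 96 113.3 52.9 78.5 163.3 65.4 64.4];
x = sort(p);                 % order statistics x(1) <= ... <= x(n)
n = numel(x);
% R-5 (Hazen) rule: the k-th sorted value sits at probability (k - 0.5)/n
pos = ((1:n) - 0.5) / n;
q = [0.025 0.975];
% Probabilities below pos(1) or above pos(end) map to the min/max value
qc = min(max(q, pos(1)), pos(end));
% Linear interpolation between the two neighbouring order statistics
qv = interp1(pos, x, qc);
fprintf('%.4f  %.4f\n', qv);
```

With n = 34, the 2.5th percentile lands at position 34 * 0.025 + 0.5 = 1.35, i.e. 35% of the way from 52.9 to 60.1, giving 55.42; likewise 209.61 lies 65% of the way from 185.3 to 222.7. Neither value is an actual data point.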

Meanwhile, according to the source code for the IoSR boxplot function:

the whiskers extend to the most extreme data that are not considered outliers

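So the 'limit' option doesn't set where the whiskers end directly; it only sets the thresholds beyond which points are treated as outliers. With your data, 52.9 and 222.7 fall outside [55.42, 209.61] and become outliers (hidden because 'showOutliers' is false), so the whiskers stop at the most extreme remaining points, 60.1 and 185.3.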

I went and confirmed it with your example data. Here the red lines are your desired quantile values, and the blue dots are your data. I've marked the furthest data points, and sure enough, that's where your whiskers ended.
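A minimal base-Matlab sketch of that rule, assuming (as the quoted source comment says) that 'limit' only sets the outlier thresholds and that each whisker then stops at the most extreme data point inside them. The limits are the prctile values from the question, hard-coded; this mimics the documented behaviour rather than reproducing the toolbox's actual code.

```matlab
% Example data from the question
p = [60.1 93 135.2 69 107.1 98.4 118.9 83.9 67 74.5 102.5 120.8 103.7 114.3 ...
     102.4 139.9 110.4 119.3 105.1 79.8 222.7 185.3 76.4 100.2 61.2 131.6 ...
     87.2 96 113.3 52.9 78.5 163.3 65.4 64.4];
% Outlier thresholds: the 2.5th and 97.5th percentiles computed in the question
lim = [55.42 209.61];
isInlier = p >= lim(1) & p <= lim(2);   % points NOT flagged as outliers
% Assumed whisker rule: extend to the most extreme non-outlier data
lowerWhisker = min(p(isInlier));
upperWhisker = max(p(isInlier));
outliers = sort(p(~isInlier));
fprintf('whiskers: %.1f to %.1f\n', lowerWhisker, upperWhisker);
disp(outliers)
```

The whiskers come out at 60.1 and 185.3, exactly where the plot drew them, and the only points excluded are 52.9 and 222.7.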


As for getting it to show the desired percentiles, I don't know for sure, since I don't have the toolbox, but I recommend looking into its options for adding percentiles (see help iosr.statistics.boxPlot or the toolbox documentation). Possibly relevant options include:

%       addPrctiles         - Show additional percentiles using markers and
%                             labels. The property should be a vector of
%                             percentiles; each percentile will be plotted
%                             for each box. The property is empty by
%                             default.
...
%       addPrctilesLabels   - Specify labels for the additional
%                             percentiles. The property should be a cell
%                             array of strings. By default no label
%                             is shown.
%       addPrctilesMarkers  - Specify markers for the additional
%                             percentiles. The property should be a cell
%                             array of strings indicating the shape of each
%                             percentile; the markers will be repeated for
%                             each box. The default is '*'.
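Based only on the option descriptions quoted above, a call that overlays markers at the 2.5th and 97.5th percentiles might look like the sketch below. The marker shapes and label strings are illustrative choices, not toolbox defaults, and the exist() guard simply skips the plot when the toolbox isn't on the path.

```matlab
% Example data from the question, as a column (one box)
p = [60.1 93 135.2 69 107.1 98.4 118.9 83.9 67 74.5 102.5 120.8 103.7 114.3 ...
     102.4 139.9 110.4 119.3 105.1 79.8 222.7 185.3 76.4 100.2 61.2 131.6 ...
     87.2 96 113.3 52.9 78.5 163.3 65.4 64.4]';
pcts = [2.5 97.5];   % percentiles to mark on the box
if exist('iosr.statistics.boxPlot', 'class') == 8
    % Option names come from the help text above; marker/label values are illustrative
    h = iosr.statistics.boxPlot(["data"], p, ...
        'method', 'R-5', 'limit', pcts, 'showOutliers', false, ...
        'addPrctiles', pcts, ...
        'addPrctilesMarkers', {'^', 'v'}, ...
        'addPrctilesLabels', {'2.5%', '97.5%'});
else
    warning('IoSR Statistics Toolbox not found on the path; skipping the plot.');
end
```

This would leave the whiskers where they are but put a marker exactly at the interpolated percentiles, which may be enough if the goal is just to show those values.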

Alternatively, you might be able to manipulate the whiskers directly using the object handles:

%       handles             - Structure containing handles to the various
%                             objects that constitute the plot. The
%                             fields/handles are:
%                                 'axes'            : the parent axes of 
%                                                     the box plot
%                                 'fig'             : the parent figure of
%                                                     the box plot
%                                 'addPrctiles'     : chart line objects
%                                                     for each additional
%                                                     percentile marker
%                                 'addPrctilesTxt'  : text objects for each
%                                                     additional percentile
%                                                     marker
...
%                                 'upperWhiskers'   : line objects for each
%                                                     upper whisker line
%                                 'lowerWhiskers'   : line objects for each
%                                                     lower whisker line
%                                 'upperWhiskerTips': line objects for each
%                                                     upper whisker tip
%                                 'lowerWhiskerTips': line objects for each

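Possibly with something like this, assuming each whisker is drawn as a two-point vertical line:

h = iosr.statistics.boxPlot(["data"],p','limit',[2.5, 97.5],'method','R-5','showOutliers',false);
q = prctile(p,[2.5 25 75 97.5]);                 % whisker ends and box edges
set(h.handles.upperWhiskers,'YData',q([3 4]))    % top of box up to the 97.5th percentile
set(h.handles.lowerWhiskers,'YData',q([2 1]))    % bottom of box down to the 2.5th percentile
set(h.handles.upperWhiskerTips,'YData',q([4 4]))
set(h.handles.lowerWhiskerTips,'YData',q([1 1]))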
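The underlying mechanism is just changing the data of existing line objects after plotting. Here is a toolbox-free sketch on two hypothetical stand-in whisker lines at x = 1, drawn from the R-5 quartiles of this data (76.4 and 118.9) to where the 'limit' rule put the whisker ends, then moved to the interpolated percentiles. The layout is an assumption; the real toolbox lines may be arranged differently.

```matlab
% Invisible figure so the sketch also runs headless
fig = figure('Visible', 'off');
ax = axes('Parent', fig);
q1 = 76.4;  q3 = 118.9;   % R-5 quartiles of the example data (box edges)
% Hypothetical stand-ins for one box's whiskers, as drawn by the 'limit' rule
uw = line('Parent', ax, 'XData', [1 1], 'YData', [q3 185.3]);
lw = line('Parent', ax, 'XData', [1 1], 'YData', [q1 60.1]);
% Move the whisker ends to the interpolated 97.5th and 2.5th percentiles
set(uw, 'YData', [q3 209.61]);
set(lw, 'YData', [q1 55.42]);
yU = get(uw, 'YData');
yL = get(lw, 'YData');
close(fig);
```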