I was curious about how a pandas DataFrame calculates the upper and lower whiskers of a boxplot when outliers are present. Normally they are taken as Q1 - 1.5*IQR and Q3 + 1.5*IQR. However, the plotted whiskers don't match those values, so either something is off or I'm wrong about how the whiskers are calculated. The boxplot section of https://pandas.pydata.org/pandas-docs/stable/visualization.html shows the same behaviour. Here's some sample data I picked at random:

import pandas as pd

ray1 = [0.217766, 0.691315, 0.289239, 0.239135, 0.161341, 0.364297, 0.373284, 0.323216]
df = pd.DataFrame(ray1, dtype=float)

Calling df.describe() gives me the summary statistics of that array:

count  8.000000
mean   0.332449
std    0.162374
min    0.161341
25%    0.233793
50%    0.306227
75%    0.366544
max    0.691315

But going by the usual Q3 + 1.5*IQR and Q1 - 1.5*IQR limits, the upper and lower whiskers should be around 0.566 and 0.035. If I plot this with df.boxplot(), it shows the upper whisker at 0.373 and the lower whisker at 0.161. I've tried other variations (2.698σ and the medcouple), and those don't match either.
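
For reference, the two limits quoted above can be reproduced from the describe() output with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and the quartiles are copied from the table rather than recomputed from the data.

```python
# Quartiles copied from the df.describe() output above (25% and 75% rows).
q1 = 0.233793
q3 = 0.366544

iqr = q3 - q1                  # interquartile range
lower_fence = q1 - 1.5 * iqr   # Q1 - 1.5*IQR
upper_fence = q3 + 1.5 * iqr   # Q3 + 1.5*IQR

print(f"IQR         = {iqr:.6f}")
print(f"lower fence = {lower_fence:.4f}")
print(f"upper fence = {upper_fence:.4f}")
```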

So how is it getting those values, when outliers are present?

The 0.691315 value is an outlier. Do you consider that part of your distribution? - Usernamenotfound
I guess I was assuming it was kept since it is an actual point. So are you saying it disregards the outliers when it calculates the whiskers? - Chase Calkins
Yep. That's what it does. - Usernamenotfound
Is there a way to force it to stay in there, or would I have to do everything manually then? - Chase Calkins

1 Answer

The whiskers drawn on the plot fall on values in your data. Since your data has only 8 values, it is easy to see where the whisker positions come from.

The following code produces your boxplot and also overlays the data points on it.

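import matplotlib.pyplot as plt

df.boxplot()
# Overlay the raw values as crosses at x=1, the position of the single box
plt.plot([1] * len(df), df[0], 'x')
plt.show()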

The plot produced:

[image: boxplot with the data points overlaid]

It should be clear from this that the upper whisker falls on a data point.

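From the documentation (https://matplotlib.org/api/_as_gen/matplotlib.pyplot.boxplot.html): "the upper whisker will extend to last datum less than Q3 + whis*IQR". Here Q3 + 1.5*IQR is about 0.566, and the largest value below that is 0.373284, so that is where the upper whisker ends; 0.691315 lies beyond the limit and is drawn as an outlier. Likewise, the lower whisker stops at 0.161341, the smallest value above Q1 - 1.5*IQR (about 0.035).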
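
The quoted rule can be checked by hand on the eight values from the question. The sketch below is not matplotlib's implementation: `whiskers` is a hypothetical helper written for illustration, and it assumes that `statistics.quantiles(..., method="inclusive")` yields the same linearly interpolated quartiles that df.describe() reports (it does for this data).

```python
import statistics

def whiskers(data, whis=1.5):
    """Return (lower whisker, upper whisker, outliers) for the whis*IQR rule.

    Hypothetical helper for illustration; assumes at least one value
    lies inside the limits.
    """
    values = sorted(data)
    # Quartiles with linear interpolation between the sorted data points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Whiskers end at the most extreme data points inside the limits.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return inside[0], inside[-1], outliers

ray1 = [0.217766, 0.691315, 0.289239, 0.239135,
        0.161341, 0.364297, 0.373284, 0.323216]

lo, hi, outliers = whiskers(ray1)
print(lo, hi, outliers)  # 0.161341 0.373284 [0.691315]
```

As for keeping the outlier inside the whiskers: calling `whiskers(ray1, whis=2.5)` widens the upper limit to about 0.698, so 0.691315 becomes the upper whisker rather than an outlier. matplotlib's boxplot accepts a `whis` argument with this meaning, and df.boxplot forwards extra keyword arguments to it, so df.boxplot(whis=2.5) should have the same effect on the plot.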