I was curious at how pandas dataframe handles calculating the upper and lower whiskers, with outliers. Normally it's 1.5IQR-Q1, 1.5IQR+Q3
. However, the problem I can't understand, or maybe I'm wrong on how the whiskers are calculated. It shows the same problems in the boxplot section of https://pandas.pydata.org/pandas-docs/stable/visualization.html
Here's a sample of code I've randomly selected:
ray1=[0.217766,0.691315,0.289239,0.239135,0.161341,0.364297,0.373284,0.323216]
df = pd.DataFrame(ray1, dtype = float)
If I used the df.describe()
it gives me the stats of that array.
count 8.000000
mean 0.332449
std 0.162374
min 0.161341
25% 0.233793
50% 0.306227
75% 0.366544
max 0.691315
But according to the upper whisker, lower whisker from the normal 1.5IQR-Q1, 1.5IQR+Q3
, it should be around .565
and .035
. If I plot this with df.boxplot()
it shows the upper whisker as 0.373
and the lower whisker as .161
. I've tried other variations (2.698σ
) and the medcouple and those don't equal either.
So how is it getting those values, when outliers are present?
0.691315
value is an outlier. Do you consider that part of your distribution? – Usernamenotfound