I am making a boxplot using geom_boxplot in ggplot2. However, the whisker length looks incorrect, and I don't know why. Here are my data and code:

library(ggplot2)

value = c(1.3739117, 0.8709891, 3.4510461, 0.8470309, 1.4838725, 0.6942611,
          1.3095816, 3.0444649, 19.2785424, 1.0866242, 0.9376845, 2.2343836,
          20.7975509, 20.3102489, 18.0046679, 1.4197519)
data = data.frame(value)
ggplot(data, aes(y = value)) +
   stat_boxplot(geom = "errorbar", width = 0.3) +
   geom_boxplot(width = 0.5)

In the resulting plot, the upper whisker coincides with the 3rd quartile. I did the calculation manually, and the result is as follows:

summary(data)
Min.   : 0.6943  
1st Qu.: 1.0494  
Median : 1.4518  
Mean   : 6.0715  
3rd Qu.: 7.0895  
Max.   :20.7976

Based on the explanation of geom_boxplot: The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge.

The IQR in my case is: 7.0895-1.0494 = 6.0401

The lower whisker should be: 1.0494 - 1.5*6.0401 = -8.01075

The upper whisker should be: 7.0895 + 1.5*6.0401 = 16.14965

I understand that a negative lower whisker is meaningless here, so it is replaced by the minimum value. But why is the upper whisker not shown? I am confused and could not find an example online that solves this problem. Am I misunderstanding something about the ggplot settings? I would really appreciate your help and suggestions!

1 Answer

From the quoted section:

The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles).

By "value" they mean one of the original data points. If you plot the data, you can see that no values lie between the top hinge at 7.09 and the upper limit at 16.15 (hinge + 1.5 * IQR), so the upper whisker has nowhere to extend to and stays at the hinge. The four values above 16.15 are drawn as outliers. If the data had contained a value in that range, the upper whisker would extend to it.

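ggplot(data, aes(y = value)) +
  # show the raw data points next to the box
  geom_jitter(aes(x = 0.5), width = 0.05) +
  # whisker ends in red (on ggplot2 older than 3.4.0, use size instead of linewidth)
  stat_boxplot(geom = "errorbar", width = 0.3,
               color = "red", linewidth = 1.5) +
  geom_boxplot(width = 0.5, alpha = 0.5) +
  # dashed lines: the upper hinge (7.09) and the hinge + 1.5 * IQR limit (16.15)
  geom_hline(yintercept = c(7.09, 16.15), lty = "dashed")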

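The same rule can be checked numerically without any plotting. The following is a minimal base R sketch, not ggplot2's actual code: it assumes the hinges are the default quantile() values (which agree with the summary() output in the question), and the names hinges, lower_fence, upper_fence, inside, whisker_low and whisker_high are made up for illustration.

```r
# Same data as in the question.
value <- c(1.3739117, 0.8709891, 3.4510461, 0.8470309, 1.4838725, 0.6942611,
           1.3095816, 3.0444649, 19.2785424, 1.0866242, 0.9376845, 2.2343836,
           20.7975509, 20.3102489, 18.0046679, 1.4197519)

# Hinges (assumption: default quantile(), matching summary() in the question).
hinges <- quantile(value, c(0.25, 0.75), names = FALSE)
iqr <- diff(hinges)

# Fences: the furthest a whisker is allowed to reach from each hinge.
lower_fence <- hinges[1] - 1.5 * iqr
upper_fence <- hinges[2] + 1.5 * iqr

# Data points inside the fences; everything else is drawn as an outlier.
inside <- value[value >= lower_fence & value <= upper_fence]
outliers <- value[value < lower_fence | value > upper_fence]

# Whisker ends: the most extreme data points inside the fences, but never
# inside the box (assumption: this mirrors the plot, where the upper whisker
# sits on the hinge).
whisker_low <- min(hinges[1], inside)
whisker_high <- max(hinges[2], inside)

print(c(lower_fence = lower_fence, upper_fence = upper_fence))
print(c(whisker_low = whisker_low, whisker_high = whisker_high))
print(sort(outliers))
```

With this data, the largest point inside the fences is 3.45, which is below the upper hinge, so the upper whisker collapses onto the hinge at about 7.09. The four values above 16.15 are the outliers, and the lower whisker ends at the minimum, 0.69.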