I used the regular "boxplot" function to check my data for extreme values. For presentations within the project, I then created the same boxplots using the package ggpubr (which builds on ggplot2).

As far as I understood, the whiskers should represent the same range in both plots. To my surprise, an extreme value appears in the ggpubr boxplot that is not shown in the base R boxplot.

Screenshot ggpubr boxplot

Screenshot r-boxplot

Code R-Boxplot:

boxplot(data$vtrust_post1, data$vtrust_post2)

Code Ggpubr:

vtrust_a_b_long %>% 
ggboxplot(x = "drive", y = "count", bxp.errorbar = TRUE)

For ggpubr I had to convert the data to long format:

vtrust_a_b <- data.frame("subject_no" = data$id, "a" = data$vtrust_post1, "b" = data$vtrust_post2)

vtrust_a_b_long <- vtrust_a_b %>%
  gather(key = "drive", value = "count", a, b)

Did I do something wrong? Both data sets contain the same maximum value. Could it be that extreme values are defined differently in the base R boxplot and in ggpubr/ggplot2?

I am very grateful for your help!

UPDATE: Code to reproduce the problem

With this code, an extreme value is indicated only in the ggpubr version.

a <- c(1.50, 3.50, 1.50, 3.00, 1.25, 5.25, 2.50, 2.50, 1.50, 2.25, 1.75, 2.25, 2.25, 2.25, 4.50, 2.25, 3.25, 1.25, 2.50, 2.75, 1.75, 4.25, 2.75, 2.00,
 1.75, 3.50, 3.25, 3.00, 1.25, 1.25, 3.75, 1.50, 1.75, 2.25, 1.25, 2.00, 1.50, 3.50, 1.75, 3.25)

boxplot(a)
ggpubr::ggboxplot(a)

UPDATE 2: Following the hint that the problem could be caused by local settings within R, I tried the code at https://rdrr.io/snippets/. Again, the discrepancy occurs:

Screenshot comparison r-base boxplot and ggpubr boxplot.

Base R's boxplot should catch outliers at 1.5 x IQR, but it looks like it isn't. You could try forcing it by setting range=1.5 as a boxplot argument to see if it makes a difference. If range is set to zero, the whiskers extend to the data extremes. It's hard to check without your data. – pdw
I suggest starting R in a new session. There is possibly some global setting that you have changed without knowing. See the code in the answer for a demonstration of the lack of reproducibility. – tjebo
You should probably include the data so people can reproduce your problem. The code won't work without the data. – quickreaction
Thank you for your help! I tried both, but it changed nothing. I added new code to reproduce the problem to the original post. – Niko

1 Answer

Reviewing the source code for base R graphics::boxplot, line 48 shows that the whiskers are calculated by grDevices::boxplot.stats. Reviewing the source code of boxplot.stats (by typing it at the prompt) reveals it uses stats::fivenum to calculate the distance to plot the whiskers.

fivenum(a)
#[1] 1.250 1.625 2.250 3.125 5.250

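Lines 156 and 157 of the graphics::boxplot source show that the whiskers are drawn from the 2nd value to the 1st value and from the 4th value to the 5th value of these statistics. boxplot.stats replaces the 1st and 5th values only when a point lies more than 1.5 times the distance between the hinges (3.125 - 1.625 = 1.5) beyond a hinge. Here the upper limit is 3.125 + 1.5 * 1.5 = 5.375, so the maximum of 5.25 is not flagged and the upper whisker reaches it.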
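The base R rule can be retraced by hand. This is a minimal sketch, assuming the default coef = 1.5 of boxplot.stats and the vector a from the question; the names stats, hinge_spread and upper_fence are introduced here only for illustration.

```r
a <- c(1.50, 3.50, 1.50, 3.00, 1.25, 5.25, 2.50, 2.50, 1.50, 2.25,
       1.75, 2.25, 2.25, 2.25, 4.50, 2.25, 3.25, 1.25, 2.50, 2.75,
       1.75, 4.25, 2.75, 2.00, 1.75, 3.50, 3.25, 3.00, 1.25, 1.25,
       3.75, 1.50, 1.75, 2.25, 1.25, 2.00, 1.50, 3.50, 1.75, 3.25)

# Tukey's five numbers: min, lower hinge, median, upper hinge, max
stats <- fivenum(a)

# Distance between the hinges, used in place of the IQR
hinge_spread <- stats[4] - stats[2]            # 3.125 - 1.625 = 1.5

# Points above this limit would be drawn as outliers
upper_fence <- stats[4] + 1.5 * hinge_spread   # 5.375

max(a) > upper_fence     # FALSE: 5.25 stays inside the whisker
boxplot.stats(a)$out     # numeric(0): no outliers flagged
```
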

In contrast, the source for geom_boxplot shows that the whiskers extend to the most extreme data point that is no more than 1.5 times the IQR from the hinges, where the hinges are the 25% and 75% quantiles calculated by stats::quantile:

quantile(a,0.75) + diff(quantile(a,c(0, 0.25, 0.5, 0.75, 1))[c(2,4)])*1.5
#5.125 

Since element 6 of a is 5.25, which is more than 5.125, the whisker does not extend to that point, and ggplot2 (and therefore ggpubr) draws it as a separate outlier.
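Putting both rules side by side makes the discrepancy explicit. The sketch below uses base R only and assumes that ggplot2 relies on the default quantile definition of stats::quantile (type 7); the variable names are hypothetical.

```r
a <- c(1.50, 3.50, 1.50, 3.00, 1.25, 5.25, 2.50, 2.50, 1.50, 2.25,
       1.75, 2.25, 2.25, 2.25, 4.50, 2.25, 3.25, 1.25, 2.50, 2.75,
       1.75, 4.25, 2.75, 2.00, 1.75, 3.50, 3.25, 3.00, 1.25, 1.25,
       3.75, 1.50, 1.75, 2.25, 1.25, 2.00, 1.50, 3.50, 1.75, 3.25)

# Base R boxplot: hinges from fivenum()
h <- fivenum(a)[c(2, 4)]
fence_base <- h[2] + 1.5 * diff(h)

# ggplot2 / ggpubr: quartiles from quantile() (default type 7 assumed)
q <- quantile(a, c(0.25, 0.75), names = FALSE)
fence_gg <- q[2] + 1.5 * diff(q)

c(base = fence_base, ggplot2 = fence_gg)

a[a > fence_base]   # values base R would draw as outliers
a[a > fence_gg]     # values ggplot2 would draw as outliers
```

With 40 observations the hinges are averages of neighbouring order statistics, while the type 7 quantiles interpolate between them, so the two upper limits land on either side of 5.25.
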