0
votes

I have a dataset named mpg. I want to draw a boxplot with the points overlaid to see the relationship between drv (type of drive train) and cty (city miles per gallon). Below is my code:

ggplot(data = mpg, mapping = aes(x = drv, y = cty)) + geom_boxplot(outlier.shape = NA) + geom_jitter()

Is there a way to exclude the outliers from geom_jitter()?

Plot

2
geom_jitter() does not have an argument for discarding outliers on its own. You have to either filter the data points manually or flag which points are outliers before passing the data to geom_jitter(). – Nuclear03020704

2 Answers

2
votes

You can hide the outliers for a geom_boxplot with outlier.shape = NA. For geom_jitter, you can use transparency to hide the outliers, but they need to be identified first.

library(dplyr)
library(ggplot2)

mpg %>%
  group_by(drv) %>%
  # 1 if cty lies within 1.5 * IQR of the quartiles for its drv group, else 0
  mutate(cty.show = as.numeric(  # so ggplot doesn't complain about alpha being discrete
    between(cty, 
            quantile(cty)[2] - 1.5*IQR(cty),
            quantile(cty)[4] + 1.5*IQR(cty)))) %>% 
  ggplot(aes(drv, cty)) + 
  geom_boxplot(outlier.shape = NA) + 
  geom_jitter(aes(alpha=cty.show), show.legend=FALSE) +
  scale_alpha_continuous(range = c(0, 1)) # otherwise outliers are only partially transparent


For the second plot, the y-limits could be adjusted if required.

1
votes

geom_jitter() does not have an argument for discarding outliers on its own. You need to define which points are outliers and filter them out of the data yourself before plotting.

library(dplyr)
library(ggplot2)

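mpg %>%
  group_by(drv) %>%
  # Set cty to NA when it lies more than 1.5 * IQR beyond the quartiles of its drv group
  mutate(cty_filtered = case_when(cty - quantile(cty)[4] > 1.5*IQR(cty) ~ NA_real_,
                                  quantile(cty)[2] - cty > 1.5*IQR(cty) ~ NA_real_,
                                  TRUE ~ cty)) %>%
  ggplot() +
  geom_boxplot(aes(drv, cty)) +
  # na.rm = TRUE drops the NA points without a "removed rows" warning
  geom_jitter(aes(drv, cty_filtered), na.rm = TRUE)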
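
The same filtering can also be done by handing geom_jitter() its own pre-filtered data frame through its data argument. The following is only a sketch of that idea: is_outlier() and mpg_inliers are hypothetical names introduced here, and the 1.5 * IQR rule is assumed to match the box plot's default whiskers.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: TRUE where a value lies more than 1.5 * IQR
# below the first quartile or above the third quartile
is_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Keep only the non-outlying rows, judged separately for each drive type
mpg_inliers <- mpg %>%
  group_by(drv) %>%
  filter(!is_outlier(cty)) %>%
  ungroup()

# The box plot uses the full data; the jittered points use the filtered data
ggplot(mpg, aes(drv, cty)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(data = mpg_inliers)
```

Because the box plot still receives the full mpg data, its quartiles and whiskers are unchanged; only the jittered layer loses the outlying rows.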