How to remove extreme outliers in R?

Question

I have an R script that uses a csv file as its source data to create sixteen separate boxplots. Each of the sixteen boxplots has a different y-axis scale, which makes it difficult to apply a general ylim statement to the script. I tried using the coord_cartesian function with the ylim argument as well as the scale_y_continuous function, but again, that was too general to apply across sixteen boxplots with varying y-axis scales (I do not want to normalize the scales across all sixteen boxplots, only adjust the plots with 'extreme' outliers).

Below is the snippet of code I used to create the sixteen boxplots. 'SE_Data' is the csv source file I noted above. I should also mention that the sixteen boxplots are exported as a single pdf file (I don't know if this level of detail is needed or not).

# Load ggplot2 for the plotting calls below:
library(ggplot2)

# Enter csv input file:
SE_Data<-read.csv("SE_DATA.csv",header=TRUE)

# Enter output file name:
pdf(file="SE_Box_Plots.pdf", onefile=TRUE)

x=c("A","B","C","D","E","F","G","H")

SE_Data$ACO_Desc <- factor(SE_Data$ACO_Desc , x) #Ensures x-axis is ordered from A through H

#Creates sixteen individual boxplots 
for (i in 5:ncol(SE_Data)) { 

  p<-ggplot(SE_Data, aes(x=Group_Desc, y=SE_Data[,i])) + geom_boxplot() +
  ylab(gsub("\\_", " ", colnames(SE_Data)[i])) + 
  xlab("") +
  theme(axis.text.x=element_text(angle = 0))
  print(p)

}

dev.off()  

dev.list()

I wasn't sure if I would need to create an IF ELSE statement to solve this problem; however, as someone who is still fairly new to R, this appears to be well above my skill level. Below, I included two of the sixteen boxplots to illustrate how their y-axis scales differ from each other.

Box Plot 1:

Box Plot 2:

As you can see from the two boxplots, they have very different y-axis scales. In my opinion 'boxplot 2' looks fine; however, 'boxplot 1' contains extreme outliers. I would like to develop a piece of code that could remove these extreme values in order to reduce the amount of 'dead space' on the boxplot, thus narrowing the y-axis range and making the plot more appealing to the eye.

It's important to stress that I still want outliers to be included in my boxplots; I want to remove only the extreme outliers. If you need any more information from my end, please let me know.

Thanks in advance for your help, it's greatly appreciated.

Chris

r.bot · Accepted Answer · 2015-01-13T16:54:54

I can't reproduce your graph without the data, but including

geom_boxplot( outlier.shape=NA )

should hide the outliers. You can manually adjust the y-axis scale with

scale_y_continuous(limits=c(-5, 1)) # or whatever values you want to use.
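
Since the question asks to keep ordinary outliers and drop only the extreme ones, a limit computed separately for each plot may fit better than one fixed range. What follows is only a minimal sketch, assuming 'extreme' means beyond Tukey's outer fences (3 × IQR from the quartiles, compared with the 1.5 × IQR that geom_boxplot uses by default to flag outliers) and that the fences are computed over the whole column rather than per group. The helper names extreme_limits and plot_trimmed and the toy data frame are hypothetical. The sketch uses coord_cartesian(ylim = ...) instead of scale_y_continuous(limits = ...) because the latter drops out-of-range values before the boxplot statistics are computed, while the former only zooms the view.

```r
library(ggplot2)

# Hypothetical helper: range of the values that lie within k * IQR of the
# quartiles. k = 3 is an assumed threshold for "extreme" (Tukey's outer fences).
extreme_limits <- function(values, k = 3) {
  q <- quantile(values, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - k * iqr
  upper <- q[2] + k * iqr
  kept <- values[!is.na(values) & values >= lower & values <= upper]
  range(kept)
}

# Hypothetical helper: boxplot of one column, zoomed to the non-extreme range.
plot_trimmed <- function(data, col, group = "Group_Desc") {
  ggplot(data, aes(x = .data[[group]], y = .data[[col]])) +
    geom_boxplot() +
    coord_cartesian(ylim = extreme_limits(data[[col]])) +
    ylab(gsub("_", " ", col, fixed = TRUE)) +
    xlab("")
}

# Toy data standing in for SE_Data (values invented for illustration only).
set.seed(1)
demo <- data.frame(
  Group_Desc = factor(rep(c("A", "B"), each = 20)),
  Measure_1 = c(rnorm(39), 500)
)
p <- plot_trimmed(demo, "Measure_1")
```

Inside the original loop, the ggplot call could then become something like print(plot_trimmed(SE_Data, colnames(SE_Data)[i])). The extreme points stay in the data, so the boxes and whiskers are unchanged; those points simply fall outside the visible range. Plots without extreme values keep their full range, so no IF ELSE branch is needed.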
