I generated a box plot for a large dataset to show the impact of genotype on splicing ratios. The plot has many outliers, which squeezes the boxes. I can hide the outliers with outlier.colour = NA, but when I try to reset the y-axis range using scale_y_continuous(limits = c(lower, upper)), the whole plot changes. Can someone please help me change the height of the boxplots so that I can see the differences clearly?

This post is relevant, but I wasn't able to fix the problem with it:

Ignore outliers in ggplot2 boxplot

I have used this code to plot:

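library(ggplot2)

Trans <- read.delim("EXAMPLES/AT1G04170_SR_2", header = TRUE, sep = "\t")
# The "+" must end the line; a line starting with "+" is parsed as a new statement.
Trans_1 <- ggplot(data = Trans, mapping = aes(x = Genotype, y = Ratio, fill = Isoforms)) +
  geom_boxplot(outlier.colour = NA)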

Data

    sample  Isoforms       Ratio                 Genotype
    108     AT1G04170_JC4  0.114555061397559     CC
    139     AT1G04170_JC4  1.43188141139633E-07  CC
    159     AT1G04170_JC4  0.974829214147311     CT
    108     AT1G04170_P1   0.885444938602441     CC
    139     AT1G04170_P1   0.980915433730349     CC
    159     AT1G04170_P1   0.025170785852689     CT
    108     AT1G04170_P2   0                     CC
    139     AT1G04170_P2   0                     CC
    159     AT1G04170_P2   0                     CT
    108     AT1G04170_c1   0                     CC
    139     AT1G04170_c1   0.01908442308151      CC
    159     AT1G04170_c1   0                     CT

I want the boxplots in the plot to be less squeezed so that I can see the colors properly.

current image: https://ibb.co/S3gS2KR

Instead of manipulating/tweaking the boxplots, I would recommend trying another scale, e.g. scale_y_log10(). In addition, try plotting the features on the x-axis instead of CC/CT. – Roman

1 Answer

First - Not sure why you're seeing so many outliers. When I run your code, I see none.

Second - It's not an outlier issue but rather a scaling issue. That is, your sample variability is small compared to your y minimum and maximum. You can make the graph bigger. If you're using R Markdown in RStudio, you can do this in the code chunk header like so:

```{r, fig.height=8}
ggplot(data = Trans, mapping = aes(x = Genotype, y = Ratio, fill = Isoforms)) +
  geom_boxplot(outlier.colour = NA)
```
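Outside of an R Markdown chunk, a similar effect can probably be had by saving the plot with an explicit height. The following is only a sketch: it assumes ggplot2 is installed, rebuilds the twelve rows posted in the question by hand instead of reading the original file, and writes to a hypothetical temporary file name.

```r
library(ggplot2)

# The twelve rows posted in the question, typed in by hand (not the full dataset).
Trans <- data.frame(
  sample   = rep(c(108, 139, 159), times = 4),
  Isoforms = rep(c("AT1G04170_JC4", "AT1G04170_P1",
                   "AT1G04170_P2", "AT1G04170_c1"), each = 3),
  Ratio    = c(0.114555061397559, 1.43188141139633e-07, 0.974829214147311,
               0.885444938602441, 0.980915433730349, 0.025170785852689,
               0, 0, 0,
               0, 0.01908442308151, 0),
  Genotype = rep(c("CC", "CC", "CT"), times = 4)
)

p <- ggplot(Trans, aes(x = Genotype, y = Ratio, fill = Isoforms)) +
  geom_boxplot(outlier.colour = NA)

# Hypothetical output path; width and height are in inches and chosen arbitrarily.
out_file <- file.path(tempdir(), "boxplot_tall.png")
ggsave(out_file, plot = p, width = 6, height = 8, units = "in", dpi = 150)
```

A taller image gives every box more vertical pixels, but it does not change the ratio between the box heights and the full y range.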

Third - You won't be able to make the 5 boxes on the right bigger because all of those values are the same, i.e. all values = 0.

EDIT: Looking at your data more closely, those values aren't all 0, but this goes back to the underlying issue: the values are very close together compared to your y-min and y-max.
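
If the goal is to look more closely at the low end of the y-axis, one option worth trying is coord_cartesian(ylim = ...), which zooms the view without dropping any data. By contrast, scale_y_continuous(limits = ...) removes values outside the limits before the box statistics are computed, which would explain why the plot changes when you set limits that way. This is a sketch, not a definitive fix: the data frame is the twelve posted rows typed in by hand, and the zoom window c(0, 0.5) is an arbitrary illustrative choice.

```r
library(ggplot2)

# The twelve rows posted in the question, typed in by hand (not the full dataset).
Trans <- data.frame(
  sample   = rep(c(108, 139, 159), times = 4),
  Isoforms = rep(c("AT1G04170_JC4", "AT1G04170_P1",
                   "AT1G04170_P2", "AT1G04170_c1"), each = 3),
  Ratio    = c(0.114555061397559, 1.43188141139633e-07, 0.974829214147311,
               0.885444938602441, 0.980915433730349, 0.025170785852689,
               0, 0, 0,
               0, 0.01908442308151, 0),
  Genotype = rep(c("CC", "CC", "CT"), times = 4)
)

p_base <- ggplot(Trans, aes(x = Genotype, y = Ratio, fill = Isoforms)) +
  geom_boxplot(outlier.colour = NA)

zoom <- c(0, 0.5)  # arbitrary illustrative window; pick one that suits your data

# Zooms the view only: box statistics are still computed from all rows.
p_zoom <- p_base + coord_cartesian(ylim = zoom)

# Drops rows outside the limits first, so the boxes are recomputed
# (ggplot2 warns about removed rows when this plot is drawn).
p_scale <- p_base + scale_y_continuous(limits = zoom)
```

Printing p_zoom and p_scale side by side should show the difference: the zoomed plot keeps the same boxes as p_base and only clips the view, while the rescaled plot recomputes or loses boxes whose values fall outside the limits.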