0
votes

TL;DR: I would like to generate a markdown with the plot of normalized count for a list of gene. As this list is quite long (> 100 genes), I would like to generate a "grid" of 4x4 graph on one figure page for the first 16 genes, then the same for genes 17 to 32, etc...until the end of the list. Currently, my code is only displaying 16 gene, even tho I run the grid.arrange command INSIDE the loop (it worked when I used the "plot" fonction inside the loop but it displays only one graph per page ofc).

I'm currently doing some RNAseq analysis and I'm looking for differentially expressed gene (DEGs) between two population. To have a more visual representation of DEGs, I would like to plot the normalized count for each population for some genes of interest (GoI).

However the list of these GOI can be quite long (for ex., if I'm focusing on DEG that are coding for membrane protein, I've some 159 candidates). I'm able to plot them using a for loop, with the following code (from a first analysis):

# top gene contains all of Gene of Interest
# group_origin contains the factor used to discriminate cell population to compare
# dds_g is a dds object from DESeq2 and contained counts number I want to plot

for (i in unique(1:length(top_gene))){
  gene <- top_gene[i]
  d <- plotCounts(dds_g, gene = gene, intgroup = "group_origin", returnData = TRUE)
  b <- ggplot(d, aes(x = group_origin, y = count)) +
    stat_boxplot(geom = 'errorbar', aes(colour = factor(group_origin)), width = 0.2) +
    geom_boxplot(aes(colour = factor(group_origin)), width = 0.2) +
    stat_summary(fun.y=mean, geom="point", shape=17, size=1.5, aes(color=factor(group_origin), fill=factor(group_origin))) +
    labs (title = paste0(resg_cb_db$symbol[gene],' (',gene,')'), x = element_blank()) +
    theme_bw() +
    scale_color_manual(values = mycolors) +
    theme(text= element_text(size=10),
        axis.text.x = element_text(size = 7.5, angle = 45, hjust = 1),
        axis.text.y = element_text(size = 10),
        legend.position = "none")
    
  plot(b)
}

By using this approach, I'm able to generate (separately) the plot for all the gene ID contained in my "top_gene" vector.

However, the markdown is creating one plot per page which is annoying when you have high number of genes. I would like to group them, e.g., every genes 1 to 16 in the list are plotted together on one page, then its 16 to 32, etc....

I've tried (without sucess) par mfrow. I also tried Grid.extra with the following code (from another analysis example):

for (i in unique(1:length(top_gene))){
  gene <- top_gene[i]
  d <- plotCounts(dds, gene = gene, intgroup = "cell_type", returnData = TRUE)
  b <- ggplot(d, aes(x = cell_type, y = count)) +
    stat_boxplot(geom = 'errorbar', aes(colour = factor(cell_type)), width = 0.2) +
    geom_boxplot(aes(colour = factor(cell_type)), width = 0.2) +
    stat_summary(fun.y=mean, geom="point", shape=17, size=1.5, aes(color=factor(cell_type), fill=factor(cell_type))) +
    labs (title = paste0(resg$symbol[gene],' (',gene,')'), x = element_blank()) +
    theme_bw() +
    scale_color_manual(values = mycolors) +
    theme(text= element_text(size=7),
        axis.text.x = element_text(size = 7, angle = 45, hjust = 1),
        axis.text.y = element_text(size = 6),
        legend.position = "none")
    
  plot_list[[gene]] <- b
}

t <- length(plot_list)
x <- seq(1, t, by = 15)
for (i in x){
  z <- i+1
  if (!is.na(x[z])) {
    
    test_list <- plot_list[c(x[i]:x[z])]
    do.call(grid.arrange,test_list)
    }
    else {
    z <- length(plot_list)  
    test_list <- plot_list[c(x[i]:z)]  
    do.call(grid.arrange,test_list)
  }
}

Which give me an error "Error in x[i]:z : argument NA / NaN"

The idea here was to, for every 16 genes, plot a 4x4 graph page. Of course, when it reach the last "16 group", there are less than 16 genes remaining (unless you have total number of genes that can be divided by 16). So that's why the if loop is there to prevent the error (but it's not working).

Also, I've tried to remove this last part and just try to generate the first "9 x 16 genes" of my list, discarding the last ones. It "works" because I can clearly see the first 16 genes ploted in 4 x 4 but nothing about the rest.

Why plotting inside a for loop using "plot(b)" is working but not using grid.arrange on a list() created inside a for loop too?

Very sorry for my code, I know it's not perfect (I'm still learning all of this) but I hope it's clear enough for you to understand my question...

Thanks!

!EDIT! : solved the first error by adding (i in length(x)). Feel stupid :D. Anyway, it's still only "printing" 16 plots instead of 159...

1

1 Answers

0
votes

I think you have some confusion in your for loop. When you say for (i in x) each iteration you get the values in x (1,16,31,46,...) not the index number so when you set z <- i + 1 you get the values (2,17,31,...). This makes your x[i] and x[z] values NA for most values of i and z.

The for loop below should fix the indexing and prevent you from going outside the length of plot_list in your edge case.

for (i in 1:length(x)){ #replaced this line
  z <- i+1
  if (!is.na(x[z])) {
    # changed this to make sure x[z] doesn't show up on two plots
    test_list <- plot_list[x[i]:(x[z]-1)]
    do.call(grid.arrange,test_list)
    }
    else {
    z <- length(plot_list)  
    test_list <- plot_list[x[i]:z]
    do.call(grid.arrange,test_list)
  }
}