Snakemake doesn't read full file in R?

Question

I'm using some quick R script to cbind files after performing quantification (kallisto/salmon). The problem is, I get an R error, saying my input files aren't the same length so cbind() won't work.
This is obviously not the case, they all are 16887 lines (checked with bash wc), and it works perfectly fine in R without snakemake.

Also worth mentioning, I do get an output for a random number of sample (~ 1 to 4).

Here's the R code :

sample <- read.table(snakemake@input[[1]], sep = "\t", header = TRUE)
if (!file.exists(snakemake@params[[1]])) {
  merge <- data.frame(sample)
} else {
  merge1 <- read.table(snakemake@params[[1]], sep = "\t", header = TRUE)
  merge <- cbind(merge1, sample[,2, drop=F])
}
write.table(merge, snakemake@params[[1]], sep = "\t", quote = F, row.names = F)
file.create(snakemake@output[[1]])

And the snakemake rule :

rule merge_quantif:
        input:
            QUANTIF+"/{sample}/quantif.txt"
        output:
            QUANTIF+"/{sample}/merge.done"
        params:
            QUANTIF+"/all_sample_quantified.txt"
        script:
            "Tools/merge_quantif.R"

My files are like this :

Gene    R1
A1BG    0.287571
A1CF    0
A2M 0.198756
A2ML1   0
A2MP1   0
A3GALT2 0
A4GALT  3.098108

And the output should be like this, but with all 17 samples

Gene    R4  R8  R15 R13
A1BG    0.337515    0.284943    0.488654    0.587114
A1CF    0   0   0   0
A2M 0   0   0.105159    0.009539

If anyone has an idea to sort this out, will be gladly appreciated.

------------ EDIT Modified R code to go with the asnwer:

sample <- snakemake@input

merge <- read.table(sample[[1]], sep = "\t", header = T)
for (i in 2:length(sample)) {
  x <- read.table(sample[[i]], sep = "\t", header = T)
  merge <- merge(merge, x, by="Gene")
}

write.table(merge, snakemake@output[[1]], sep = "\t", quote = F, row.names = F)

How is this obviously not the case? Does it work in just R without snakemake? — Maarten-vd-Sande
Yes indeed, forgot to say it I manually check all 17 samples and it works perfectly fine in R without snakemake — Exenter
Unrelated, instead of cbind I would use the merge function with "Gene" as the column to merge on - just in case an input file is not sorted like all the others or doesn't have the same genes. — dariober

dariober dariober · Accepted Answer · 2020-04-21T10:40:18

The problem, I think, is that rule merge_quantif is executed for each sample, i.e. 17 times, possibily in parallel. However, each run of merge_quantif writes to the same output file (QUANTIF+"/all_sample_quantified.txt") since in your R script you have write.table(merge, snakemake@params[[1]], ...). I suspect this causes problems or at least is not an ideal setup. I suspect what you want is something like:

rule merge_quantif:
        input:
            expand(QUANTIF+"/{sample}/quantif.txt", sample= list_of_samples)
        output:
            QUANTIF+"/all_sample_quantified.txt"
        script:
            "Tools/merge_quantif.R"

Where Tools/merge_quantif.R reads the list of input files one by one, merges them and finally writes the merged file to QUANTIF+"/all_sample_quantified.txt"

Snakemake doesn't read full file in R?

1 Answers