I'm using some quick R script to cbind files after performing quantification (kallisto/salmon).
The problem is, I get an R error, saying my input files aren't the same length so cbind() won't work.
This is obviously not the case, they all are 16887 lines (checked with bash wc), and it works perfectly fine in R without snakemake.
Also worth mentioning, I do get an output for a random number of sample (~ 1 to 4).
Here's the R code :
sample <- read.table(snakemake@input[[1]], sep = "\t", header = TRUE)
if (!file.exists(snakemake@params[[1]])) {
merge <- data.frame(sample)
} else {
merge1 <- read.table(snakemake@params[[1]], sep = "\t", header = TRUE)
merge <- cbind(merge1, sample[,2, drop=F])
}
write.table(merge, snakemake@params[[1]], sep = "\t", quote = F, row.names = F)
file.create(snakemake@output[[1]])
And the snakemake rule :
rule merge_quantif:
input:
QUANTIF+"/{sample}/quantif.txt"
output:
QUANTIF+"/{sample}/merge.done"
params:
QUANTIF+"/all_sample_quantified.txt"
script:
"Tools/merge_quantif.R"
My files are like this :
Gene R1
A1BG 0.287571
A1CF 0
A2M 0.198756
A2ML1 0
A2MP1 0
A3GALT2 0
A4GALT 3.098108
And the output should be like this, but with all 17 samples
Gene R4 R8 R15 R13
A1BG 0.337515 0.284943 0.488654 0.587114
A1CF 0 0 0 0
A2M 0 0 0.105159 0.009539
If anyone has an idea to sort this out, will be gladly appreciated.
------------ EDIT Modified R code to go with the asnwer:
sample <- snakemake@input
merge <- read.table(sample[[1]], sep = "\t", header = T)
for (i in 2:length(sample)) {
x <- read.table(sample[[i]], sep = "\t", header = T)
merge <- merge(merge, x, by="Gene")
}
write.table(merge, snakemake@output[[1]], sep = "\t", quote = F, row.names = F)
cbind
I would use themerge
function with "Gene" as the column to merge on - just in case an input file is not sorted like all the others or doesn't have the same genes. – dariober