I'm having trouble accessing nested values from my config.yaml file. My config.yaml:
method:
  collibri:
    - Collibri_standard_protocol-HBR-Collibri-100_ng-2_S1_L001_R
    - Collibri_standard_protocol-HBR-Collibri-100_ng-3_S2_L001_R
    - Collibri_standard_protocol-UHRR-Collibri-100_ng-2_S3_L001_R
    - Collibri_standard_protocol-UHRR-Collibri-100_ng-3_S4_L001_R
  kapa:
    - KAPA_mRNA_HyperPrep_-UHRR-KAPA-100_ng_total_RNA-3_S8_L001_R
    - KAPA_mRNA_HyperPrep_-HBR-KAPA-100_ng_total_RNA-2_S5_L001_R
    - KAPA_mRNA_HyperPrep_-HBR-KAPA-100_ng_total_RNA-3_S6_L001_R
    - KAPA_mRNA_HyperPrep_-UHRR-KAPA-100_ng_total_RNA-2_S7_L001_R
num:
  - 1
  - 2
type:
  - collibri
  - kapa
My goal is to pass all files in a method group as inputs at once and write the output to a folder named after the method (e.g. run the rule on all names under 'kapa' at once and place the output in the 'kapa' folder). A shortened version of my Snakefile:
configfile: "config.yaml"

rule all:
    input:
        expand("outputs/STAR/{filename}/Aligned.sortedByCoord.out.bam.bai", filename=config["method"]["collibri"]),
        expand("outputs/STAR/{filename}/counts_2.txt", filename=config["method"]["collibri"]),
        expand("outputs/STAR/{filename}/Aligned.sortedByCoord.out.bam.bai", filename=config["method"]["kapa"]),
        expand("outputs/STAR/{filename}/counts_2.txt", filename=config["method"]["kapa"]),
        expand("outputs/STAR/{type}/counts_2.txt", type=config["type"])

rule bam_index:
    input:
        "outputs/STAR/{filename}/Aligned.sortedByCoord.out.bam"
    output:
        "outputs/STAR/{filename}/Aligned.sortedByCoord.out.bam.bai"
    shell:
        "samtools index {input}"

rule bam_sort_name:
    input:
        "outputs/STAR/{filename}/Aligned.sortedByCoord.out.bam"
    output:
        "outputs/STAR/{filename}/Aligned.sortedByCoord.out.sortedbyname.bam"
    shell:
        "samtools sort -n -o {output} {input}"

rule feature_counts:
    input:
        bam="outputs/STAR/{filename}/Aligned.sortedByCoord.out.sortedbyname.bam",
        gtf="data/chr19_20Mb.gtf"
    output:
        out1="outputs/STAR/{filename}/counts_1.txt",
        out2="outputs/STAR/{filename}/counts_2.txt"
    shell:
        "featureCounts -p -t exon -g gene_id -a {input.gtf} -o {output.out1} -s 1 {input.bam} && featureCounts -p -t exon -g gene_id -a {input.gtf} -o {output.out2} -s 2 {input.bam}"

rule feature_counts_per_sample:
    input:
        bam=expand("outputs/STAR/{name}/Aligned.sortedByCoord.out.sortedbyname.bam", name=config["method"][{type}]),
        gtf="data/chr19_20Mb.gtf"
    output:
        out1="outputs/STAR/{type}/counts_1.txt",
        out2="outputs/STAR/{type}/counts_2.txt"
    shell:
        "mkdir -p outputs/STAR/{type}/ && featureCounts -p -t exon -g gene_id -a {input.gtf} -o {output.out1} -s 1 {input.bam} && featureCounts -p -t exon -g gene_id -a {input.gtf} -o {output.out2} -s 2 {input.bam}"
Overall, there are two issues that I cannot solve:
- Is there a way to reference all list items under 'method' at once, so I don't have to define the same output in rule all twice with different config keys (filename=config["method"]["collibri"] and filename=config["method"]["kapa"], for the rules bam_index and feature_counts)?
- The rule 'feature_counts_per_sample' does not work (of course), but it was my latest attempt at using the variables 'collibri' and 'kapa' in one place and expanding them, in another place, into the list of filenames that need to be passed as inputs together. Any advice here?
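To make the two issues concrete, here is a rough sketch in plain Python of the two lookups I am after. It assumes the config is loaded as a nested dict, as above. The sample names are shortened placeholders, and the helper names all_samples and bams_for_method are hypothetical, made up only for illustration.

```python
# Stand-in for the dict Snakemake exposes as `config` after loading config.yaml.
# The sample names are shortened placeholders, not my real file names.
config = {
    "method": {
        "collibri": ["collibri_A", "collibri_B"],
        "kapa": ["kapa_A", "kapa_B"],
    },
    "type": ["collibri", "kapa"],
}

# Path pattern copied from the rules above.
BAM = "outputs/STAR/{name}/Aligned.sortedByCoord.out.sortedbyname.bam"


def all_samples(cfg):
    """Issue 1: every name under 'method', across all groups, as one flat list."""
    return [name for names in cfg["method"].values() for name in names]


def bams_for_method(cfg, method):
    """Issue 2: the name-sorted BAM paths that belong to one method group."""
    return [BAM.format(name=name) for name in cfg["method"][method]]


print(all_samples(config))
print(bams_for_method(config, "kapa"))
```

My guess is that the first helper could feed expand(...) in rule all (filename=all_samples(config)), and that the second would have to go through an input function, e.g. bam=lambda wildcards: bams_for_method(config, wildcards.type), because the {type} wildcard is not yet known when a plain expand(...) is evaluated. I have not confirmed that this is the recommended way.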