I'm trying to write a snakemake pipeline to process samples that are divided into groups which requires a single summary file per group. For examples, samples are divided as follows:
Group 1:
Sample 1
Sample 2
Group 2:
Sample 3
Sample 4
Each sample is process using bedtools to generate an output file per sample. Next, I need to summarize at the group level for each Group's collection of samples. An abbreviated snakemake file looks like this:
rule intersect:
input:
bam = join('{group}','{sample}.bam'),
reg_bed = join('{group}', 'region.bed')
output:
reg_intersect = join('{group}', '{sample}.intersect.bed')
shell:
'bedtools intersect -wa -wb -a {input.reg_bed} -b {input.bam} > {output.reg_intersect}'
rule aggregate:
input:
rules.interesect.output
output:
join('{group}','summary_stats.csv')
shell:
touch(join('{group}','summary_stats.csv'))
#this would be a call to a python function that operates on all the output files to generate a single file
I get complaints that wildcards not agreeing between input and output (input contains {group} and {sample} while output only has {group}. I've tried using expand() but I have to include values for both group and sample, and sample isn't possible because it is dependent upon group.
Any suggestions are welcome.