I have a rule that produces files containing one sample name per line:
FAMILY = ['fam1','fam2','fam3']
rule extract_individuals:
    input:
        vcf = 'muscle/{family}.vcf.gz'
    output:
        vcf_ind = 'output/{family}.txt'
    shell:
        'bcftools query -l {input.vcf} -o {output.vcf_ind}'
This rule will produce files with individual names:
sample1
sample2
sample3
I want another rule that, for every line of these files, uses that line as the wildcard value in its output; for example:
rule get_samples:
    input: 'output/{family}.txt'
    output: 'output/{individual}.vcf.gz'
    shell: 'python -c "for line in {input} print(line)" | xargs -I {{}} bcftools view -O z -s {{}} -o {output} {input}'
Note that Snakemake complains here that the wildcards in the input files cannot be determined from the output files.
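The complaint can be illustrated with a small sketch in plain Python. It is not Snakemake's own code: the helper names `wildcard_names` and `undetermined_wildcards` are hypothetical, and the regular expression assumes simple `{name}` wildcards without constraints. It shows that the input pattern needs `family`, while the output pattern only supplies `individual`.

```python
import re


def wildcard_names(pattern):
    """Return the set of simple {name} placeholders in a path pattern."""
    return set(re.findall(r"\{(\w+)\}", pattern))


def undetermined_wildcards(input_pattern, output_pattern):
    """Return input wildcards that the output pattern cannot supply."""
    return wildcard_names(input_pattern) - wildcard_names(output_pattern)


# The patterns from rule get_samples: 'family' is needed but never provided.
missing = undetermined_wildcards("output/{family}.txt",
                                 "output/{individual}.vcf.gz")
print(missing)  # {'family'}

# Hypothetical layout that also carries the family in the output path.
resolved = undetermined_wildcards("output/{family}.txt",
                                  "output/{family}/{individual}.vcf.gz")
print(resolved)  # set()
```

Snakemake works backwards from a requested output file to its inputs, so every wildcard used in `input` has to be recoverable from the `output` pattern.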
Where does individual come from in the line output: 'output/{individual}.vcf.gz', given what you've shared in your post so far? - Wayne

individual should be sample1, and also sample2, etc. I basically want to substitute individual with the lines in the file that rule extract_individuals produces. - moth

Do you know sample1, sample2, etc., at the outset of running Snakemake? If so, what I'd do is use Python at the start of my Snakefile to make a list of the file names, say a list named individual_vcfs, and then for rule get_samples I'd set the output as output: individual_vcfs. I'd be interested in what others do in such a case. - Wayne
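A minimal sketch of the suggestion in the last comment, assuming the per-family sample lists already exist when the Snakefile is parsed. The function name `individual_vcfs`, its `list_dir` parameter, and the choice to skip missing list files silently are assumptions made for illustration; they are not part of the original post.

```python
from pathlib import Path

FAMILY = ["fam1", "fam2", "fam3"]


def individual_vcfs(families, list_dir="output"):
    """Build one VCF path per sample name found in <list_dir>/<family>.txt."""
    paths = []
    for family in families:
        list_file = Path(list_dir) / f"{family}.txt"
        if not list_file.exists():
            # Assumption: a missing list is skipped rather than treated as an error.
            continue
        for line in list_file.read_text().splitlines():
            sample = line.strip()
            if sample:  # ignore blank lines
                paths.append(f"{list_dir}/{sample}.vcf.gz")
    return paths
```

Placed at the top of a Snakefile, `individual_vcfs(FAMILY)` would give a concrete list of targets. This only helps if the `.txt` files are already on disk before Snakemake builds its job graph. When they are produced during the same run, Snakemake's checkpoint feature is designed for outputs that are not known until runtime.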