
I have a rule that produces files containing one sample name per line:

FAMILY = ['fam1','fam2','fam3']

rule extract_individuals:
    input:
        vcf = 'muscle/{family}.vcf.gz'
    output:
        vcf_ind = 'output/{family}.txt'
    shell:
        'bcftools query -l {input.vcf} -o {output.vcf_ind}' 

This rule will produce files with individual names:

sample1
sample2
sample3

I want another rule that takes each line of these files and uses it as the value of the output wildcard; for example:

rule get_samples:
    input: 'output/{family}.txt'
    output: 'output/{individual}.vcf.gz'
    shell: 'python -c "for line in {input} print(line)" |  xargs -I {{}} bcftools view -O z -s {{}} -o {output} {input}'

Note that Snakemake complains here that the wildcards in the input files cannot be determined from the output files (the input uses {family}, but the output only defines {individual}).
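To see where the complaint comes from, here is a toy model in plain Python of how a requested output file is matched against the output pattern and how the input pattern is then filled in. This is only a sketch of the idea, not Snakemake's actual implementation; `match_wildcards` and `fill_input` are hypothetical helper names.

```python
import re


def match_wildcards(pattern, path):
    """Match `path` against a pattern like 'output/{individual}.vcf.gz'.

    Returns a dict of wildcard values, or None if the path does not match.
    """
    # Escape the pattern, then turn every escaped {name} into a named group.
    regex = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+)", re.escape(pattern))
    found = re.fullmatch(regex, path)
    return found.groupdict() if found else None


def fill_input(input_pattern, wildcards):
    """Fill the input pattern using only the wildcards taken from the output."""
    try:
        return input_pattern.format(**wildcards)
    except KeyError as missing:
        raise ValueError(
            f"wildcard {missing} in the input cannot be determined from the output"
        ) from None


# The output pattern only yields a value for 'individual' ...
wildcards = match_wildcards("output/{individual}.vcf.gz", "output/sample1.vcf.gz")

# ... so the input pattern, which needs 'family', cannot be completed.
try:
    fill_input("output/{family}.txt", wildcards)
except ValueError as err:
    error_message = str(err)
```

In other words, every wildcard used in `input` has to appear in `output` as well, or the input has to be computed by a function that knows which family an individual belongs to.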

I don't follow where individual in the line output: 'output/{individual}.vcf.gz' comes from, given what you've shared in your post so far. - Wayne
individual should be sample1, and also sample2, etc. Basically, I want to substitute individual with each of the lines in the files that rule extract_individuals produces. - moth
I guess you'd have access to the contents listing sample1, sample2, etc., at the outset of running Snakemake? What I'd do then is use Python at the start of my Snakefile to make a list of the file names (say the list is named individual_vcfs), and then for rule get_samples I'd have output: individual_vcfs. I'd be interested in what others do in such a case. - Wayne
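A minimal sketch of that suggestion, assuming the sample listings (output/fam1.txt, etc.) already exist before Snakemake is started. The helper names `map_samples_to_families` and `individual_vcf_paths` are made up for illustration; only `individual_vcfs` and the paths come from the thread.

```python
from pathlib import Path


def map_samples_to_families(families, listing_dir="output"):
    """Read '<listing_dir>/<family>.txt' for each family.

    Returns a dict mapping every sample name to the family it was listed in.
    Blank lines are skipped.
    """
    sample_to_family = {}
    for family in families:
        listing = Path(listing_dir) / f"{family}.txt"
        with open(listing) as handle:
            for line in handle:
                sample = line.strip()
                if sample:
                    sample_to_family[sample] = family
    return sample_to_family


def individual_vcf_paths(sample_to_family, out_dir="output"):
    """Build the per-individual output paths, one per known sample."""
    return [f"{out_dir}/{sample}.vcf.gz" for sample in sample_to_family]


# Assumed usage at the top of the Snakefile (not executed here):
#   SAMPLE_TO_FAMILY = map_samples_to_families(FAMILY)
#   individual_vcfs = individual_vcf_paths(SAMPLE_TO_FAMILY)
# A target rule could then request `individual_vcfs`, and get_samples could
# look up its input VCF from the {individual} wildcard with an input function:
#   input: lambda wc: f"muscle/{SAMPLE_TO_FAMILY[wc.individual]}.vcf.gz"
```

This only works when the listings are on disk at parse time. If they are produced in the same run, the sample names are not known up front, and Snakemake's checkpoint mechanism is the feature intended for that situation.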