I'm trying to create a snakemake pipeline whose outputs are determined by the set of sequencing files present in a particular folder. My project directory is laid out roughly like this:

project_dir
>    Snakefile
>    code
>        python_scripts
>            ab1_to_fastq.py
>    data
>        1.ab1_files
>            A.ab1
>            B.ab1
>            C.ab1
>        2.fastq_files

Here's the code for my actual Snakefile:

import glob
import os

def collect_reads():
    ab1_files = glob.glob("data/1.ab1_files/*.ab1")
    ab1_files.sort()
    ab1_reads = [ab1_file.split('/')[-1].replace('.ab1', '') for ab1_file in ab1_files]
    return ab1_reads

READS = collect_reads()
print(expand("data/2.fastq_files/{read}.fastq", read=READS))

rule convert_ab1_to_fastq:
    input:
        ab1="data/1.ab1_files/{read}.ab1"
    output:
        fastq="data/2.fastq_files/{read}.fastq"
    shell:
        "python code/python_scripts/ab1_to_fastq.py --ab1 {input.ab1} --fastq {output.fastq}"

rule all:
    input:
        fastq=expand("data/2.fastq_files/{read}.fastq", read=READS)

My understanding is that all should be my target rule, and that its fastq input evaluates to

['data/2.fastq_files/A.fastq', 'data/2.fastq_files/B.fastq', 'data/2.fastq_files/C.fastq']

This seems to be confirmed by the print output when I run the pipeline. However, whenever I run it I get the error "WorkflowError: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards."

Strangely, I can copy one of the paths from the list generated by expand to call snakemake directly, e.g. snakemake data/2.fastq_files/A.fastq and the pipeline completes successfully.

What am I missing?

Unrelated to your question, but seeing some parts of your code, you might be interested in os.path.basename, os.path.splitext and possibly other functions provided in os.path (docs.python.org/3/library/os.path.html). - bli
@bli will check those out, I've got scripts littered with similar code to break down filepaths & filenames - Thomas Moody
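
For reference, here is a minimal sketch of what bli's suggestion could look like when applied to collect_reads from the question. The helper name read_name and the pattern parameter are made up for illustration; os.path.basename, os.path.splitext and glob.glob are standard library functions.

```python
import glob
import os


def read_name(path):
    """Return the file name without its directory or its last extension."""
    return os.path.splitext(os.path.basename(path))[0]


def collect_reads(pattern="data/1.ab1_files/*.ab1"):
    """Return the read names for every file matching `pattern`, in sorted path order."""
    return [read_name(path) for path in sorted(glob.glob(pattern))]


READS = collect_reads()
```

One difference from the original: splitext only strips the final extension, whereas .replace('.ab1', '') removes every occurrence of '.ab1' in the name.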

1 Answer

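Snakemake most likely treats convert_ab1_to_fastq, not all, as your target rule. When no target is given on the command line, snakemake uses the first rule in the Snakefile as the target, and here that rule contains the {read} wildcard, which is exactly what the error complains about. Declare all first and see whether this solves your problem.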
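
As a sketch, the Snakefile from the question with only the rule order changed might look like this. It is untested against your data, and it assumes ab1_to_fastq.py accepts the --ab1 and --fastq options exactly as in your original shell command.

```snakemake
import glob

def collect_reads():
    # Read names are taken from the .ab1 files present when the workflow starts.
    ab1_files = sorted(glob.glob("data/1.ab1_files/*.ab1"))
    return [ab1_file.split('/')[-1].replace('.ab1', '') for ab1_file in ab1_files]

READS = collect_reads()

# Declared first, so this is the default target when snakemake is run
# without an explicit target. Its inputs are concrete paths (no wildcards).
rule all:
    input:
        expand("data/2.fastq_files/{read}.fastq", read=READS)

# Generic rule: {read} is filled in for each file requested by rule all.
rule convert_ab1_to_fastq:
    input:
        ab1="data/1.ab1_files/{read}.ab1"
    output:
        fastq="data/2.fastq_files/{read}.fastq"
    shell:
        "python code/python_scripts/ab1_to_fastq.py --ab1 {input.ab1} --fastq {output.fastq}"
```

Alternatively, you can leave the rule order as it is and name the target explicitly on the command line with snakemake all.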