2
votes

The overall problem I'm trying to solve is a way to count the number of reads present in each file at every step of a QC pipeline I'm building. I have a shell script I've used in the past which takes in a directory and outputs the number of reads per file. Since I'm looking to use a directory as input, I tried following the format laid out by Rasmus in this post:

https://bitbucket.org/snakemake/snakemake/issues/961/rule-with-folder-as-input-and-output

Here is some example input created earlier in the pipeline:

$ ls -1 cut_reads/
97_R1_cut.fastq.gz
97_R2_cut.fastq.gz
98_R1_cut.fastq.gz
98_R2_cut.fastq.gz
99_R1_cut.fastq.gz
99_R2_cut.fastq.gz

And a simplified Snakefile to first aggregate all reads by creating symlinks in a new directory, and then use that directory as input for the read counting shell script:

import os

configfile: "config.yaml"

rule all:
    input:
        "read_counts/read_counts.txt"

rule agg_count:
    input:
        cut_reads = expand("cut_reads/{sample}_{rdir}_cut.fastq.gz", rdir=["R1", "R2"], sample=config["forward_reads"])
    output:
        cut_dir = directory("read_counts/cut_reads")
    run:
        os.makedir(output.cut_dir)
        for read in input.cut_reads:
            abspath = os.path.abspath(read)       
            shell("ln -s {abspath} {output.cut_dir}")

 rule count_reads:
    input:
        cut_reads = "read_counts/cut_reads"
    output:
        "read_counts/read_counts.txt"
    shell:
        '''
        readcounts.sh {input.cut_reads} >> {output}
        '''

Everything's fine in the dry-run, but when I try to actually execute it, I get a fairly cryptic error message:

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   agg_count
    1   all
    1   count_reads
    3

[Tue Jun 18 11:31:22 2019]
rule agg_count:
    input: cut_reads/99_R1_cut.fastq.gz, cut_reads/98_R1_cut.fastq.gz, cut_reads/97_R1_cut.fastq.gz, cut_reads/99_R2_cut.fastq.gz, cut_reads/98_R2_cut.fastq.gz, cut_reads/97_R2_cut.fastq.gz
output: read_counts/cut_reads
    jobid: 2

 Job counts:
    count   jobs
    1   agg_count
    1
[Tue Jun 18 11:31:22 2019]
Error in rule agg_count:
    jobid: 0
    output: read_counts/cut_reads

Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/douglas/snakemake/scrap_directory/.snakemake/log/2019-06-18T113122.202962.snakemake.log

read_counts/ was created, but there's no cut_reads/ directory inside. No other error messages are present in the complete log. Anyone know what's going wrong or how to receive a more descriptive error message?

I'm also (obviously) fairly new to snakemake, so there might be a better way to go about this whole process. Any help is much appreciated!

3

3 Answers

1
votes

... And it was a typo. Typical. os.makedir(output.cut_dir) should be os.makedirs(output.cut_dir). I'm still really curious why snakemake isn't displaying the AttributeError python throws when you try to run this:

AttributeError: module 'os' has no attribute 'makedir'

Is there somewhere this is stored or can be accessed to prevent future headaches?

1
votes

Are you sure the error message is due to the typo in os.makedir? In this test script os.makedir does throw AttributeError ...:

rule all:
    input: 
        'tmp.done',

rule one:
    output: 
        x= 'tmp.done',
        xdir= directory('tmp'),
    run:
        os.makedir(output.xdir)

When executed:

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   all
    1   one
    2

[Wed Jun 19 09:05:57 2019]
rule one:
    output: tmp.done, tmp
    jobid: 1

Job counts:
    count   jobs
    1   one
    1
[Wed Jun 19 09:05:57 2019]
Error in rule one:
    jobid: 0
    output: tmp.done, tmp

RuleException:
AttributeError in line 10 of /home/dario/Tritume/Snakefile:
module 'os' has no attribute 'makedir'
  File "/home/dario/Tritume/Snakefile", line 10, in __rule_one
  File "/home/dario/miniconda3/envs/tritume/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/dario/Tritume/.snakemake/log/2019-06-19T090557.113876.snakemake.log
0
votes

Use f-string to resolve local variables like {abspath}:

        for read in input.cut_reads:
            abspath = os.path.abspath(read)       
            shell(f"ln -s {abspath} {output.cut_dir}")

Wrap the wildcards that snakemake resolves automatically into double braces inside of f-strings.