I am using snakemake in a workflow for NGS analyses. In one rule, I make use of the unique (temporary) output from another rule. The output of this rule is also unique and contributes to the creation of the final output. A simple wildcard {sample} is used across these rules. I do not see any cyclic dependency, but snakemake tells me there is one:

CyclicGraphException in line xxx of Snakefile: Cyclic dependency on rule

I understand that there is an option to investigate this problem: --debug-dag.

How do I interpret the output? What is candidate versus selected?

This is my (pseudo-)code of the rule:

rule split_fasta:
    input:
        dataFile="data/path1/{sample}.tab",
        scaffolds="data/path2/{sample}.fasta",
        database="path/to/db",
    output:
        onefasta="data/path2/{sample}_one.fasta",
        twofasta="data/path2/{sample}_two.fasta",
        threefasta="data/path2/{sample}_three.fasta",
    conda:
        "envs/env.yaml"
    log:
        "logs/split_fasta_{sample}.log"
    benchmark:
        "logs/benchmark/split_fasta_{sample}.txt"
    threads: 4
    shell:
        """
        python bin/split_fasta.py {input.dataFile} {input.scaffolds} {input.database} {output.onefasta} {output.twofasta} {output.threefasta}
        """

There is no other connection between input and output than in this rule.

The problem is solved now: some subtle dependencies were present further downstream and upstream.

But for future reference, I would like to know how to interpret the output of the --debug-dag option.

I cannot answer your question, but have you tried to constrain the wildcards? That often helps me reduce cyclic dependencies :) snakemake.readthedocs.io/en/stable/snakefiles/… - The Unfun Cat
The wildcard is just {sample}, the unique sample number, as used in the other rules. I cannot see how constraining would help. - Thierry Janssens

1 Answer

--debug-dag    Print candidate and selected jobs (including their wildcards) while inferring DAG. This can help to debug unexpected DAG topology or errors.

There does not seem to be any documentation beyond this, but I believe the candidate jobs are the jobs whose output patterns can be made to match the requested file through wildcards. The selected job is the one chosen from among the candidates (through wildcard constraints, ruleorder, or, with the option --allow-ambiguity, simply the first candidate).

As an example, I have two rules that do adapter trimming, one for single-end and one for paired-end reads:

rule trim_SE:
    input:
        "{sample}.fastq.gz"
    output:
        "{sample}_trimmed.fastq.gz"
    shell:
         ...

rule trim_PE:
    input:
        "{sample}_R1.fastq.gz",
        "{sample}_R2.fastq.gz"
    output:
        "{sample}_R1_trimmed.fastq.gz",
        "{sample}_R2_trimmed.fastq.gz"
    shell:
         ...

If I now ask snakemake to generate the output exp_R1_trimmed.fastq.gz, it complains that either rule could produce it:

AmbiguousRuleException:
Rules trim_PE and trim_SE are ambiguous for the file exp_R1_trimmed.fastq.gz.
Consider starting rule output with a unique prefix, constrain your wildcards, or use the ruleorder directive.
Wildcards:
    trim_PE: sample=exp
    trim_SE: sample=exp_R1
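
The two wildcard assignments in this message can be reproduced with a small Python sketch. It is not snakemake's actual code: pattern_to_regex and candidates are hypothetical helpers, and the sketch only assumes that an unconstrained wildcard behaves like the regex .+ (which is how I understand snakemake's default). It also shows why constraining the wildcard, as suggested in the comments, removes the ambiguity.

```python
import re

def pattern_to_regex(pattern, constraint=".+"):
    """Turn an output pattern like '{sample}_trimmed.fastq.gz' into a regex.

    Simplification: each wildcard name may appear only once per pattern.
    """
    # re.split with a capturing group alternates literal text and wildcard names
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)                  # literal text
        else:
            regex += f"(?P<{part}>{constraint})"      # wildcard
    return re.compile(regex)

def candidates(target, rules, constraint=".+"):
    """Return {rule name: wildcards} for every rule that could produce target."""
    found = {}
    for name, outputs in rules.items():
        for out in outputs:
            match = pattern_to_regex(out, constraint).fullmatch(target)
            if match:
                found[name] = match.groupdict()
                break
    return found

# Output patterns of the two trimming rules from this answer
rules = {
    "trim_SE": ["{sample}_trimmed.fastq.gz"],
    "trim_PE": ["{sample}_R1_trimmed.fastq.gz", "{sample}_R2_trimmed.fastq.gz"],
}

unconstrained = candidates("exp_R1_trimmed.fastq.gz", rules)
constrained = candidates("exp_R1_trimmed.fastq.gz", rules, constraint="[^_]+")
print(unconstrained)
print(constrained)
```

With the default .+ both rules match (sample=exp_R1 for trim_SE, sample=exp for trim_PE), so both would be candidates. With the hypothetical constraint [^_]+ (no underscores in sample names) only trim_PE still matches.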

We can solve this problem by, for instance, adding a ruleorder:

ruleorder: trim_PE > trim_SE

Now the file gets generated as we want. If we run with the --debug-dag option, we get two candidate jobs and one selected job (based on our ruleorder):

candidate job trim_PE
    wildcards: sample=exp
candidate job trim_SE
    wildcards: sample=exp_R1
selected job trim_PE
    wildcards: sample=exp

If the rules trim_PE and trim_SE were part of a longer chain of rules, we could use the --debug-dag option to see in which rule the wildcard matching first goes wrong, instead of only getting an error in the rule where the workflow finally fails.
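
For a large workflow the --debug-dag output gets long, so it can help to parse it and look only at the selected jobs or at unexpected wildcard values. Below is a minimal sketch, assuming the output has the "candidate job" / "selected job" / "wildcards:" layout shown in this answer and that multiple wildcards are separated by commas (an assumption on my part). parse_debug_dag is a hypothetical helper and the sample text is made up for illustration.

```python
def parse_debug_dag(text):
    """Parse --debug-dag style text into a list of job records."""
    jobs = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("candidate job ", "selected job ")):
            # e.g. "candidate job trim_PE" -> kind="candidate", rule="trim_PE"
            kind, _, rule = stripped.split(" ", 2)
            jobs.append({"kind": kind, "rule": rule, "wildcards": {}})
        elif stripped.startswith("wildcards:") and jobs:
            # e.g. "wildcards: sample=exp" belongs to the job listed just before
            pairs = stripped[len("wildcards:"):].strip()
            for pair in filter(None, (p.strip() for p in pairs.split(","))):
                key, _, value = pair.partition("=")
                jobs[-1]["wildcards"][key] = value
    return jobs

# Made-up sample text in the layout assumed above
sample_output = """\
candidate job trim_PE
    wildcards: sample=exp
candidate job trim_SE
    wildcards: sample=exp_R1
selected job trim_PE
    wildcards: sample=exp
"""

jobs = parse_debug_dag(sample_output)
selected = [job for job in jobs if job["kind"] == "selected"]
for job in selected:
    print(job["rule"], job["wildcards"])
```

A wildcard value that looks wrong in a candidate or selected job (for example a sample name that has swallowed a suffix such as _R1) usually points at the rule whose pattern is too permissive.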