1
votes

I'm looking for a way to define a target pattern rule (without input pattern) in Snakemake.

In this case, I want a rule that creates the files a, b and c by matching a pattern on the target file name, while the input does not contain a pattern.

In GNU make, I would do it like this:

.PHONY: all
all: a b c

%:
    echo x > $@

However, if I do the following in Snakemake:

rule test:
    output:
        "{filename}"
    wildcard_constraints:
        filename = "[abc]"
    shell:
        "echo x > {filename}"

I get:

WorkflowError: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards.

I can of course specify the output using an expand() call, but that would imply that the rule creates all files when called once, which is not the case. Rather, it should create one file when executing shell with a certain argument ({filename} in the example here).

I'm not sure I understand your GNU Make code. The target pattern you are trying to exemplify is "all: a b c", where all is the target and "a b c" is the pattern, yes? - TBoyarski
I'm looking for a rule to generate a single file whose name contains a wildcard when the input does not have a wildcard pattern. That's exactly what the Makefile says: construct a pattern from no pattern. In this particular case, the rule can be used to create files with any name and content x. The target files are a, b and c, which are all created by calling the pattern rule three times. - Michael Schubert

2 Answers

4
votes

Tim's answer below basically contains the solution. But since you have asked me on Twitter, let me directly translate your Makefile:

rule all:
    input:
        expand("{filename}", filename=["a", "b", "c"])

rule test:
    output:
        "{filename}"
    wildcard_constraints:
        filename = "[abc]"
    shell:
        "echo x > {output}"

All these basics are explained in the Snakemake tutorial.
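
As a rough illustration of what happens here: rule all requests the concrete files a, b and c, and each requested name is then matched against the output pattern of rule test to fill in the wildcard. The following is a plain-Python sketch of that idea, not Snakemake's actual implementation; the helper names `expand_names` and `match_output` are hypothetical.

```python
import re

def expand_names(pattern, **values):
    """Hypothetical stand-in for expand() with a single wildcard."""
    (name, choices), = values.items()
    return [pattern.format(**{name: choice}) for choice in choices]

def match_output(target, wildcard="filename", constraint="[abc]"):
    """Match a concrete target against the output pattern "{filename}".

    Returns the wildcard values, or None if the constraint rejects the name.
    """
    regex = re.compile(rf"(?P<{wildcard}>{constraint})")
    found = regex.fullmatch(target)
    return found.groupdict() if found else None

# Concrete targets, as rule all would request them.
targets = expand_names("{filename}", filename=["a", "b", "c"])

for target in targets:
    print(target, match_output(target))

# A name outside the constraint is not produced by the rule.
print("d", match_output("d"))
```

The point of the sketch is that the wildcard is only ever resolved backwards from a concrete file name, which is why a rule with wildcards cannot itself be the target.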

1
votes

The issue is how values are accessed in the shell directive: through a directive object such as {input} or {output}, versus {wildcards.namedVar}. See here in the documentation. Beyond that, I do not see a driver rule in your Snakemake setup, which I would recommend adding (I've included one in my answer below). It is the equivalent of the .PHONY and all rule pattern (the messy convention that GNU Make forced us into).

In your shell directive, the variable {filename} is only accessible as an attribute of the wildcards object. You need to use Python dot notation to access it, like {wildcards.filename}. That said, the better way is to access the output object directly, because it has a built-in string conversion: it carries only a single list of strings, whereas the wildcards object can contain multiple individual wildcard attributes, so its behaviour is less predictable.

You can ignore the ".snk" suffix, I just think it's nice for Snakemake rule files. In code, this is what I mean:

test.snk

 rule test:
     output:
         "{filename}"
     wildcard_constraints:
         filename = "[abc]"
     shell:
         "echo x > {wildcards.filename}"

In identical fashion, you can also do this, test.snk:

 rule test:
     output:
         "{filename}"
     wildcard_constraints:
         filename = "[abc]"
     shell:
         "echo x > {output}"
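
To see why both variants work while a bare {filename} does not, here is a simplified sketch using Python's str.format. It assumes that the shell string is formatted against job-level objects such as wildcards and output. The `OutputFiles` class and the values are made up for illustration and are not Snakemake's real classes.

```python
from types import SimpleNamespace

class OutputFiles(list):
    """Hypothetical list of output paths that prints as a space-separated string."""
    def __str__(self):
        return " ".join(self)

# Names available to the shell string for the job that creates file "a".
job = {
    "wildcards": SimpleNamespace(filename="a"),
    "output": OutputFiles(["a"]),
}

# Attribute access on the wildcards object.
via_wildcards = "echo x > {wildcards.filename}".format(**job)

# The output object converts itself to a string.
via_output = "echo x > {output}".format(**job)

# A bare {filename} is not among the available names, so formatting fails.
try:
    "echo x > {filename}".format(**job)
    bare_name_ok = True
except KeyError:
    bare_name_ok = False

print(via_wildcards)
print(via_output)
print(bare_name_ok)
```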

Recommended Code Base:

test1.snk:

 rule test:
     output:
         "{filename}"
     wildcard_constraints:
         filename = "[abc]"
     shell:
         "echo x > {output}"

Snakefile:

 configfile: "config.yaml"

 rule all:
     input:
         expand("{sample}", sample=config["fileName"])

 include: "test1.snk"

config.yaml

fileName: ['a','b','c']

$ snakemake -n:

 rule test:
     output: a
     jobid: 1
     wildcards: filename=a


 rule test:
     output: c
     jobid: 2
     wildcards: filename=c


 rule test:
     output: b
     jobid: 3
     wildcards: filename=b


 localrule all:
     input: a, b, c
     jobid: 0

 Job counts:
     count   jobs
     1   all
     3   test
     4

Additional info

Also, this setup scales VERY well :) Run it with the plain CLI call snakemake, without any arguments, like:

 snakemake

Although it is terrible practice, it is technically also possible to request the files directly on the command line, if you are more "outcome" oriented and don't care about reproducibility.

 snakemake -n -s "test1.snk" a b c

That will essentially use just the rule file "test1.snk" and request "a", "b", and "c" from it.

rule test:
    output: c
    jobid: 0
    wildcards: filename=c


rule test:
    output: b
    jobid: 1
    wildcards: filename=b


rule test:
    output: a
    jobid: 2
    wildcards: filename=a

Job counts:
        count   jobs
        3       test
        3

You can see that the dry-run output is different: because it does not go through "rule all", there is no 4th job. Overall, the processing done by Snakemake itself is usually trivial compared to the processing performed by the shell commands, so with or without an "all" rule I would expect very little difference in performance. Yet with the all rule it is much clearer what your code is supposed to be doing, and you can easily re-run the exact same command without having to 'grep' your 'history'.