I have a quick question regarding the use of dynamic wildcards. I have searched the documentation and forums, but have not found a straightforward answer to my query.
Here are the rules that are giving me trouble:
rule all:
    input: dynamic("carvemeOut/{species}.xml")
    shell: "snakemake --dag | dot -Tpng > pipemap.png"

rule speciesProt:
    input: "evaluation-output/clustering_gt1000_scg.tab"
    output: dynamic("carvemeOut/{species}.txt")
    shell:
        """
        cd {config[paths][concoct_run]}
        mkdir -p {config[speciesProt_params][dir]}
        cp {input} {config[paths][concoct_run]}/{config[speciesProt_params][dir]}
        cd {config[speciesProt_params][dir]}
        sed -i '1d' {config[speciesProt_params][infile]} #removes first row
        awk '{{print $2}}' {config[speciesProt_params][infile]} > allspecies.txt #extracts node information
        sed '/^>/ s/ .*//' {config[speciesProt_params][metaFASTA]} > {config[speciesProt_params][metaFASTAcleanID]} #removes annotation to protein ID
        Rscript {config[speciesProt_params][scriptdir]}multiFASTA2speciesFASTA.R
        sed -i 's/"//g' species*
        sed -i '/k99/s/^/>/' species*
        sed -i 's/{config[speciesProt_params][tab]}/{config[speciesProt_params][newline]}/' species*
        cd {config[paths][concoct_run]}
        mkdir -p {config[carveme_params][dir]}
        cp {config[paths][concoct_run]}/{config[speciesProt_params][dir]}/species* {config[carveme_params][dir]}
        cd {config[carveme_params][dir]}
        find . -name "species*" -size -{config[carveme_params][cutoff]} -delete #delete files with little information, these cause trouble
        """

rule carveme:
    input: dynamic("carvemeOut/{species}.txt")
    output: dynamic("carvemeOut/{species}.xml")
    shell:
        """
        set +u;source activate concoct_env;set -u
        cd {config[carveme_params][dir]}
        echo {input}
        echo {output}
        carve $(basename {input})
        """
I was previously using two different wildcards for the input and output of the carveme rule:
    input: dynamic("carvemeOut/{species}.txt")
    output: dynamic("carvemeOut/{gem}.xml")
What I want snakemake to do is to run the carveme rule multiple times, to create an output .xml file for each input .txt file. However, snakemake is instead running the rule one time, using a list of inputs to create one output, as can be seen below:
rule carveme:
input: carvemeOut/species2.txt, carvemeOut/species5.txt, carvemeOut/species1.txt, carvemeOut/species10.txt, carvemeOut/species4.txt, carvemeOut/species17.txt, carvemeOut/species13.txt, carvemeOut/species8.txt, carvemeOut/species14.txt
output: {*}.xml (dynamic)
jobid: 28
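To spell out the intended behaviour: each .txt file should get its own job and its own .xml file. A minimal sketch of that one-to-one mapping, using the file names from the log above. This is plain Python for illustration only, not Snakemake code, and expected_output is a hypothetical helper name.

```python
from pathlib import PurePosixPath

# Input files as listed in the single carveme job from the log.
inputs = [
    "carvemeOut/species2.txt", "carvemeOut/species5.txt", "carvemeOut/species1.txt",
    "carvemeOut/species10.txt", "carvemeOut/species4.txt", "carvemeOut/species17.txt",
    "carvemeOut/species13.txt", "carvemeOut/species8.txt", "carvemeOut/species14.txt",
]

def expected_output(txt_path):
    """Hypothetical helper: the .xml file that should be produced from one .txt file."""
    return str(PurePosixPath(txt_path).with_suffix(".xml"))

# Desired result: one (input, output) pair per job, instead of one job for all inputs.
jobs = [(path, expected_output(path)) for path in inputs]

for txt, xml in jobs:
    print(f"{txt} -> {xml}")
```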
After modifying my rules to use the same wildcard, as suggested by @stovfl and shown in the first code box, I get the following error message:
$ snakemake all
Building DAG of jobs...
WildcardError in line 174 of /c3se/NOBACKUP/groups/c3-c3se605-17-8/projects_francisco/binning/snakemake-concot/Snakefile:
Wildcards in input files cannot be determined from output files:
species
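For context on this message: Snakemake works backwards from a requested output file, matching it against a rule's output pattern to obtain wildcard values and then substituting those values into the input pattern. The sketch below is a simplified plain-Python model of that idea, not Snakemake's actual implementation. The names infer_wildcards and resolve_input are hypothetical, each wildcard is assumed to match one or more characters, and the effect of dynamic() is not modelled.

```python
import re

WILDCARD = re.compile(r"\{(\w+)\}")

def infer_wildcards(output_pattern, requested_file):
    """Match a requested file against an output pattern; return wildcard values or None."""
    parts = WILDCARD.split(output_pattern)  # literal, name, literal, name, ...
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
        for i, part in enumerate(parts)
    )
    match = re.fullmatch(regex, requested_file)
    return match.groupdict() if match else None

def resolve_input(input_pattern, wildcards):
    """Fill an input pattern, failing if it needs a wildcard the output did not supply."""
    missing = [name for name in WILDCARD.findall(input_pattern) if name not in wildcards]
    if missing:
        raise ValueError(
            "Wildcards in input cannot be determined from output: " + ", ".join(missing)
        )
    return input_pattern.format(**wildcards)

# Same wildcard name on both sides: the input can be derived from the output.
same = infer_wildcards("carvemeOut/{species}.xml", "carvemeOut/species2.xml")
resolved = resolve_input("carvemeOut/{species}.txt", same)

# Different names ({gem} vs {species}): nothing tells us what {species} is.
mixed = infer_wildcards("carvemeOut/{gem}.xml", "carvemeOut/species2.xml")
try:
    resolve_input("carvemeOut/{species}.txt", mixed)
    mixed_error = None
except ValueError as err:
    mixed_error = str(err)
```

In this model, a wildcard that appears only in the input can never be filled in, which is the situation the error message describes.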
Any suggestions on how to address this problem?
Thanks in advance, FZ
I will modify my original post to reflect the changes and add rule all. – Francisco Zorrilla
dynamic? Strip down your test case to only rule carveme:. In this rule, remove everything from shell except the two echo {...} lines, then add the commands back line by line. You are using relative file paths and also doing cd ..., which contradict each other. Second, you are defining output but don't use it? – stovfl
With cd ... you change the working directory. This could mean that the relative file path carvemeOut/ is no longer reachable. It's a guess, as I don't know what {config[carveme_params][dir]} expands to. – stovfl
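The point about cd and relative paths can be checked outside Snakemake. Here is a minimal sketch in plain Python, run in a temporary directory; it assumes, purely for illustration, that {config[carveme_params][dir]} expands to carvemeOut, which the post does not confirm. The function name relative_path_visibility is hypothetical.

```python
import os
import tempfile

def relative_path_visibility():
    """Check whether a relative path still resolves after changing directory."""
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as workdir:
        try:
            os.chdir(workdir)
            os.makedirs("carvemeOut")
            rel = os.path.join("carvemeOut", "species1.txt")
            open(rel, "w").close()  # create an empty stand-in input file

            before = os.path.exists(rel)  # seen from the original working directory
            os.chdir("carvemeOut")        # equivalent of the cd in the shell block
            after = os.path.exists(rel)   # same relative path, new working directory
            by_basename = os.path.exists(os.path.basename(rel))
        finally:
            os.chdir(start)  # restore the working directory before cleanup
    return before, after, by_basename

before, after, by_basename = relative_path_visibility()
print(before, after, by_basename)
```

Under that assumption, the relative path stops resolving after the cd, while the bare file name still does. The same would apply to a relative {output} path evaluated after the cd.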