Is there a way to prevent the output files defined in a Snakemake rule from being deleted before the shell command is executed? I found a description of this behaviour here: http://snakemake.readthedocs.io/en/stable/project_info/faq.html#can-the-output-of-a-rule-be-a-symlink
What I am trying to do is define a rule with a list of input files and a list of output files (an N:M relation). This rule should be triggered if one of the input files has changed. The Python script called in the shell command then creates only those outputs which do not exist yet or whose content has changed in comparison to the already existing files (i.e. change detection is implemented inside the Python script). I expected that something like the following rule would solve this, but as the output jsons are deleted before the Python script runs, all output jsons are created with a new timestamp instead of only those which have changed.
rule jsons:
    "Create transformation files out of landmark correspondences."
    input:
        matchfiles = ["matching/%04i-%04i.h5" % (SECTIONS[i], SECTIONS[i+1]) for i in range(len(SECTIONS)-1)]
    output:
        jsons = ["transformation/{section}_transformation.json".format(section=s) for s in SECTIONS]
    shell:
        "python create_transformation_jsons.py --matchfiles {input.matchfiles} --outfiles {output.jsons}"
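For context, the change detection inside create_transformation_jsons.py looks roughly like the following sketch (write_if_changed is a hypothetical helper illustrating the idea, not the actual script):

```python
import json
import os

def write_if_changed(path, data):
    """Serialize `data` to JSON and write it to `path` only when the
    content differs from what is already on disk, so that unchanged
    files keep their old timestamp."""
    new_content = json.dumps(data, sort_keys=True, indent=2)
    if os.path.exists(path):
        with open(path) as f:
            if f.read() == new_content:
                return False  # unchanged: leave the file and its timestamp alone
    with open(path, "w") as f:
        f.write(new_content)
    return True
```

This only has the intended effect if the file still exists when the script runs, which is exactly what Snakemake's pre-run deletion prevents.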
If there is no way to prevent the deletion of output files in Snakemake, does anybody have another idea how to map this workflow onto a Snakemake rule without updating all output files?
Update:
I tried to solve this problem by changing the Snakemake source code. I removed the line self.remove_existing_output()
in jobs.py to avoid removing output files before a rule is executed. Furthermore, I added the parameter no_touch=True
to the self.dag.check_and_touch_output() call in executors.handle_job_success. This worked well in so far as the output files were neither removed before nor touched after the rule was executed. But downstream rules with json files as input are still triggered for every json file (even if it did not change), as Snakemake recognizes that the json file was declared as an output before and therefore assumes it must have changed.
So I think avoiding the deletion of output files does not solve my problem; a workaround, if one exists, is probably the only way...
Update 2:
I also tried to find a workaround without changing the Snakemake source code: I changed the output path of the jsons rule defined above to transformation/tmp/...
and added the following rule:
def cmp_jsons(wildcards):
    section = int(wildcards.section)
    # compare json for given section in transformation/ with json in transformation/tmp/
    # return [] if json did not change
    # return path to tmp json filename if json has changed
rule copy:
    input:
        json_tmp = cmp_jsons
    output:
        jsonfile = "transformation/B21_{section,\d+}_affine_transformation.json"
    shell:
        "cp {input.json_tmp} {output.jsonfile}"
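The intended body of cmp_jsons would be something like the following sketch (assuming the tmp copy written by the jsons rule sits under transformation/tmp/ with the same filename as the final file):

```python
import filecmp
import os

def cmp_jsons(wildcards):
    # Assumed naming scheme, matching the output pattern of the copy rule.
    name = "B21_%s_affine_transformation.json" % wildcards.section
    tmp = os.path.join("transformation", "tmp", name)
    final = os.path.join("transformation", name)
    if os.path.exists(final) and filecmp.cmp(tmp, final, shallow=False):
        return []    # content identical: nothing to copy
    return tmp       # new or changed: copy tmp over final
```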
But as the input function is evaluated before the workflow starts, the tmp jsons either do not exist yet or have not yet been updated by the jsons rule, so the comparison will not be correct.