Here is a commented example that could help you solve your problem:
# Create some way of associating output files with links
# The output file names will be built from the keys: "chain_{key}.gz"
# One could probably directly use output file names as keys
links = {
"1" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
"2" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
"3" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz"}
rule download:
output:
# We inform snakemake that this rule will generate
# the following list of files:
# ["outdir/chain_1.gz", "outdir/chain_2.gz", "outdir/chain_3.gz"]
# Note that we don't need to use {output} in the "run" or "shell" part.
# This list will be used if we later add rules
# that use the files generated by the present rule.
expand("outdir/chain_{n}.gz", n=links.keys())
run:
# The sort is there to ensure the files are in the 1, 2, 3 order.
# We could use an OrderedDict if we wanted an arbitrary order.
for link_num in sorted(links.keys()):
shell("wget {link} -O outdir/chain_{n}.gz".format(link=links[link_num], n=link_num))
And here is another way of doing, that uses arbitrary names for the downloaded files and uses output
(although a bit artificially):
links = [
("foo_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz"),
("bar_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz"),
("baz_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz")]
rule download:
output:
# We inform snakemake that this rule will generate
# the following list of files:
# ["outdir/foo_chain.gz", "outdir/bar_chain.gz", "outdir/baz_chain.gz"]
["outdir/{f}".format(f=filename) for (filename, _) in links]
run:
for i in range(len(links)):
# output is a list, so we can access its items by index
shell("wget {link} -O {chain_file}".format(
link=links[i][1], chain_file=output[i]))
# using a direct loop over the pairs (filename, link)
# could be considered "cleaner"
# for (filename, link) in links:
# shell("wget {link} -0 outdir/{filename}".format(
# link=link, filename=filename))
An example where the three downloads can be done in parallel using snakemake -j 3
:
# To use os.path.join,
# which is more robust than manually writing the separator.
import os
# Association between output files and source links
links = {
"foo_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
"bar_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
"baz_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz"}
# Make this association accessible via a function of wildcards
def chainfile2link(wildcards):
return links[wildcards.chainfile]
# First rule will drive the rest of the workflow
rule all:
input:
# expand generates the list of the final files we want
expand(os.path.join("outdir", "{chainfile}"), chainfile=links.keys())
rule download:
output:
# We inform snakemake what this rule will generate
os.path.join("outdir", "{chainfile}")
params:
# using a function of wildcards in params
link = chainfile2link,
shell:
"""
wget {params.link} -O {output}
"""
-j
option, only one rule instance will be run at a given time. Do the files need to be downloaded in a precise order ? – bliall
rule having only input, for which you can use expand. This will drive the rest of the workflow. – blioutput
corresponds to a single file name, but yourcallString
will consist in several calls towget
with always the same-O
argument. Also, the{downloaded_file}
part will make Snakemake have a wildcard named "downloaded_file", and will be unable to determine its value without further information. You should probably try to simplify your rule for a start. What would you do if you had only one link? – bliparams.shellCallFile
probably should be alog
and not aparams
. See for instance stackoverflow.com/a/42839257/1878788 – bli