I'm finding that the name of the output file per rule seems to need a static portion, e.g. "data/{wildcard}_data.csv
" vs. "{wildcard}_data.csv
"
For example, the script below returns the following error on dryrun:
Building DAG of jobs... MissingInputException in line 12 of /home/rebecca/workflows/exploring_tools/affymetrix_preprocess/snakemake/Snakefile: Missing input files for rule getDatFiles: GSE4290
Script:
rule all:
input: expand("{geoid}_datout.scaled.expr.csv", geoid = config['geoid'], out_dir = config['out_dir'])
benchmark: "benchmark.csv"
rule getDatFiles:
input: "{geoid}"
output: temp("{geoid}_datFiles.RData")
shell:
"Rscript scripts/getDatFiles.R"
rule maskProbes:
input: "{geoid}_datFiles.RData"
output: temp("{geoid}_datFiles.masked.RData")
params:
probeFilterFxn = lambda x: config['probeFilterFxn'],
minProbeNumber = lambda x: config['minProbeNumber'],
probeSingle = lambda x: config['probeSingle']
script: "scripts/maskProbes.R"
rule runExpresso:
input: "{geoid}_datFiles.masked.RData"
output: temp("{geoid}_datout.RData")
params:
bgcorrect_method = lambda x: config['bgcorrect_method'],
normalize = lambda x: config['normalize'],
pmcorrect_method = lambda x: config['pmcorrect_method'],
summary_method = lambda x: config['summary_method']
script: "scripts/runExpresso.R"
rule scaleData:
input: "{geoid}_datout.RData"
output: temp("{geoid}_datout.scaled.RData")
params: sc = lambda x: config['sc']
script: "scripts/scaleData.R"
rule getExpr:
input: "{geoid}_datout.scaled.RData"
output: temp("{geoid}_datout.scaled.expr.csv")
script: "scripts/getExpr.R"
... While the following script runs without error (the difference being including "output/" ahead of the output file names:
rule all:
input: expand("output/{geoid}_datout.scaled.expr.csv", geoid = config['geoid'], out_dir = config['out_dir'])
benchmark: "output/benchmark.csv"
rule getDatFiles:
input: "output/{geoid}"
output: temp("output/{geoid}_datFiles.RData")
shell:
"Rscript scripts/getDatFiles.R"
rule maskProbes:
input: "output/{geoid}_datFiles.RData"
output: temp("output/{geoid}_datFiles.masked.RData")
params:
probeFilterFxn = lambda x: config['probeFilterFxn'],
minProbeNumber = lambda x: config['minProbeNumber'],
probeSingle = lambda x: config['probeSingle']
script: "scripts/maskProbes.R"
rule runExpresso:
input: "output/{geoid}_datFiles.masked.RData"
output: temp("output/{geoid}_datout.RData")
params:
bgcorrect_method = lambda x: config['bgcorrect_method'],
normalize = lambda x: config['normalize'],
pmcorrect_method = lambda x: config['pmcorrect_method'],
summary_method = lambda x: config['summary_method']
script: "scripts/runExpresso.R"
rule scaleData:
input: "output/{geoid}_datout.RData"
output: temp("output/{geoid}_datout.scaled.RData")
params: sc = lambda x: config['sc']
script: "scripts/scaleData.R"
rule getExpr:
input: "output/{geoid}_datout.scaled.RData"
output: temp("output/{geoid}_datout.scaled.expr.csv")
script: "scripts/getExpr.R"
I'm having a hard time understanding why this might be happening. Ultimately, I'd like to workflows that are as possible, and ideally, that entails making the output directory variable.
Any insight would be much appreciated.