I have a folder where the outputs of a rule are generated, and I am having real trouble running snakemake with it. If I do not specify the outputs in rule all, the rule (called neo4j) is not run at all. If I try running it manually with snakemake neo4j (which I would prefer not to do), I get an error:
WorkflowError: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards.
I tried specifying the outputs of the rule in different ways but none of them worked.
Using expand:

    expand('results/neo4j/{sample}/cells.csv', sample=samples),
    expand('results/neo4j/{sample}/genes.csv', sample=samples),
    expand('results/neo4j/{sample}/cl_nodes.csv', sample=samples),
    expand('results/neo4j/{sample}/cl_contains.csv', sample=samples),
    expand('results/neo4j/{sample}/cl_isin.csv', sample=samples),
    expand('results/neo4j/{sample}/expr_by.csv', sample=samples),
    expand('results/neo4j/{sample}/expr_ess.csv', sample=samples)
This generates a very weird error for a completely unrelated rule (called umap):
Missing input files for rule umap: data_files/normalized/minus_2/cl_nodes.csv.csv
The path generation is completely messed up, even though the folders are not connected in any way except that results is the root folder of all of the outputs.
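A possible explanation, sketched in plain Python rather than with Snakemake itself: by default a Snakemake wildcard is matched with the regex `.+`, which also matches `/`. So an output pattern that ends in a bare `{sample}` can claim a file in a deeper folder. The output pattern used below is an assumption (the umap rule is not shown here), and `pattern_to_regex` is a hypothetical helper, not a Snakemake function.

```python
import re

def pattern_to_regex(pattern):
    """Turn 'a/{name}/b' into a regex, giving each wildcard the default '.+'."""
    parts = re.split(r"\{(\w+)\}", pattern)  # literals at even indices, names at odd
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))
        else:
            pieces.append("(?P<{}>.+)".format(part))
    return re.compile("".join(pieces) + "$")

# Assumed output pattern of some rule (hypothetical, for illustration only).
output_pattern = 'results/neo4j/{sample}'
# Input pattern taken from the rules in this question.
input_pattern = 'data_files/normalized/{sample}.csv'

requested = 'results/neo4j/minus_2/cl_nodes.csv'
match = pattern_to_regex(output_pattern).match(requested)
sample = match.group('sample')            # '.+' swallowed the slash as well
resolved_input = input_pattern.format(sample=sample)

print(sample)
print(resolved_input)
```

If this is what happens, constraining the wildcard (for example with `wildcard_constraints` so that `sample` cannot contain `/`) would be one way to stop the unrelated rule from matching.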
Using dynamic:

    dynamic('results/neo4j/{sample}/cells.csv', sample=samples),
    dynamic('results/neo4j/{sample}/genes.csv', sample=samples),
    dynamic('results/neo4j/{sample}/cl_nodes.csv', sample=samples),
    dynamic('results/neo4j/{sample}/cl_contains.csv', sample=samples),
    dynamic('results/neo4j/{sample}/cl_isin.csv', sample=samples),
    dynamic('results/neo4j/{sample}/expr_by.csv', sample=samples),
    dynamic('results/neo4j/{sample}/expr_ess.csv', sample=samples)
Gives an error:
dynamic() got an unexpected keyword argument 'sample'
OK, I tried removing sample=samples, but no luck.
Just directory:

    directory('results/neo4j/{sample}/', sample=samples)
Gives error:
directory() got an unexpected keyword argument 'sample'
If I omit sample=samples, it does not work either. If I specify directory under the output of rule all, it does not work.
The rule I am having difficulty with is below:
    rule neo4j:
        input:
            script = 'python/neo4j.py',
            path_to_cl = 'results/clusters/umap/{sample}_umap_clusters.csv',
            path_to_umap = 'results/umap/{sample}_umap.csv',
            path_to_mtx = 'data_files/normalized/{sample}.csv'
        output:
            base_neo4j = 'results/neo4j/{sample}'
        shell:
            "python {input.script} -path_to_cl {input.path_to_cl} -path_to_umap {input.path_to_umap} -path_to_mtx {input.path_to_mtx} -base_neo4j {output.base_neo4j}"
The snakemake version is 5.2.2.
Any suggestions would be greatly appreciated.
Update
I modified the Snakemake file following the suggestions of Mali Akmanalp, and now rule all looks like this:
    samples, = glob_wildcards('data_files/normalized/{sample}.csv')

    rule all:
        input:
            expand('results/pca/img/{sample}_pca.png', sample=samples),
            expand('results/pca/{sample}_pca.csv', sample=samples),
            expand('results/tsne/{sample}_tsne.csv', sample=samples),
            expand('results/umap/{sample}_umap.csv', sample=samples),
            expand('results/umap/img/{sample}_umap.png', sample=samples),
            expand('results/tsne/img/{sample}_tsne.png', sample=samples),
            expand('results/clusters/umap/{sample}_umap_clusters.csv', sample=samples),
            expand('results/clusters/tsne/{sample}_tsne_clusters.csv', sample=samples),
            expand('results/neo4j/{sample}/{file}', sample=samples,
                   file=['cells.csv', 'genes.csv', 'cl_contains.csv', 'cl_isin.csv', 'cl_nodes.csv', 'expr_by.csv', 'expr_ess.csv'])
and the neo4j rule looks like this:
    rule neo4j:
        input:
            script = 'python/neo4j.py',
            path_to_cl = 'results/clusters/umap/{sample}_umap_clusters.csv',
            path_to_umap = 'results/umap/{sample}_umap.csv',
            path_to_mtx = 'data_files/normalized/{sample}.csv',
            base_neo4j = 'results/neo4j/{sample}'
        output:
            'results/neo4j/{sample}/cells.csv',
            'results/neo4j/{sample}/genes.csv',
            'results/neo4j/{sample}/cl_nodes.csv',
            'results/neo4j/{sample}/cl_contains.csv',
            'results/neo4j/{sample}/expr_by.csv',
            'results/neo4j/{sample}/expr_ess.csv',
            'results/neo4j/{sample}/cl_isin.csv'
        shell:
            "python {input.script} -path_to_cl {input.path_to_cl} -path_to_umap {input.path_to_umap} -path_to_mtx {input.path_to_mtx} -base_neo4j {input.base_neo4j}"
With this setup I get the error:
Missing input files for rule neo4j: results/neo4j/plus_1
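To make it clearer which targets the last expand call in rule all requests, here is a rough plain-Python imitation of it. It is only a sketch: `expand_sketch` is a hypothetical stand-in that takes the product of the keyword lists, and the two sample names are the ones that appear in the error messages (the real list comes from glob_wildcards).

```python
from itertools import product

def expand_sketch(pattern, **wildcards):
    """Fill the pattern with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]

# Assumed sample names, taken from the error messages in this question.
samples = ['plus_1', 'minus_2']
files = ['cells.csv', 'genes.csv', 'cl_contains.csv', 'cl_isin.csv',
         'cl_nodes.csv', 'expr_by.csv', 'expr_ess.csv']

targets = expand_sketch('results/neo4j/{sample}/{file}', sample=samples, file=files)

for path in targets:
    print(path)
```

Every target is a file inside results/neo4j/{sample}/. The folder itself is never listed as the output of any rule, so declaring it as an input of neo4j leaves Snakemake looking for something that no rule produces, which would be consistent with the error above.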
Update
I removed this line from the neo4j rule: base_neo4j = 'results/neo4j/{sample}', and then changed the output of the rule to:
    output:
        cells = 'results/neo4j/{sample}/cells.csv',
        genes = 'results/neo4j/{sample}/genes.csv',
        cl_nodes = 'results/neo4j/{sample}/cl_nodes.csv',
        cl_contains = 'results/neo4j/{sample}/cl_contains.csv',
        cl_isin = 'results/neo4j/{sample}/cl_isin.csv',
        expr_by = 'results/neo4j/{sample}/expr_by.csv',
        expr_ess = 'results/neo4j/{sample}/expr_ess.csv'
and the shell command:
    shell:
        "python {input.script} -path_to_cl {input.path_to_cl} -path_to_umap {input.path_to_umap} -path_to_mtx {input.path_to_mtx} -cells {output.cells} -genes {output.genes} -cl_nodes {output.cl_nodes} -cl_contains {output.cl_contains} -cl_isin {output.cl_isin} -expr_by {output.expr_by} -expr_ess {output.expr_ess}"
I do not like passing each output as a separate parameter, but it does not work otherwise. I tried passing just output, but only the first item in the output is passed; the others are ignored for some reason. I asked a separate question about that: "Snakemake passes only the first path in the output to shell command". Other than that, it is working now.
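One alternative to seven separate flags, sketched under assumptions: the script could keep a single -base_neo4j argument and derive the file names itself, with the folder passed from the rule as a parameter (for example through Snakemake's params) rather than as an input or output. The contents of python/neo4j.py are not shown here, so `parse_args` and `output_paths` below are hypothetical helpers, not the actual script.

```python
import argparse
import os

# File names taken from the output section of the neo4j rule.
NEO4J_FILES = ['cells.csv', 'genes.csv', 'cl_nodes.csv', 'cl_contains.csv',
               'cl_isin.csv', 'expr_by.csv', 'expr_ess.csv']

def parse_args(argv=None):
    """Parse only the base folder; the other flags of the real script are omitted."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-base_neo4j', required=True,
                        help='folder that receives the neo4j CSV files')
    return parser.parse_args(argv)

def output_paths(base_dir):
    """Map a short key such as 'cells' to its full path under base_dir."""
    return {os.path.splitext(name)[0]: os.path.join(base_dir, name)
            for name in NEO4J_FILES}

# Simulated command line; in the workflow the value would come from the rule.
args = parse_args(['-base_neo4j', 'results/neo4j/plus_1'])
paths = output_paths(args.base_neo4j)

for key, path in paths.items():
    print(key, path)
```

The rule would still list the seven files as its output, so Snakemake can track them; only the command line gets shorter.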
rule all? - Manavalan Gajapathy
Snakemake file that would work. So, what should be changed in your view to make it better? - Nikita Vlasenko