I have a snakemake workflow where some rules have a complex function as input:
```python
def source_fold_data(wildcards):
    fold_type = wildcards.fold_type
    if fold_type in {"log2FoldChange", "lfcMLE"}:
        if hasattr(wildcards, "contrast_type"):
            # OPJ is os.path.join
            return expand(
                OPJ(output_dir, aligner, "mapped_C_elegans",
                    "deseq2_%s" % size_selected, "{contrast}",
                    "{contrast}_{{small_type}}_counts_and_res.txt"),
                contrast=contrasts_dict[wildcards.contrast_type])
        else:
            return rules.small_RNA_differential_expression.output.counts_and_res
    elif fold_type == "mean_log2_RPKM_fold":
        if hasattr(wildcards, "contrast_type"):
            # This is the branch used when I have the AttributeError
            # https://stackoverflow.com/a/26791923/1878788
            return [filename.format(wildcards) for filename in expand(
                OPJ(output_dir, aligner, "mapped_C_elegans",
                    "RPKM_folds_%s" % size_selected, "{contrast}",
                    "{contrast}_{{0.small_type}}_RPKM_folds.txt"),
                contrast=contrasts_dict[wildcards.contrast_type])]
        else:
            return rules.compute_RPKM_folds.output.fold_results
    else:
        raise NotImplementedError("Unknown fold type: %s" % fold_type)
```
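The `{{0.small_type}}` trick in the `mean_log2_RPKM_fold` branch (taken from the linked answer) works in two formatting rounds. First, `expand` fills `{contrast}` and collapses the doubled braces into single ones. Then `filename.format(wildcards)` fills `{0.small_type}` with the `small_type` attribute of its first positional argument. The sketch below shows both rounds in plain Python. It assumes that plain `str.format` behaves like `expand` for the first round. The contrast names, the `siRNA` value and the `SimpleNamespace` stand-in for `wildcards` are all hypothetical:

```python
from types import SimpleNamespace

# Hypothetical stand-in for Snakemake's wildcards object.
wildcards = SimpleNamespace(small_type="siRNA", contrast_type="ip")

# Hypothetical pattern and contrast names, shaped like the ones in the branch.
pattern = "{contrast}/{contrast}_{{0.small_type}}_RPKM_folds.txt"
contrasts = ["WT_vs_mut1", "WT_vs_mut2"]

# Round 1 (the role expand plays): fill {contrast}; the doubled braces
# collapse into a single-braced "{0.small_type}" placeholder.
first_pass = [pattern.format(contrast=c) for c in contrasts]

# Round 2: "{0.small_type}" reads attribute small_type of positional arg 0.
paths = [p.format(wildcards) for p in first_pass]

print(first_pass)
print(paths)
```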
The `source_fold_data` function above is used as input for two rules:
```python
rule make_gene_list_lfc_boxplots:
    input:
        data = source_fold_data,
    output:
        boxplots = OPJ(output_dir, "figures", "{contrast}",
            "{contrast}_{small_type}_{fold_type}_{gene_list}_boxplots.{fig_format}")
    params:
        id_lists = set_id_lists,
    run:
        data = pd.read_table(input.data, index_col="gene")
        lfcs = pd.DataFrame({
            list_name: data.loc[set(id_list)][wildcards.fold_type]
            for (list_name, id_list) in params.id_lists.items()})
        save_plot(output.boxplots, plot_boxplots, lfcs, wildcards.fold_type)

rule make_contrast_lfc_boxplots:
    input:
        data = source_fold_data,
    output:
        boxplots = OPJ(output_dir, "figures", "all_{contrast_type}",
            "{contrast_type}_{small_type}_{fold_type}_{gene_list}_boxplots.{fig_format}")
    params:
        id_lists = set_id_lists,
    run:
        lfcs = pd.DataFrame({
            f"{contrast}_{list_name}": pd.read_table(
                filename, index_col="gene").loc[set(id_list)]["mean_log2_RPKM_fold"]
            for (contrast, filename) in zip(contrasts_dict["ip"], input.data)
            for (list_name, id_list) in params.id_lists.items()})
        save_plot(output.boxplots, plot_boxplots, lfcs, wildcards.fold_type)
```
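The `run` block of `make_contrast_lfc_boxplots` builds one column per (contrast, gene list) pair. It does this with a dict comprehension that has two `for` clauses, where `zip` pairs each contrast name with the matching entry of `input.data`. Below is a pure-Python sketch of that key construction. The contrast names, file names and gene lists are hypothetical, and `read_folds` is a hypothetical stand-in for `pd.read_table(filename, index_col="gene").loc[...]`:

```python
# Hypothetical inputs standing in for contrasts_dict["ip"], input.data
# and params.id_lists in the rule above.
contrasts = ["WT_vs_mut1", "WT_vs_mut2"]
data_files = ["WT_vs_mut1_folds.txt", "WT_vs_mut2_folds.txt"]
id_lists = {"targets": ["g1", "g2"], "others": ["g3"]}


def read_folds(filename, ids):
    """Hypothetical stand-in for reading one fold column for the given ids."""
    return {gene_id: f"{filename}:{gene_id}" for gene_id in sorted(ids)}


def build_columns(contrasts, data_files, id_lists):
    # Outer loop over (contrast, file) pairs, inner loop over gene lists,
    # in the same order as the nested comprehension of the rule.
    return {
        f"{contrast}_{list_name}": read_folds(filename, set(id_list))
        for contrast, filename in zip(contrasts, data_files)
        for list_name, id_list in id_lists.items()
    }


columns = build_columns(contrasts, data_files, id_lists)
print(sorted(columns))
# With an empty contrast list, zip yields nothing and no column is built.
print(build_columns([], [], id_lists))
```

`zip` stops at the shorter of its arguments. As a result, a contrast list whose length differs from `input.data` silently drops columns instead of raising an error. The rule zips with `contrasts_dict["ip"]`, while the input function uses `contrasts_dict[wildcards.contrast_type]`, so the two lists are only guaranteed to match when the contrast type is `ip`.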
The second one fails with `'InputFiles' object has no attribute 'data'`, but only in some cases. I ran the same workflow with two different configuration files, and the error happened with only one of them, although this rule was executed in both cases and the same branch of the input function was taken.
How can this happen if the rule explicitly declares the following?

```python
input:
    data = ...
```
I suppose this has to do with what my `source_fold_data` returns: either the explicit output of another rule, or a "manually" constructed list of file names.
… `snakemake`. That never stopped me from giving helpful answers, though, if only they care to provide enough info to reconstruct what's going on. Guesswork proved to be a rather unproductive avenue whether I know the product or not. – ivan_pozdeev

`contrasts_dict` has an empty list as the entry for the key `wildcards.contrast_type`. – bli
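Suppose the empty contrast list is indeed the cause, as the last comment reports. One option is then to check the list inside the input function and fail with a message that names the problem. This should surface the problem when the input function is evaluated, instead of later as a missing `input.data` attribute. The sketch below assumes that `contrasts_dict` maps each contrast type to a list of contrast names. The `check_contrasts` helper and the example dictionary contents are hypothetical:

```python
# Hypothetical configuration: "ip" has contrasts, "rnai" has none.
contrasts_dict = {"ip": ["WT_vs_mut1", "WT_vs_mut2"], "rnai": []}


def check_contrasts(contrasts_dict, contrast_type):
    """Return the contrasts for contrast_type, refusing missing or empty entries."""
    contrasts = contrasts_dict.get(contrast_type)
    if not contrasts:
        # An empty (or missing) entry would make expand() return no files.
        raise ValueError(
            f"No contrasts configured for contrast type {contrast_type!r}; "
            "the input function would return an empty list of files.")
    return contrasts


print(check_contrasts(contrasts_dict, "ip"))
try:
    check_contrasts(contrasts_dict, "rnai")
except ValueError as err:
    print(err)
```

Inside `source_fold_data`, `contrast=check_contrasts(contrasts_dict, wildcards.contrast_type)` could replace the direct `contrasts_dict[wildcards.contrast_type]` lookup in both `expand` calls.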