I have a snakemake workflow where some rules have a complex function as input:

def source_fold_data(wildcards):
    fold_type = wildcards.fold_type
    if fold_type in {"log2FoldChange", "lfcMLE"}:
        if hasattr(wildcards, "contrast_type"):
            # OPJ is os.path.join
            return expand(
                OPJ(output_dir, aligner, "mapped_C_elegans",
                    "deseq2_%s" % size_selected, "{contrast}",
                    "{contrast}_{{small_type}}_counts_and_res.txt"),
                contrast=contrasts_dict[wildcards.contrast_type])
        else:
            return rules.small_RNA_differential_expression.output.counts_and_res
    elif fold_type == "mean_log2_RPKM_fold":
        if hasattr(wildcards, "contrast_type"):
            # This is the branch used when I have the AttributeError
            #https://stackoverflow.com/a/26791923/1878788
            return [filename.format(wildcards) for filename in expand(
                OPJ(output_dir, aligner, "mapped_C_elegans",
                    "RPKM_folds_%s" % size_selected, "{contrast}",
                    "{contrast}_{{0.small_type}}_RPKM_folds.txt"),
                contrast=contrasts_dict[wildcards.contrast_type])]
        else:
            return rules.compute_RPKM_folds.output.fold_results
    else:
        raise NotImplementedError("Unknown fold type: %s" % fold_type)

The above function is used as input for two rules:

rule make_gene_list_lfc_boxplots:
    input:
        data = source_fold_data,
    output:
        boxplots = OPJ(output_dir, "figures", "{contrast}",
            "{contrast}_{small_type}_{fold_type}_{gene_list}_boxplots.{fig_format}")
    params:
        id_lists = set_id_lists,
    run:
        data = pd.read_table(input.data, index_col="gene")
        lfcs = pd.DataFrame(
            {list_name : data.loc[set(id_list)][wildcards.fold_type] for (
                list_name, id_list) in params.id_lists.items()})
        save_plot(output.boxplots, plot_boxplots, lfcs, wildcards.fold_type)


rule make_contrast_lfc_boxplots:
    input:
        data = source_fold_data,
    output:
        boxplots = OPJ(output_dir, "figures", "all_{contrast_type}",
            "{contrast_type}_{small_type}_{fold_type}_{gene_list}_boxplots.{fig_format}")
    params:
        id_lists = set_id_lists,
    run:
        lfcs = pd.DataFrame(
            {f"{contrast}_{list_name}" : pd.read_table(filename, index_col="gene").loc[
                set(id_list)]["mean_log2_RPKM_fold"] for (
                    contrast, filename) in zip(contrasts_dict["ip"], input.data) for (
                        list_name, id_list) in params.id_lists.items()})
        save_plot(output.boxplots, plot_boxplots, lfcs, wildcards.fold_type)

The second rule fails with the error 'InputFiles' object has no attribute 'data', but only in some cases. I ran the same workflow with two different configuration files, and the error happened with only one of them, although this rule was executed in both cases and the same branch of the input function was taken.

How can this happen if the rule has:

    input:
        data = ...

?

I suppose this has to do with what my source_fold_data returns: either the explicit output of another rule, or a "manually" constructed list of file names.

Upgrade the code to a minimal reproducible example. Currently, there's insufficient info for reproduction and thus diagnostics. - ivan_pozdeev
@ivan_pozdeev I agree 100% that a minimal reproducible example would help diagnose the issue. However, this is quite an intricate workflow, and it will take a lot of effort and time to achieve this. My hope was that, meanwhile, this type of issue might ring a bell with someone more familiar than me with the internals of snakemake. - bli
Well, that's not me, it's the first time I hear of snakemake. That has never stopped me from giving helpful answers, though, provided the asker cares to give enough info to reconstruct what's going on. Guesswork has proved to be a rather unproductive avenue, whether I know the product or not. - ivan_pozdeev
My guess would be that in some cases the function returns an empty list. I would construct the list in a variable before the return statement, then print it, then return it to see if you can narrow down what's happening. - Colin
@Colin That was the correct guess, thanks. In the faulty case, my contrasts_dict has an empty list as the entry for the key wildcards.contrast_type. - bli

1 Answer


As @Colin suggested in the comments, the problem happens when the input function returns an empty list. This is the case here when contrasts_dict[wildcards.contrast_type] is an empty list, a condition indicating that there is actually no point in trying to generate the output of the rule make_contrast_lfc_boxplots. I avoided the situation by modifying the input section of the rule all as follows:

Old version:

rule all:
    input:
        # [...]
        expand(OPJ(output_dir, "figures", "all_{contrast_type}", "{contrast_type}_{small_type}_{fold_type}_{gene_list}_boxplots.{fig_format}"), contrast_type=["ip"], small_type=IP_TYPES, fold_type=["mean_log2_RPKM_fold"], gene_list=BOXPLOT_GENE_LISTS, fig_format=FIG_FORMATS),
        # [...]

New version:

if contrasts_dict["ip"]:
    ip_fold_boxplots = expand(OPJ(output_dir, "figures", "all_{contrast_type}", "{contrast_type}_{small_type}_{fold_type}_{gene_list}_boxplots.{fig_format}"), contrast_type=["ip"], small_type=IP_TYPES, fold_type=["mean_log2_RPKM_fold"], gene_list=BOXPLOT_GENE_LISTS, fig_format=FIG_FORMATS)
else:
    ip_fold_boxplots = []
rule all:
    input:
        # [...]
        ip_fold_boxplots,
        # [...]
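
If several target lists depend on contrast lists that may be empty, the same guard can be expressed by filtering the contrast types before expanding. The following is only a sketch under stated assumptions: nonempty_contrast_types is a hypothetical helper, not part of snakemake, and expand_like is a plain-Python stand-in for expand so that the snippet runs on its own. The "de" key, the shortened path pattern, and the contrast name are made up for illustration.

```python
from itertools import product


def nonempty_contrast_types(contrasts_dict, contrast_types):
    """Keep only the contrast types that have at least one contrast (hypothetical helper)."""
    return [ct for ct in contrast_types if contrasts_dict.get(ct)]


def expand_like(pattern, **wildcards):
    """Plain-Python stand-in for expand: format the pattern for every combination of values."""
    names = list(wildcards)
    value_lists = [wildcards[name] for name in names]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*value_lists)
    ]


# Made-up configuration: "ip" has no contrasts, "de" has one.
contrasts_dict = {"ip": [], "de": ["mut_vs_WT"]}
pattern = "figures/all_{contrast_type}/{contrast_type}_{fold_type}_boxplots.{fig_format}"

# An empty list of contrast types yields an empty list of targets,
# so nothing is requested for contrast types that have no contrasts.
ip_fold_boxplots = expand_like(
    pattern,
    contrast_type=nonempty_contrast_types(contrasts_dict, ["ip"]),
    fold_type=["mean_log2_RPKM_fold"],
    fig_format=["pdf"],
)
de_fold_boxplots = expand_like(
    pattern,
    contrast_type=nonempty_contrast_types(contrasts_dict, ["de"]),
    fold_type=["mean_log2_RPKM_fold"],
    fig_format=["pdf"],
)
```

With this approach the if/else around the target list is not needed, at the cost of one extra function call inside each expand.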

Some tinkering with snakemake/rules.py shows that, at some point, the data attribute exists for the input attribute of the Rule object named make_contrast_lfc_boxplots, and that this attribute is still the source_fold_data function. I suppose this is later evaluated and removed when it is an empty list, but I haven't been able to find where.

I suppose the empty input is not a problem when snakemake constructs the dependency graph between rules. The problem therefore only occurs during the execution of a rule.
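
To make such a case easier to diagnose, the input function can follow the approach suggested in the comments: build the list in a variable, report it, and refuse to return an empty one, so that the failure comes with an explicit message instead of an AttributeError inside run. This is a minimal sketch, not the workflow's actual code: checked_inputs is a hypothetical helper, SimpleNamespace stands in for the wildcards object, and the path pattern and the "siRNA" value are shortened, made-up examples.

```python
import sys
from types import SimpleNamespace


def checked_inputs(filenames, wildcards, caller):
    """Report the computed input list and reject an empty one (hypothetical helper)."""
    filenames = list(filenames)
    print(f"{caller}: {len(filenames)} input file(s) for {wildcards}", file=sys.stderr)
    if not filenames:
        raise ValueError(
            f"{caller} returned no input files for {wildcards}; "
            "check that the corresponding contrast list is not empty"
        )
    return filenames


# Made-up stand-ins for the workflow's configuration and wildcards.
contrasts_dict = {"ip": []}
wildcards = SimpleNamespace(contrast_type="ip", small_type="siRNA")

# Same idea as the failing branch of source_fold_data: one file per contrast.
files = [
    "{c}/{c}_{w.small_type}_RPKM_folds.txt".format(c=contrast, w=wildcards)
    for contrast in contrasts_dict[wildcards.contrast_type]
]

try:
    checked_inputs(files, wildcards, "source_fold_data")
except ValueError as err:
    message = str(err)
    print(message, file=sys.stderr)
```

In the real input function, the last step of each branch would be return checked_inputs(..., wildcards, "source_fold_data") instead of returning the list directly.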