0
votes

I want to run GATK recalibration on paired samples (tumor and normal), and I need to parse the sample pairs using pandas. This is what I wrote:

expand("mapped_reads/merged_samples/{sample[1][tumor]}/{sample[1][tumor]}_{sample[1][normal]}.bam", sample=read_table(config["conditions"], ",").iterrows())

This is the conditions file (one tumor,normal pair per line):

432,433
434,435

I wrote this rule:

# Create realignment target intervals for one tumor/normal pair.
# Both recalibrated BAM files are passed to GATK with -I.
rule gatk_RealignerTargetCreator:
    input:
        "mapped_reads/merged_samples/{tumor}.sorted.dup.reca.bam",
        "mapped_reads/merged_samples/{normal}.sorted.dup.reca.bam",
    output:
        "mapped_reads/merged_samples/{tumor}/{tumor}_{normal}.realign.intervals"
    params:
        genome=config['reference']['genome_fasta'],
        mills=config['mills'],
        ph1_indels=config['know_phy'],
    log:
        "mapped_reads/merged_samples/logs/{tumor}_{normal}.realign_info.log"
    threads: 8
    shell:
        "gatk -T RealignerTargetCreator -R {params.genome} "
        "-nt {threads} "
        "-I {input[0]} -I {input[1]} -known {params.ph1_indels} "
        "-o {output} >& {log}"

I have this error:

InputFunctionException in line 17 of /home/maurizio/Desktop/TEST_exome/rules/samfiles.rules:
KeyError: '432/432_433'
Wildcards:
sample=432/432_433

This is samfiles.rules:

rule samtools_merge_bam:
    """
    Merge bam files for multiple units into one for the given sample.
    If the sample has only one unit, files will be copied.
    """
    input:
        lambda wildcards: expand("mapped_reads/bam/{unit}_sorted.bam",unit=config["samples"][wildcards.sample])
    output:
        "mapped_reads/merged_samples/{sample}.bam"
    benchmark:
        "benchmarks/samtools/merge/{sample}.txt"
    run:
        if len(input) > 1:
            shell("/illumina/software/PROG2/samtools-1.3.1/samtools merge {output} {input}")
        else:
            shell("cp {input} {output} && touch -h {output}")
1
So, obviously "432/432_433" is not among the samples in your config file. This is what the error message tells us (the config variable is a Python dict, so the failed lookup throws a KeyError). Note that you have two wildcards in the first rule, and only one in the samtools_merge_bam rule. Hence, this one wildcard tries to match the whole {tumor}/{tumor}_{normal} part of the file path. - Johannes Köster
@JohannesKöster thanks for your help! How can I fix this problem? Could you please give me an example? - mau_who

1 Answer

2
votes

I can only guess because you don't show all the relevant rules, but I would say the error occurs because the rule samtools_merge_bam also applies to some later bam file whose path has the pattern {tumor}/{tumor}_{normal}...

To fix this, you have to resolve the ambiguity (see the Snakemake tutorial). For example, you can constrain the sample wildcard of samtools_merge_bam so that it cannot contain any slashes.

wildcard_constraints:
    sample="[^/]+"

You can put the constraint either globally or inside your samtools_merge_bam rule.
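
To see why the constraint helps, here is a minimal plain-Python sketch of the matching behaviour. It is only a simplified stand-in, not Snakemake's actual code: `output_regex` is a hypothetical helper that turns an output pattern into a regular expression, using `.+` for an unconstrained wildcard (Snakemake's default) and the given constraint otherwise. The requested path is the one your expand() call produces for the pair 432,433.

```python
import re

def output_regex(pattern, constraints=None):
    # Hypothetical helper (not part of Snakemake): each {name} becomes a
    # named group that uses its constraint if one is given, else ".+".
    constraints = constraints or {}
    regex, last = "", 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[last:m.start()])
        regex += "(?P<{}>{})".format(m.group(1), constraints.get(m.group(1), ".+"))
        last = m.end()
    return re.compile(regex + re.escape(pattern[last:]) + "$")

# Output pattern of samtools_merge_bam and the file requested for pair 432,433
MERGE_OUTPUT = "mapped_reads/merged_samples/{sample}.bam"
requested = "mapped_reads/merged_samples/432/432_433.bam"

loose = output_regex(MERGE_OUTPUT)                        # no constraint
strict = output_regex(MERGE_OUTPUT, {"sample": "[^/]+"})  # no slashes allowed

# Unconstrained, {sample} swallows the whole "432/432_433" part of the path,
# which is the key that is missing from config["samples"].
loose_match = loose.match(requested)
print(loose_match.group("sample"))

# Constrained, the rule no longer matches the paired file ...
print(strict.match(requested))

# ... but it still matches an ordinary per-sample bam file.
print(strict.match("mapped_reads/merged_samples/432.bam").group("sample"))
```

With the constraint in place, samtools_merge_bam is no longer a candidate for mapped_reads/merged_samples/432/432_433.bam, so some other rule has to produce that file.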