0
votes

I am new in using snakemake, I have an issue when doing the step gatk VariantRecalibrator on snakemake, it generated error but the script can run without error when not in snakemake format.

import snakemake.io
import os

REF="/data/data/reference/refs/ucsc.hg19.fasta"
HM="/data/data/variant_call/hapmap_3.3.hg19.sites.vcf"
OMNI="/data/data/variant_call/1000G_omni2.5.hg19.sites.vcf"
SNPS="/data/data/variant_call/1000G_phase1.snps.high_confidence.hg19.sites.vcf"
DBSNP="/data/data/variant_call/dbsnp_138.hg19.vcf"

NAME="CHS"

rule all:
  input:  "VCFs/{name}.recal.vcf".format(name=NAME),
          "VCFs/{name}.output.tranches".format(name=NAME)

rule vqsr:
  input:  vcf="VCFs/SRS008640.raw.vcf",
          ref=REF,
          hm=HM,
          omni=OMNI,
          snps=SNPS,
          dbsnp=DBSNP
  output: recal="VCFs/{name}.recal.vcf".format(name=NAME),
          tranches="VCFs/{name}.output.tranches".format(name=NAME),
          rscript="VCFs/{name}.output.plots.R".format(name=NAME)
  params: java_opts="-Xmx16g"
  shell: "gatk --java-options -Xmx16g VariantRecalibrator \
  -R {input.ref} \
  -V {input.vcf} \
  --resource:hapmap,known=false,training=true,truth=true,prior=15.0 {input.hm} \
  --resource:omni,known=false,training=true,truth=false,prior=12.0 {input.omni} \
  --resource:1000G,known=false,training=true,truth=false,prior=10.0 {input.snps} \
  --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 {input.dbsnp} \
  -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
  -mode SNP \
  -O {output.recal} \
  --tranches-file {output.tranches} \
  --rscript-file {output.rscript}"

Error: Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to p rint the stack trace. [Thu May 30 08:05:30 2019] Error in rule vqsr: jobid: 1 output: VCFs/CHS.recal.vcf, VCFs/CHS.output.tranches, VCFs/CHS.output.plots.R

RuleException:
CalledProcessError in line 28 of /data/data/Samples/snakemake-example/WGS-test/step7.smk:
    Command ' set -euo pipefail;  gatk --java-options -Xmx16g VariantRecalibrator   -R /data/data/reference/refs/ucsc.hg19.fas ta   -V VCFs/SRS008640.raw.vcf   --resource:hapmap,known=false,training=true,truth=true,prior=15.0 /data/data/variant_call /hapmap_3.3.hg19.sites.vcf   --resource:omni,known=false,training=true,truth=false,prior=12.0 /data/data/variant_call/1000 G_omni2.5.hg19.sites.vcf   --resource:1000G,known=false,training=true,truth=false,prior=10.0 /data/data/variant_call/1000G _phase1.snps.high_confidence.hg19.sites.vcf   --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /data/data/ variant_call/dbsnp_138.hg19.vcf   -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR   -mode SNP   -O VCFs/CHS. recal.vcf   --tranches-file VCFs/CHS.output.tranches   --rscript-file VCFs/CHS.output.plots.R ' returned non-zero exit sta tus 2.
      File "/data/data/Samples/snakemake-example/WGS-test/step7.smk", line 28, in __rule_vqsr
      File "/root/miniconda3/envs/bioinfo/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    Removing output files of failed job vqsr since they might be corrupted:
    VCFs/CHS.recal.vcf, VCFs/CHS.output.tranches, VCFs/CHS.output.plots.R
    Shutting down, this might take some time.
    Exiting because a job execution failed. Look above for error message
    Complete log: /data/data/Samples/snakemake-example/WGS-test/.snakemake/log/2019-05-30T065011.676785.snakemake.log

If I use the same code, I can run to create the recal file and tranches and can go to the next step applyvqsr, however if I put it in the snakemake it has error and line 27 is gatk --java-options -Xmx16g VariantRecalibrator is error but I don't know what error is it. Please advice.

1

1 Answers

1
votes

I would advise you to run Snakemake with the --printshellcmds parameter. That would give you exact command that it runs in your shell (not the one that you get in the exception which comes from python guts). You may copy that command and run manually.

Let me clarify this: Snakemake runs the command not as you specify it in the shell section, but it runs a subprocess, adds additional parameters and sets the environment variables (both differ depending on the shell and OS), then it uses python facilities to run an application. Running Snakemake with the --printshellcmds parameter would allow you to see the command free of formatting issues (when all wildcards are resolved) but without the overhead of details like set -euo pipefail;.