Viral Ngs Versions

Viral genomics analysis pipelines

v1.25.0

4 years ago

Upgrades:

  • Picard 2.20.5 -> 2.21.1
  • Cromwell 33.1 -> 47
  • dxWDL 0.72 -> v1.33
    • dx-docker replaced with native OS docker
    • dx download replaced with dxda
  • dx-toolkit v0.285.0 -> v0.288.0
  • all dx_instance_types in WDL runtimes to v2 DNAnexus instance types

Bugfixes:

  • krona failed when output file was on a different filesystem from TMPDIR
  • krona attempted to repeatedly reinstall itself due to version documentation mismatch

v1.24.0

4 years ago

New:

  • binned coverage plot option in reports.py align_and_plot via --binLargePlots (#957) (thanks to @lakras)
  • metagenomics.py taxlevel_summary can now also aggregate reports in KrakenUniq format (#948)
  • krakenuniq WDL task (and demux_plus, etc.) writes a viral summary tsv, aggregating kraken reports from all input samples (#948)
  • add aggregate_spike_count function to reports.py to write a tsv table of spike-ins seen in all samples, aggregating separate spike-in reports (#955, #973, #981)
  • an optional QC check has been added to taxon_filter.py filter_lastal_bam (#961)
    • raises a QCError() if the sample name (bam file basename) begins with any of several negative control prefixes ("neg", "water", "NTC") and lastal has identified reads to keep after filtering
  • add WDL task to aggregate spikein reports (#965)
  • add tool wrapper for BBMap (#969)
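
The negative-control QC check described above (#961) can be sketched roughly as follows. This is an illustration only, not the code in taxon_filter.py: the QCError class here is a local stand-in, the function name check_negative_control is hypothetical, and case-insensitive prefix matching is an assumption.

```python
import os


class QCError(RuntimeError):
    """Local stand-in for the QCError named in the release note."""


# Prefixes quoted in the release note; matching them case-insensitively
# is an assumption made for this sketch.
NEGATIVE_CONTROL_PREFIXES = ("neg", "water", "NTC")


def check_negative_control(bam_path, reads_kept):
    """Raise QCError if a negative-control sample retains reads after filtering.

    Returns True if the sample name looks like a negative control,
    False otherwise. `reads_kept` is the number of reads that survived.
    """
    sample_name = os.path.basename(bam_path).lower()
    is_negative_control = any(
        sample_name.startswith(prefix.lower())
        for prefix in NEGATIVE_CONTROL_PREFIXES
    )
    if is_negative_control and reads_kept > 0:
        raise QCError(
            "negative control %r has %d reads after filtering"
            % (os.path.basename(bam_path), reads_kept)
        )
    return is_negative_control
```

A clean negative control (zero reads kept) passes silently; only a control with surviving reads raises.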

Changed:

  • for more stringent demux, max_mismatches changed from 1 to 0 (#960)
  • malformed kraken reports now result in warnings (#958)
  • default reference used in DNAnexus WDL pipeline for ERCC seqs updated from 32-seq file to 96-seq file (#959, #972)
  • metagenomics.py taxlevel_summary call moved to a separate WDL task that is invoked after kraken wherever kraken is called; this break-out allows aggregation of kraken reports created previously or elsewhere (#964)
  • fasta ID is now sanitized for picard CreateSequenceDictionary calls to adhere to character set restrictions in SAM/BAM RNAME spec. see: samtools/hts-specs#333 (#977)
  • Adding spec for timeouts when running WDL on DNAnexus (#983; thanks @godotgildor)
  • install conda packages to separate environment within Docker container (#980)
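
As an illustration of the fasta ID sanitization mentioned above (#977), here is a minimal sketch. The allowed-character classes are transcribed from the reference sequence name pattern in the SAM specification as best recalled and should be checked against the current spec text; the function name sanitize_fasta_id and the "_" replacement character are assumptions, not the project's actual behavior.

```python
import re

# Allowed characters for SAM reference sequence names (RNAME), as transcribed
# from the SAM spec (assumption: verify against the current spec).
# The first character may not be '*' or '='.
_FIRST_CHAR_OK = re.compile(r"[0-9A-Za-z!#$%&+./:;?@^_|~-]")
_DISALLOWED = re.compile(r"[^0-9A-Za-z!#$%&*+./:;=?@^_|~-]")


def sanitize_fasta_id(seq_id, replacement="_"):
    """Replace characters that are not valid in a SAM/BAM RNAME."""
    if not seq_id:
        raise ValueError("sequence ID must not be empty")
    # Replace every character outside the allowed set.
    cleaned = _DISALLOWED.sub(replacement, seq_id)
    # '*' and '=' are allowed later in the name but not as the first character.
    if not _FIRST_CHAR_OK.match(cleaned[0]):
        cleaned = replacement + cleaned[1:]
    return cleaned
```

IDs that already satisfy the pattern, such as typical accession numbers, pass through unchanged.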

Fixed:

  • quieted conda warnings related to use of -V (#947)
  • genome feature table parser for reading tsv/Sequin formatted-files now handles feature qualifiers that consist of only a key, fixing observed issue with ribosomal_slippage occurring without a value; qualifier-parsing regex also more robust (#949)
  • KrakenUniq Krona report now correctly reports "unique kmers" rather than "genome coverage" (#950)
  • in WDL KrakenUniq task, declared vars with defaults are now non-optional (#951)
  • Fixed WDL assemble task assembler parameterization for the joint trinity-spades case (#952)
  • The SampleSheet and tabfile readers are now tolerant of a BOM being present—seen in output written by some editors (#954)
  • In WDL for assembly scaffolding, contig alignment threshold changed from Int to Float (#975)
  • specify USE_JDK_DEFLATER=true and USE_JDK_INFLATER=true for picard until bug in Intel deflater is fixed, to prevent sporadic crashes (#977)
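
The BOM tolerance noted above (#954) can be illustrated with Python's "utf-8-sig" codec, which strips a leading UTF-8 byte order mark when one is present and otherwise behaves like plain UTF-8. This is a sketch of the general technique, not the project's reader; the function name read_tab_rows and the column names in the usage note are hypothetical.

```python
import csv
import io


def read_tab_rows(raw_bytes):
    """Parse tab-delimited bytes into dict rows, ignoring a leading UTF-8 BOM."""
    # "utf-8-sig" drops a leading BOM if present; without one it is plain UTF-8.
    text = raw_bytes.decode("utf-8-sig")
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter="\t")
    return list(reader)
```

Without this handling, the BOM would be glued onto the first header name, so a lookup of a column such as "sample" would fail even though the file looks correct in an editor.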

Added/Upgraded:

  • matplotlib 1.5.3 -> 2.2.4 (#948, #977)
  • bedtools 2.27.1 -> 2.28.0 (#977)
  • blast 2.6.0 -> 2.7.1 (#977)
  • lxml 4.3.0 -> 4.3.3 (#948)
  • switch from picard to picard-slim package (sans 'r') (#977)
  • update picard 2.18.11 -> 2.20.5 (#977)
  • krona 2.7 -> 2.7.1
  • bump Docker viral-baseimage 0.1.14 -> 0.1.15 (#979)
  • added bbmap 38.56
  • change lz4 dependency from lz4-bin sourced from bioconda to lz4-c from conda-forge, 131 -> 1.9.1 (#977)

v1.23.0

5 years ago

New:

  • scaffolder uses ambiguous alignments when no unambiguous ones exist (#904)
  • add function to assist Illumina index correction (#917)
    • illumina.py::guess_barcodes identifies barcodes with outlier counts after demux, and suggests possible corrections
  • Testing migrated from travis-ci.ORG -> travis-ci.COM (#910)
  • the Snakemake and WDL pipelines now create a file of the top spikeins seen (#909)
  • functionality reporting outlier_barcodes can now act on single-index runs (#932)
  • the viral-ngs version is now emitted as a string output of WDL workflows (#928)
  • The params --skipMarkDupes and --plotOnlyNonDuplicates are now exposed in the WDL task plot_coverage in tasks_reports.wdl (#925)

Changed:

  • testing-related performance improvements (#915)
  • changes related to the use of conda v4 (#922, #923)
  • new stub CondaPackage for use with tests (#916)
  • conda fixes related to changed bioconda guidelines (#906)
    • primarily with respect to package channel priorities

Fixed:

  • changes related to conda on travis (#944)
  • Corrections to Broad index sequences listed in illumina_indices.py (#941)
  • update tasks_taxon_filter.wdl to respect tags_to_clear_space_separated param (#919)
  • illumina.py fixed small bug where exception was not raised for missing files (#926)
  • memory spec corrections in UGER job submission script; JVM memory update for demux (#920)
  • fix util.misc.available_cpu_count() (#912)
  • add guardrails to barcode_helper for the case where observed barcodes are null (#946)

Added/Upgraded:

  • update PyYAML to v5.1 to address CVE-2017-18342
  • perl 5.22.0 -> 5.26 to support conda build 3 and current conda-forge pinnings (#922, #945)
  • java-jdk==8.0.112 -> openjdk==8.0.112
  • downgrade of blast 2.7.1 -> 2.6.0 due to upstream boost incompatibilities
  • within Docker image (#906):
    • pysam 3.11 -> 3.12
    • biopython 1.68 -> 1.72
    • samtools 1.6 -> 1.9
    • pigz 2.3.4 -> 2.4
    • picard 2.9.0 -> 2.18.11

v1.22.1

5 years ago

Fixed:

  • Fix issues related to Trinity on certain environments (#900)
  • allow for flexibility in gatk v3 wrapper supplied by bioconda (#902)
  • When determining available memory and cores, cgroup limits are now taken into account (#905)
  • Prevent dx jobs launched from travis-ci from running ad infinitum. (#901)
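
To show what taking cgroup limits into account (#905) can mean for the core count, here is a minimal sketch covering CPUs only. It assumes cgroup v1 file locations and rounds a fractional quota up; the function names are hypothetical and the project's own implementation may differ.

```python
import math
import os


def cpus_from_cfs_quota(host_cpus, quota_us, period_us):
    """Cap the host CPU count by a CFS quota; a quota <= 0 means unlimited."""
    if quota_us is None or period_us is None or quota_us <= 0 or period_us <= 0:
        return host_cpus
    # quota/period is the number of CPUs' worth of time the cgroup may use.
    return max(1, min(host_cpus, math.ceil(quota_us / period_us)))


def _read_int(path):
    """Return the integer stored in a file, or None if unavailable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def cgroup_aware_cpu_count():
    """CPU count limited by the cgroup v1 CFS quota, if one is set."""
    host_cpus = os.cpu_count() or 1
    # cgroup v1 paths (assumption); cgroup v2 exposes /sys/fs/cgroup/cpu.max instead.
    quota = _read_int("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
    period = _read_int("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
    return cpus_from_cfs_quota(host_cpus, quota, period)
```

Inside a container limited to two CPUs on a 16-core host, a check like this would report 2 rather than 16, which keeps thread counts in line with what the container is actually allowed to use.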

Added/Upgraded:

  • viral-baseimage 0.1.13 -> 0.1.14 (#903)

v1.22.0

5 years ago

New:

  • Adding commands for working with kmer sets using the KMC tool. (#854)
    • new top-level python file: kmer_utils.py providing the following functions (see the documentation for more information):
      • build_kmer_db: Build a database of kmers occurring in given sequences
      • dump_kmer_counts: Dump kmers and their counts from kmer database to a text file
      • filter_reads: Filter reads based on their kmer content
      • kmers_binary_op: Perform a simple binary operation on kmer sets
      • kmers_set_counts: Copy the kmer database, setting all kmer counts in the output to the given value
  • add metagenomics.py::filter_bam_to_taxa (#883)
    • This function filters an input bam file to include only reads that have been mapped to specified taxonomic IDs or scientific names. This requires a classification TSV file, as produced by tools such as Kraken, as well as the NCBI taxonomy database. The column numbers of the tax ID and read ID can be specified, allowing use beyond kraken-format read classification files; however, the relationship between read IDs and tax IDs is assumed to be bijective.
  • add WDL for filter_bam_to_taxa
  • assembly.py::assemble_spades now has an option, --minContigLen, so that spades-based de novo assembly yields only contigs longer than a specified length (#889)
  • assembly.py - added --alwaysSucceed option to SPAdes (#888)
  • allow RunInfo.xml override in illumina_demux WDL task (#891)
  • Added read_utils.py::read_names to extract read names from a sequence file
  • Added run-pipe_local.sh wrapper script for invoking the Snakemake-based pipeline on a single compute instance (#897)
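
The read-selection step behind filter_bam_to_taxa (#883) can be sketched as follows. This is a simplified illustration: it matches tax IDs exactly and omits both the NCBI taxonomy lookup and the bam filtering itself. The function name read_ids_for_taxa is hypothetical; the default columns (read ID in column 2, tax ID in column 3, 1-based) follow the standard Kraken per-read output layout.

```python
import csv
import io


def read_ids_for_taxa(tsv_handle, wanted_tax_ids, read_id_col=2, tax_id_col=3):
    """Collect read IDs whose classified tax ID is in wanted_tax_ids.

    Column numbers are 1-based so they can be adjusted for classification
    files that are not in Kraken format.
    """
    wanted = {str(tax_id) for tax_id in wanted_tax_ids}
    keep = set()
    for row in csv.reader(tsv_handle, delimiter="\t"):
        # Skip short or malformed rows rather than failing.
        if len(row) < max(read_id_col, tax_id_col):
            continue
        if row[tax_id_col - 1] in wanted:
            keep.add(row[read_id_col - 1])
    return keep


# Usage with a small in-memory classification table (made-up reads).
example = io.StringIO(
    "C\tread1\t186538\t101\t0:71\n"
    "U\tread2\t0\t101\t0:71\n"
    "C\tread3\t11320\t101\t0:71\n"
)
selected = read_ids_for_taxa(example, [186538])
```

The resulting set of read IDs could then be used to subset a bam file with whatever read-filtering tool is at hand.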

Changed:

  • the Unmatched.bam file is now preserved in the illumina_demux WDL task (#887)
  • increase memory headroom requested for UGER jobs by 10% (#892)
  • (Broad only) change dotkit providing python-yaml (#890)
  • use python3 in easy-deploy script if available (#894)
  • Snakemake rules now specify their memory requirement via the mem_mb param, which is recognized by certain execution engines such as kubernetes (#897)

Fixed:

  • do not require chromosome names when checking whether a bam file is sorted (#898)
  • add --no-same-owner to tar -x in WDL tasks (#880)
  • safely build snpEff database (#881)
  • allow ints in Snakemake remote protocols ("s3://"...) (#895)
  • fix ncbi tbl parser for refseq accessions (#899)

Added/Upgraded:

  • coveralls 1.1 -> 1.3.0 (#876)
  • pytest 3.6.3 -> 3.7.1 (#876)
  • pytest-mock 1.5.0 -> 1.10.0 (#876)
  • pytest-xdist 1.15.0 -> 1.22.5 (#876)
  • coverage 4.4.1 -> 4.5.1 (#876)
  • spades 3.11.1 -> 3.12.0 (#878)
  • Added kmc 3.1.1rc1
  • update Docker viral-baseimage 0.1.12 -> 0.1.13 (#884)

v1.21.2

5 years ago

Fixed:

  • Bugfix to WDL demux workflow

v1.21.1

5 years ago

Changed:

  • in WDL workflows, default demultiplexing parameters to support novaseq [#868]
  • in illumina.py::illumina_demux:
    • max_mismatches changed 0 -> 1, and minimum_base_quality: 25 -> 10

Upgraded:

  • samtools (and htslib) 1.7 -> 1.9 [#870]
  • pysam 0.14.1 -> 0.15.0 [#870]
  • picard 2.18.9 -> 2.18.11 [#869]

v1.21.0

5 years ago

New:

  • added a new utility function to merge a group of separate tarballs into one single tarball: file_utils.py::merge_tarballs() [#853]
    • useful for consolidating sequencing runs that have been uploaded in chunks
    • data can be piped in and/or out
    • tarball content can be extracted to disk during the repack
  • added WDL workflow, isnvs_merge_to_vcf.wdl, to perform multiple alignment on assembled sequences, call iSNVs, and produce a VCF with variants seen across all samples relative to a reference [#864]
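
The idea behind merging several tarballs into one while piping data in and out [#853] can be sketched with the standard library's tarfile stream modes. This is an illustration of the technique only, not file_utils.py::merge_tarballs(); the function name merge_tarball_streams is hypothetical, and the sketch neither extracts content to disk nor handles duplicate member names.

```python
import io
import tarfile


def merge_tarball_streams(input_fileobjs, output_fileobj):
    """Repack the members of several tar streams into one gzipped tar stream.

    Stream modes ("r|*", "w|gz") never seek backwards, so the inputs and
    the output can be pipes as well as regular files.
    """
    with tarfile.open(fileobj=output_fileobj, mode="w|gz") as out_tar:
        for fileobj in input_fileobjs:
            with tarfile.open(fileobj=fileobj, mode="r|*") as in_tar:
                for member in in_tar:
                    # Only regular files carry data; directories and links do not.
                    data = in_tar.extractfile(member) if member.isfile() else None
                    out_tar.addfile(member, data)
```

Because members are copied one at a time in arrival order, memory use stays small even when the chunks of a sequencing run are large.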

Changed:

  • WDL task files renamed to have tasks_ prefix [#865]
  • dxWDL 0.72: revert WDLs to unbound task inputs [#857, #860]

Fixed:

  • WDL dx-launcher: consolidate_run_tarballs is now executed as a separate top-level job to allow uploading of the output [#858]
  • Broad UGER: allow run-pipe.sh to be called when a conda env is active [#851]
  • remove build_lastal_db as a Snakemake local rule [#849]
  • testing: fixed an issue with the handling of tempdirs [#866]

Added/upgraded:

  • mafft 7.221 -> 7.402 [#863]
  • gatk 3.6 -> 3.8 [#863, #867]
  • bwa 0.7.15 -> 0.7.17 [#863]
  • blast 2.6.0 -> 2.7.1 [#863]
  • trimmomatic 0.36 -> 0.38 [#863]
  • bedtools 2.26.0 -> 2.27.1 [#863]
  • biopython 1.70 -> 1.72 [#863]
  • snakemake 4.1.0 -> 5.2.0 [#852]
  • pytest 3.0.5 -> 3.6.3 [#856]
  • update Dockerfile viral-baseimage 0.1.11 -> 0.1.12 [#861]
  • dxWDL 0.69 -> 0.72 (soon to be merged in to dx-toolkit) [#857]

v1.20.1

5 years ago

Changed:

  • improve handling of possible feature table IDs [#846]

Fixed:

  • genome feature annotation transfer: corrected fuzzy position operator for clipped features at 3' end of positive strand [#847]

Added/upgraded:

  • picard 2.17.6 -> 2.18.9 (to avoid OOM errors when demultiplexing NovaSeq runs)

v1.20.0

5 years ago

New:

  • WDL workflow added to read_utils.wdl::downsample() [#819, #821, #823]
  • WDL workflows added for iSNV calling/v-phaser [#828]
    • assemble_denovo_with_deplete_and_isnv_calling.wdl
    • assemble_denovo_with_isnv_calling.wdl
    • isnvs_one_sample.wdl
  • WDL workflow to create a DNAnexus applet to launch demultiplexing [#838]
    • demultiplexing on DNAnexus occurs on instances scaled to input run type/size, up to NovaSeq size
    • sequencing_center can be passed as an input to the DNAnexus applet, allowing it to be set for demultiplexing at time of upload [#844]

Changed:

  • optimizations related to bwa-based depletion (output is piped between several steps to avoid disk writes) [#791]
  • ncbi.py::tbl_transfer() now uses a rewritten feature table parser that is more tolerant of possible edge cases present in feature tables [#826]
  • sampleNamesFile no longer output by interhost.wdl::multi_align_mafft_ref() [#808]
  • -P removed from snakemake UGER qsub command [#817]
  • intrahost.py::merge_to_vcf now tries to guess sample names to use in creating VCF file, based on v-phaser output [#828]
  • WDL workflows altered for compatibility with dxWDL 0.69
  • illumina.py::illumina_demux updated to allow NovaSeq-format dates [#831]
  • demux.wdl::illumina_demux now allows a custom RunInfo.xml to be passed in [#831]
  • demux.wdl::illumina_demux now allows thread count to be passed to demux [#834]
  • various small documentation updates [#835]
  • gzip replaced with pigz in several external process calls to improve performance [#842]
  • in depletion, post-bwa filter changed from -f0x4 (include unmapped) to -F0x2 (exclude mapped proper pairs) [#791]
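
The difference between the two post-bwa filters [#791] comes down to how samtools-style -f (require bits) and -F (exclude bits) options test the SAM FLAG field. The sketch below mimics that bit logic in plain Python; the helper names are hypothetical and no samtools call is made.

```python
FLAG_PROPER_PAIR = 0x2  # each segment properly aligned
FLAG_UNMAPPED = 0x4     # segment unmapped


def passes_include_filter(flag, required_bits):
    """Mimic `-f`: keep a read only if all required bits are set."""
    return (flag & required_bits) == required_bits


def passes_exclude_filter(flag, excluded_bits):
    """Mimic `-F`: keep a read only if none of the excluded bits are set."""
    return (flag & excluded_bits) == 0


# Three example FLAG values:
#   77 = paired, unmapped, mate unmapped, first in pair
#   99 = paired, proper pair, mate reverse strand, first in pair
#   65 = paired, first in pair (mapped, but not as a proper pair)
for flag in (77, 99, 65):
    old_rule = passes_include_filter(flag, FLAG_UNMAPPED)     # -f0x4
    new_rule = passes_exclude_filter(flag, FLAG_PROPER_PAIR)  # -F0x2
    print(flag, old_rule, new_rule)
```

The third case is where the rules differ: a read that mapped, but not as part of a proper pair, is dropped by -f0x4 and retained by -F0x2.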

Fixed:

  • bwa-depleted bam files are now reverted to remove headers related to human alignment that could cause issues for downstream tools [#791]
  • snakemake config now correctly lists only hg19 in bwa_dbs_remove
  • bugfixes to WDL workflow for demux.wdl::merge_and_reheader_bams() [#818, #824]
  • fixed bug that caused incorrect encoding of ALT alleles in sample-specific GT columns in intrahost.py::merge_to_vcf [#828]
  • fixed a non-deterministic/intermittent error in kraken call related to pipe closure [#840, #841]

Added/upgraded:

  • dxWDL 0.60.2 -> 0.69 [#828, #838]
  • cromwell 30.2 -> 32 [#828, #838] (WDL workflows are still in draft-2 spec)
  • fastqc 0.11.5 -> 0.11.7 [#833] (supports NovaSeq)
  • pigz 2.3.4 -> 2.4 [#842]
  • For viral-ngs Docker image, bumped viral-baseimage 0.1.9 -> 0.1.11 [#811]