Viral genomics analysis pipelines
Upgrades:
dx_instance_types in WDL runtimes to v2 DNAnexus instance types
Bugfixes:
New:
reports.py align_and_plot via --binLargePlots (#957) (thanks to @lakras)
metagenomics.py taxlevel_summary can now also aggregate reports in KrakenUniq format (#948)
demux_plus, etc. writes a viral summary tsv, aggregating kraken reports from all input samples (#948)
aggregate_spike_count function added to reports.py to write a tsv table of spike-ins seen in all samples, aggregating separate spike-in reports (#955, #973, #981)
taxon_filter.py filter_lastal_bam (#961)
QCError() is raised if the sample name (bam file basename) begins with any of several negative control prefixes ("neg", "water", "NTC") and lastal has identified reads to keep after filtering
Changed:
max_mismatches=0, from max_mismatches=1 (#960)
metagenomics.py taxlevel_summary call moved to a separate WDL task, called after kraken wherever kraken is called; this break-out allows aggregation of kraken reports created previously or elsewhere (#964)
CreateSequenceDictionary calls to adhere to character set restrictions in the SAM/BAM RNAME spec; see samtools/hts-specs#333 (#977)
Fixed:
-V (#947)
ribosomal_slippage occurring without a value; qualifier-parsing regex also more robust (#949)
"unique kmers" rather than "genome coverage" (#950)
Int to Float (#975)
Added/Upgraded:
1.5.3 -> 2.2.4 (#948, #977)
2.27.1 -> 2.28.0 (#977)
2.6.0 -> 2.7.1 (#977)
4.3.0 -> 4.3.3 (#948)
picard to picard-slim package (sans 'r') (#977)
2.18.11 -> 2.20.5 (#977)
2.7 -> 2.7.1
0.1.14 -> 0.1.15 (#979)
38.56
131 -> 1.9.1 (#977)
New:
illumina.py::guess_barcodes identifies barcodes with outlier counts after demux and suggests possible corrections
--skipMarkDupes and --plotOnlyNonDuplicates are now exposed in the WDL task plot_coverage in tasks_reports.wdl (#925)
Changed:
Fixed:
tasks_taxon_filter.wdl to respect tags_to_clear_space_separated param (#919)
illumina.py: fixed a small bug where an exception was not raised for missing files (#926)
util.misc.available_cpu_count() (#912)
Added/Upgraded:
5.22.0 -> 5.26 to support conda build 3 and current conda-forge pinnings (#922, #945)
java-jdk==8.0.112 -> openjdk==8.0.112
2.7.1 -> 2.6.0 due to upstream boost incompatibilities
Within docker image (#906):
3.11 -> 3.12
1.68 -> 1.72
1.6 -> 1.9
2.3.4 -> 2.4
2.9.0 -> 2.18.11
Fixed:
Added/Upgraded:
0.1.13 -> 0.1.14 (#903)
New:
kmer_utils.py providing the following functions (see the documentation for more information):
  build_kmer_db: Build a database of kmers occurring in given sequences
  dump_kmer_counts: Dump kmers and their counts from a kmer database to a text file
  filter_reads: Filter reads based on their kmer content
  kmers_binary_op: Perform a simple binary operation on kmer sets
  kmers_set_counts: Copy the kmer database, setting all kmer counts in the output to the given value
metagenomics.py::filter_bam_to_taxa (#883)
filter_bam_to_taxa
assembly.py::assemble_spades now has an option, --minContigLen, so that spades-based de novo assembly yields only contigs longer than a specified length (#889)
illumina_demux WDL task (#891)
read_utils.py::read_names to extract read names from a sequence file
run-pipe_local.sh wrapper script for invoking the Snakemake-based pipeline on a single compute instance (#897)
Changed:
Unmatched.bam file is now preserved in the illumina_demux WDL task (#887)
python-yaml (#890)
mem_mb param, which is recognized by certain execution engines such as Kubernetes (#897)
Fixed:
--no-same-owner to tar -x in WDL tasks (#880)
s3://"...) (#895)
Added/Upgraded:
1.1 -> 1.3.0 (#876)
3.6.3 -> 3.7.1 (#876)
1.5.0 -> 1.10.0 (#876)
1.15.0 -> 1.22.5 (#876)
4.4.1 -> 4.5.1 (#876)
3.11.1 -> 3.12.0 (#878)
3.1.1rc1
0.1.12 -> 0.1.13 (#884)
Fixed:
New:
Changed:
illumina.py::illumina_demux: max_mismatches changed 0 -> 1, and minimum_base_quality changed 25 -> 10
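The illumina_demux change above loosens barcode matching in two ways. As a rough illustration only, the Python sketch below shows how a max_mismatches limit and a minimum_base_quality cutoff can interact when assigning a read to a sample barcode. It assumes simple Hamming-distance matching in which bases below the quality cutoff are ignored as no-calls; the function assign_barcode and its tie-breaking rule are hypothetical and are not taken from the illumina_demux implementation.

```python
from typing import Dict, Optional, Sequence


def assign_barcode(
    read_barcode: str,
    qualities: Sequence[int],
    samples: Dict[str, str],
    max_mismatches: int = 1,
    minimum_base_quality: int = 10,
) -> Optional[str]:
    """Return the sample whose barcode best matches read_barcode, or None.

    Hypothetical sketch: bases with quality below minimum_base_quality are
    treated as no-calls and ignored; remaining positions are compared by
    Hamming distance. A read is assigned only if the closest barcode is
    within max_mismatches and no other barcode is equally close.
    """
    best_sample: Optional[str] = None
    best_distance: Optional[int] = None
    tied = False
    for sample, expected in samples.items():
        if len(expected) != len(read_barcode):
            continue  # only compare barcodes of equal length
        # Count mismatches only at positions that pass the quality cutoff.
        mismatches = sum(
            1
            for base, exp, qual in zip(read_barcode, expected, qualities)
            if qual >= minimum_base_quality and base != exp
        )
        if best_distance is None or mismatches < best_distance:
            best_sample, best_distance, tied = sample, mismatches, False
        elif mismatches == best_distance:
            tied = True  # another barcode is equally close
    if best_distance is None or tied or best_distance > max_mismatches:
        return None  # no candidate, ambiguous match, or too many mismatches
    return best_sample


if __name__ == "__main__":
    samples = {"s1": "ACGTACGT", "s2": "TTGTACGA"}
    quals = [30] * 7 + [20]  # last barcode base has quality 20
    # The last base differs from s1's barcode; whether it counts depends on the cutoff.
    print(assign_barcode("ACGTACGA", quals, samples, max_mismatches=0, minimum_base_quality=25))  # s1
    print(assign_barcode("ACGTACGA", quals, samples, max_mismatches=0, minimum_base_quality=10))  # None
    print(assign_barcode("ACGTACGA", quals, samples, max_mismatches=1, minimum_base_quality=10))  # s1
```

In this sketch, lowering the quality cutoff means more barcode bases are actually compared, so a more permissive mismatch limit is needed to keep assigning reads whose barcodes carry a single low-quality error.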
Upgraded:
samtools (and htslib) 1.7 -> 1.9 [#870]
pysam 0.14.1 -> 0.15.0 [#870]
picard 2.18.9 -> 2.18.11 [#869]
New:
file_utils.py::merge_tarballs() [#853]
isnvs_merge_to_vcf.wdl, to perform multiple alignment on assembled sequences, call iSNVs, and produce a VCF with variants seen across all samples relative to a reference [#864]
Changed:
tasks_ prefix [#865]
Fixed:
dx-launcher: consolidate_run_tarballs is now executed as a separate top-level job to allow uploading of the output [#858]
build_lastal_db as a Snakemake local rule [#849]
Added/upgraded:
7.221 -> 7.402 [#863]
3.6 -> 3.8 [#863, #867]
0.7.15 -> 0.7.17 [#863]
2.6.0 -> 2.7.1 [#863]
0.36 -> 0.38 [#863]
2.26.0 -> 2.27.1 [#863]
1.70 -> 1.72 [#863]
4.1.0 -> 5.2.0 [#852]
3.0.5 -> 3.6.3 [#856]
0.69 -> 0.72 (soon to be merged into dx-toolkit) [#857]
Changed:
Fixed:
Added/upgraded:
2.17.6 -> 2.18.9 (to avoid OOM errors when demultiplexing NovaSeq runs)
New:
read_utils.wdl::downsample() [#819, #821, #823]
assemble_denovo_with_deplete_and_isnv_calling.wdl
assemble_denovo_with_isnv_calling.wdl
isnvs_one_sample.wdl
Changed:
ncbi.py::tbl_transfer() now uses a rewritten feature table parser that is more tolerant of edge cases in feature tables [#826]
-P removed from snakemake UGER qsub command [#817]
intrahost.py::merge_to_vcf now tries to guess the sample names to use in creating the VCF file, based on v-phaser output [#828]
illumina.py::illumina_demux updated to allow NovaSeq-format dates [#831]
demux.wdl::illumina_demux now allows a custom RunInfo.xml to be passed in [#831]
demux.wdl::illumina_demux now allows the thread count to be passed to demux [#834]
-f0x4 (include unmapped) to -F0x2 (exclude mapped proper pairs) [#791]
Fixed:
bwa_dbs_remove
intrahost.py::merge_to_vcf [#828]
Added/upgraded:
0.60.2 -> 0.69 [#828, #838]
30.2 -> 32 [#828, #838] (WDL workflows are still in draft-2 spec)
0.11.5 -> 0.11.7 [#833] (supports NovaSeq)
2.3.4 -> 2.4 [#842]
0.1.9 -> 0.1.11 [#811]