DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
In this release:
Fixed a bug in call_variants that caused the step to freeze when there were no examples. This bug was observed and reported in https://github.com/google/deepvariant/issues/764, https://github.com/google/deepvariant/issues/769, and https://github.com/google/deepsomatic/issues/8.
Updated the libssw library from 1.2.4 to 1.2.5.
Speed improvement in postprocess_variants, which reduces runtime from 48 minutes to 30 minutes for Illumina WGS and from 56 minutes to 33 minutes for PacBio.
We are sincerely grateful to [...] postprocess_variants.
--model_type ONT_R104 is a new option. Starting from v1.5, DeepVariant natively supports ONT R10.4 simplex and duplex data.
--enable_joint_realignment and --p_error.
Added a new channel (insert_size). This reduces errors by 4-10% for the Illumina WGS and WES models. Thanks @lucasbrambrink for implementing this feature.
Sped up the postprocess_variants step by 10-30%. Thanks @moshewagner for optimizing the code.
Improved call_variants speed for PacBio models (both DeepVariant and DeepTrio) by reducing the default window width from 221 to 199, with no tradeoff in accuracy. Thanks to @lucasbrambrink for conducting the experiments to find a better window width for PacBio.
Added the --normalize_reads flag in make_examples, which normalizes indel candidates at the read level. This flag is useful to reduce rare cases where an indel variant is not left-normalized. This feature is mainly relevant to joint calling of large cohorts, or cases where read mappings have been surjected from one reference to another. It is currently set to False by default. To enable it, add --normalize_reads=true directly to the make_examples binary. If you’re using the run_deepvariant one-step approach, add --make_examples_extra_args="normalize_reads=true". Currently we don’t recommend turning this flag on for long reads due to a potential runtime increase.
Added the --aux_fields_to_keep flag to the make_examples step, and set the default to only the auxiliary fields that DeepVariant currently uses. This reduces memory use for input BAM files that have large auxiliary fields that aren’t used in variant calling. Thanks to @williamrowell and @rhallPB for reporting this issue.
make_examples as well as call_variants to address the issue reported in https://github.com/google/deepvariant/issues/491.
The DeepVariant v1.2 release contains the following major improvements:
make_examples better modularizes common components between DeepVariant, DeepTrio, and potential future applications. This enables DeepTrio to inherit improvements such as --add_hp_channel (introduced to the DeepVariant PacBio model in v1.1; see blog), improving DeepTrio’s PacBio accuracy.
Additional detail for improvements in DeepVariant v1.2:
Improvements for training:
Improvements for make_examples:
For more details on flags, run /opt/deepvariant/bin/make_examples --help.
--split_skip_reads flag: if True, make_examples will split reads with large SKIP CIGAR operations into individual reads. Resulting read parts that are shorter than 15 bp are filtered out.
--emit_realigned_reads=true --realigner_diagnostics=/output/realigned_reads for make_examples. You will still need to run samtools index to get the index file, but no longer need to sort the BAM.
Improvements for the one-step run_deepvariant:
For more details on flags, run /opt/deepvariant/bin/run_deepvariant --help.
--runtime_report, which enables runtime report output to --logging_dir. This makes it easier for users to get the runtime-by-region report for make_examples.
--dry_run flag is now added for printing out all commands to be executed, without running them. This is mentioned in the Quick Start section.
The v1.1 release introduces DeepTrio, which uses a model specifically trained to call a mother-father-child trio or parent-child duo. DeepTrio has superior accuracy compared to DeepVariant. Pre-trained models are available for Illumina WGS, Illumina exome, and PacBio HiFi.
In addition, DeepVariant v1.1 contains the following improvements:
--add_hp_channel, which is enabled by default for PacBio.
New optional flags to increase speed:
A team at Intel has adapted DeepVariant to use the OpenVINO toolkit, which accelerates TensorFlow applications. This further speeds up the call_variants stage by ~25% for any model when run in CPU mode on an Intel machine. DeepVariant runs with OpenVINO have the same accuracy as runs without it, and their output is nearly identical. Repeated runs with OpenVINO are fully reproducible.
To use OpenVINO, add the following flag to the DeepVariant command:
--call_variants_extra_args "use_openvino=true"
We thank Intel for their contribution, and acknowledge the extensive work their team put in, captured in https://github.com/google/deepvariant/pull/363.
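As a rough sketch of where the OpenVINO flag above fits, the snippet below assembles a one-step command and prints it rather than running it. The input and output paths are hypothetical placeholders, WGS is just one possible model type, and the flag is shown as described for this v1.1 release; it may not be available in other versions.

```shell
# Hypothetical placeholder paths; substitute your own files.
REF="/input/reference.fasta"
READS="/input/sample.bam"
OUTPUT_VCF="/output/sample.vcf.gz"

# Assemble the one-step command as a string so it can be reviewed first.
CMD="/opt/deepvariant/bin/run_deepvariant"
CMD="${CMD} --model_type=WGS"
CMD="${CMD} --ref=${REF}"
CMD="${CMD} --reads=${READS}"
CMD="${CMD} --output_vcf=${OUTPUT_VCF}"
# Forward the OpenVINO switch to the call_variants stage.
CMD="${CMD} --call_variants_extra_args \"use_openvino=true\""

# Print the command instead of executing it.
echo "${CMD}"
```

Printing the command first makes it easy to confirm that the extra argument is attached to the call_variants stage before committing to a long run.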
DeepVariant v1.0 introduces new features and accuracy improvements substantial enough to warrant a major version number. Compared to DeepVariant v0.10, these changes reduce Illumina WGS errors by 24%, exome errors by 19%, and PacBio errors by 52%.
--alt_aligned_pileup. --alt_aligned_pileup=diff_channels is now the default for the DeepVariant PacBio model. This substantially improves INDEL accuracy for PacBio data.
Added --sort_by_haplotypes to optionally allow creating pileup images with reads sorted by haplotype. Haplotype sorting is based on the HP tag, which must be present in the input BAM, and --parse_sam_aux_fields needs to be set as well. This substantially improves INDEL accuracy for PacBio data.
--sort_by_haplotypes by phasing variants and the input reads. Accuracy metrics for both single-pass calling and two-pass calling are shown. Users may choose whether to run a second time for higher accuracy.
--min_mapping_quality in make_examples.py changed from 10 to 5. This improves accuracy of all models (WGS, WES, and PACBIO).
--sequencing_type_image and --custom_pileup_image
Added the --only_keep_pass flag to postprocess_variants.py to optionally keep only PASS calls in the output VCF.
binarize function in modelling.py. (https://github.com/google/deepvariant/issues/286 fixed in https://github.com/google/deepvariant/commit/db87d77)
--regions when using run_deepvariant.py. (https://github.com/google/deepvariant/issues/305 fixed in https://github.com/google/deepvariant/commit/fbacd35)
Added --version to run_deepvariant.py. (https://github.com/google/deepvariant/issues/332 fixed in https://github.com/google/deepvariant/commit/f101492)
Added the --sample_name flag to postprocess_variants.py and applied it in run_deepvariant.py as well. (https://github.com/google/deepvariant/issues/334 fixed in https://github.com/google/deepvariant/commit/a81d629)
Turned off ws_use_window_selector_model by default: This flag was turned on by default in v0.7.0. After the discussion in issue #272, we decided to turn this off to improve consistency and accuracy, at the trade-off of a 7% increase in runtime of the make_examples step. Users may add --make_examples_extra_args "ws_use_window_selector_model=true" to save some runtime at the expense of accuracy.
Full release notes:
New documentation:
Changes to Docker images, code, and models:
Changes to flags:
Added the --sample_name flag to run_deepvariant.py.
Set vsc_min_fraction_indels to 0.06 for Illumina data (WGS and WES mode), which increases sensitivity.
Allowed --reads to take multiple BAMs in a comma-separated list.
Use --ref for CRAM by default. (--use_ref_for_cram is set to true by default.)
--realigner_diagnostics and --emit_realigned_reads flags in realigner.py.
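To illustrate the --reads and --sample_name changes listed above, here is a minimal sketch that joins two BAM paths into one comma-separated value and prints the resulting one-step command. The file paths and sample name are hypothetical, and it assumes the one-step script passes the comma-separated --reads value through unchanged.

```shell
# Hypothetical placeholder paths and sample name.
BAM_1="/input/sample.lane1.bam"
BAM_2="/input/sample.lane2.bam"
SAMPLE_NAME="sample01"

# Join the BAMs with a comma and no spaces, so the shell keeps one argument.
READS="${BAM_1},${BAM_2}"

# Assemble the command as a string so it can be reviewed before running.
CMD="/opt/deepvariant/bin/run_deepvariant"
CMD="${CMD} --model_type=WGS"
CMD="${CMD} --ref=/input/reference.fasta"
CMD="${CMD} --reads=${READS}"
CMD="${CMD} --sample_name=${SAMPLE_NAME}"
CMD="${CMD} --output_vcf=/output/sample.vcf.gz"

# Print the command instead of executing it.
echo "${CMD}"
```

Building the list in a variable keeps the separator consistent when more BAMs are appended later.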