Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
Thanks @ohickl for reporting and providing a test case.
The new optional arguments:

- `-l --min-frag-len <min-frag-len>`: minimum insert size. Reads with a smaller insert size than this are ignored. [default: -1]
- `-u --max-frag-len <max-frag-len>`: maximum insert size. Reads with a larger insert size than this are ignored. [default: -1]
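The filtering rule implied by these two options can be illustrated with a small sketch. It assumes (this is not taken from the mosdepth source) that a bound of `-1` disables that check and that the insert size is the absolute value of the BAM `TLEN` field, since `TLEN` is negative for the rightmost mate. The `Read` class and `passes_frag_len` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    tlen: int  # template (insert) length as stored in the BAM/CRAM TLEN field


def passes_frag_len(read: Read, min_frag_len: int = -1, max_frag_len: int = -1) -> bool:
    """Return True if the read should contribute to depth.

    Assumptions (hypothetical, not mosdepth's code): -1 disables a bound,
    and abs(TLEN) is the insert size because TLEN is negative for one mate.
    """
    size = abs(read.tlen)
    if min_frag_len != -1 and size < min_frag_len:
        return False  # fragment shorter than --min-frag-len
    if max_frag_len != -1 and size > max_frag_len:
        return False  # fragment longer than --max-frag-len
    return True


reads = [Read("a", 150), Read("b", -420), Read("c", 900)]
kept = [r.name for r in reads if passes_frag_len(r, min_frag_len=200, max_frag_len=800)]
print(kept)  # ['b']
```

With both options left at their defaults, every read passes, which matches the behavior before these options existed.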
When using read groups, an intermittent error would sometimes cause reads to be skipped. Thanks @chrisamiller for reporting and providing a test case.
This release adds support for writing d4 files (see Aaron's poster).
`d4` is a toolset and format written by Hao Hou from the Quinlan Lab.
`mosdepth` provides many options while calculating depth because it is slow to re-parse the per-base.bed.gz files. In many cases, it's faster to re-parse a CRAM file than to scan large regions from the per-base bed files. In addition, writing per-base.bed.gz has always been a bottleneck in mosdepth, even after it was optimized somewhat in the last release.
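To make concrete what the per-base output contains, here is a minimal sketch of collapsing a per-base depth array into BED-style runs, where adjacent bases with identical depth are merged into one interval. This is an illustration of the idea, not mosdepth's implementation; `depth_to_bed_runs` is a hypothetical name, and coordinates are 0-based, half-open as in BED.

```python
def depth_to_bed_runs(chrom, depths):
    """Merge adjacent bases with identical depth into (chrom, start, end, depth) runs.

    Coordinates are 0-based and half-open, as in BED. Illustrative sketch only.
    """
    runs = []
    if not depths:
        return runs
    start = 0  # start of the current run of equal depth
    for i in range(1, len(depths)):
        if depths[i] != depths[start]:
            # Depth changed: close the current run and start a new one at i.
            runs.append((chrom, start, i, depths[start]))
            start = i
    runs.append((chrom, start, len(depths), depths[start]))  # final run
    return runs


print(depth_to_bed_runs("chr1", [0, 0, 3, 3, 3, 1]))
# [('chr1', 0, 2, 0), ('chr1', 2, 5, 3), ('chr1', 5, 6, 1)]
```

Even with runs merged, a whole genome at typical WGS depth yields a very large number of intervals that must be formatted as text and gzip-compressed, which is why this step dominates run time.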
This release has a static d4utils binary for Linux below that will allow users to manipulate d4 files.
Here are mosdepth run times on a smallish CRAM test case:
Note that using `d4` output greatly mitigates the cost of writing the per-base output. With d4, mosdepth can write per-base output for a 23X CRAM in 2m15s.
Once the d4 file is created, it is much faster to access. d4 includes command-line utilities to view, get stats for, and manipulate d4 files. Because those operations are so fast, these utilities will eventually replace much of the functionality in mosdepth, such as `quantize`, the histogram (dist.txt), and regions.bed.gz.
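As an example of the kind of summary that becomes cheap once per-base depth is quickly accessible, the sketch below computes a cumulative coverage distribution (the fraction of bases covered at or above each depth, which is what mosdepth's dist.txt reports) from run-length intervals. It is an illustrative sketch: `cumulative_dist` is a hypothetical name, it does not reproduce the exact dist.txt column layout, and it does not use the d4 API.

```python
from collections import Counter


def cumulative_dist(runs):
    """Return {depth: fraction of bases with coverage >= depth}.

    runs: iterable of (chrom, start, end, depth) intervals (0-based, half-open).
    Every depth from the maximum down to 0 is reported. Illustrative sketch only.
    """
    bases_at = Counter()
    for _chrom, start, end, depth in runs:
        bases_at[depth] += end - start  # number of bases at exactly this depth
    total = sum(bases_at.values())
    if total == 0:
        return {}
    dist = {}
    cum = 0
    for depth in range(max(bases_at), -1, -1):
        cum += bases_at.get(depth, 0)  # bases with coverage >= depth
        dist[depth] = cum / total
    return dist


runs = [("chr1", 0, 2, 0), ("chr1", 2, 5, 3), ("chr1", 5, 6, 1)]
print({d: round(f, 3) for d, f in cumulative_dist(runs).items()})
# {3: 0.5, 2: 0.5, 1: 0.667, 0: 1.0}
```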
I made several pull requests to Devon Ryan's excellent BigWig library to improve speed and attempt to reduce memory usage: #41, #42, #43.
I also wrote a bigwig library for Nim that uses libBigWig and used it to prototype bigwig output for `mosdepth`. However, bigwig output increased memory usage in `mosdepth` so dramatically that it was not viable.
We will show in the coming manuscript (see also the poster) that `d4` is much faster to create and use than `bigwig` and results in smaller file sizes.
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `mosdepth_v028 -x $exome` | 231.300 ± 8.175 | 222.166 | 242.883 | 1.73 ± 0.07 |
| `mosdepth_v029 -x $exome` | 184.653 ± 7.520 | 176.238 | 192.636 | 1.38 ± 0.07 |
| `mosdepth_v028 -x -t 4 $exome` | 170.924 ± 3.811 | 166.359 | 175.284 | 1.28 ± 0.04 |
| `mosdepth_v029 -x -t 4 $exome` | 133.504 ± 3.151 | 129.220 | 138.062 | 1.00 |
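The "Relative" column is each command's mean divided by the fastest mean. The sketch below reproduces it from the Mean and ± columns, assuming (not stated in these notes) that the ± on the ratio comes from standard first-order error propagation for a quotient of independent measurements, i.e. `ratio * sqrt((sd/mean)**2 + (sd_fast/mean_fast)**2)`. The numbers are copied from the benchmark table; the variable names are hypothetical.

```python
import math

# (mean, stddev) in seconds, copied from the benchmark table
results = {
    "mosdepth_v028 -x $exome": (231.300, 8.175),
    "mosdepth_v029 -x $exome": (184.653, 7.520),
    "mosdepth_v028 -x -t 4 $exome": (170.924, 3.811),
    "mosdepth_v029 -x -t 4 $exome": (133.504, 3.151),
}

# The fastest command (smallest mean) is the reference row.
fast_mean, fast_sd = min(results.values())

relative = {}
for cmd, (mean, sd) in results.items():
    ratio = mean / fast_mean
    if (mean, sd) == (fast_mean, fast_sd):
        relative[cmd] = f"{ratio:.2f}"  # reference row has no uncertainty shown
    else:
        # Assumed: first-order propagation of relative errors for a ratio.
        err = ratio * math.sqrt((sd / mean) ** 2 + (fast_sd / fast_mean) ** 2)
        relative[cmd] = f"{ratio:.2f} ± {err:.2f}"

for cmd, rel in relative.items():
    print(f"{cmd}: {rel}")
```

Under that assumption the computed values match the table, so, for example, v0.2.9 with `-t 4` is roughly 1.73 times faster than v0.2.8 without threads on this exome.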