Tools for working with SAM/BAM data
Maintenance release that aligns sambamba with recent D compiler updates, which had broken sambamba in Debian. Notably, lz4 was removed from the source tree, and the meson build system is close to becoming the default. Once again we get a free speed improvement from the latest ldc2+LLVM toolchain. Amazing work by these groups!
Maintenance release and bug fixes: this is a special release in which we removed all CRAM support. CRAM support added little value to sambamba because it used essentially the same htslib backend as samtools. Dropping the htslib dependency removes one maintenance headache. See also https://github.com/biod/sambamba/issues/425.
BioD has also been moved back into the main trunk. We split it out in the past, but since there is no separate development there, we might as well keep it in sambamba again.
penguin2:~$ /usr/bin/time --verbose ./sambamba-0.8.0 markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam test.bam
sambamba 0.8.0
by Artem Tarasov and Pjotr Prins (C) 2012-2020
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
finding positions of the duplicate reads in the file...
sorted 3969781 end pairs
and 73839 single ends (among them 22397 unmatched pairs)
collecting indices of duplicate reads... done in 616 ms
found 239673 duplicates
collected list of positions in 0 min 10 sec
marking duplicates...
collected list of positions in 0 min 17 sec
Command being timed: "./sambamba-0.8.0 markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam test.bam"
User time (seconds): 196.01
System time (seconds): 69.92
Percent of CPU this job got: 1392%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:19.09
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1732640
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 687925
Voluntary context switches: 4157903
Involuntary context switches: 6964
Swaps: 0
File system inputs: 1720752
File system outputs: 1967384
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
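Download sambamba-0.8.0-linux-amd64-static.gz, check its md5sum and unzip it:
md5sum sambamba-0.8.0-linux-amd64-static.gz
7895d6d73f9d931525aa4fd709450803 sambamba-0.8.0-linux-amd64-static.gz
gzip -dc sambamba-0.8.0-linux-amd64-static.gz > sambamba-0.8.0
chmod u+x sambamba-0.8.0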
./sambamba-0.8.0
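The md5sum step above only prints the hash, leaving the comparison to the eye. The bash sketch below lets `md5sum -c` do the comparison and only unpacks the file and marks it executable when the hash matches. It assumes GNU coreutils and gzip; the function name `install_release` is hypothetical and not part of sambamba.

```bash
# Hypothetical helper: verify a gzipped release binary before unpacking it.
# Usage: install_release FILE.gz EXPECTED_MD5   (prints the unpacked name)
install_release() {
  local gz="$1" expected="$2"
  local bin="${gz%.gz}"   # FILE.gz -> FILE

  # md5sum -c reads "<hash>  <file>" lines and exits non-zero on mismatch.
  # Its report goes to stderr so stdout carries only the binary name.
  if ! printf '%s  %s\n' "$expected" "$gz" | md5sum -c - >&2; then
    echo "checksum check failed for $gz, not unpacking" >&2
    return 1
  fi

  gzip -d -f "$gz" || return 1   # replaces FILE.gz with FILE
  chmod u+x "$bin" || return 1
  printf '%s\n' "$bin"
}
```

Example: `bin=$(install_release sambamba-0.8.0-linux-amd64-static.gz 7895d6d73f9d931525aa4fd709450803) && "./$bin"`. On a mismatch the .gz file is left untouched and nothing is executed.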
penguin2:~$ /usr/bin/time --verbose ./sambamba-0.7.1-linux-static markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam test.bam
sambamba 0.7.1
by Artem Tarasov and Pjotr Prins (C) 2012-2019
LDC 1.17.0 / DMD v2.087.1 / LLVM8.0.1 / bootstrap LDC - the LLVM D compiler (1.17.0)
finding positions of the duplicate reads in the file...
sorted 3969781 end pairs
and 73839 single ends (among them 22397 unmatched pairs)
collecting indices of duplicate reads... done in 642 ms
found 239673 duplicates
collected list of positions in 0 min 8 sec
marking duplicates...
collected list of positions in 0 min 18 sec
Command being timed: "./sambamba-0.7.1-linux-static markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam test.bam"
User time (seconds): 177.73
System time (seconds): 45.90
Percent of CPU this job got: 1097%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:20.38
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1343524
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 436841
Voluntary context switches: 4610690
Involuntary context switches: 11696
Swaps: 0
File system inputs: 48
File system outputs: 1967368
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
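To compare the two /usr/bin/time --verbose reports above at a glance, a small bash/awk sketch can pull out the wall-clock time, total CPU time (user + system), average parallelism and peak memory. It assumes each report was saved to a file, for example with GNU time's `-o FILE` option, or is piped in on stdin; `time_summary` is a hypothetical helper name.

```bash
# Summarise a GNU `time --verbose` report (file argument or stdin).
time_summary() {
  awk -F': ' '
    { key = $1; sub(/^[ \t]+/, "", key) }   # strip the leading indentation
    key == "User time (seconds)"                { user = $2 }
    key == "System time (seconds)"              { sys = $2 }
    key == "Maximum resident set size (kbytes)" { rss = $2 }
    key ~ /^Elapsed \(wall clock\)/ {
      # value is either m:ss.ss or h:mm:ss
      n = split($2, t, ":")
      wall = (n == 3) ? t[1] * 3600 + t[2] * 60 + t[3] : t[1] * 60 + t[2]
    }
    END {
      cpu = user + sys
      par = (wall > 0) ? cpu / wall : 0       # CPU seconds per wall second
      printf "wall=%.2fs cpu=%.2fs parallelism=%.1f maxrss=%.0fMiB\n",
             wall, cpu, par, rss / 1024
    }' "${1:--}"
}
```

Fed the v0.8.0 report above, this prints `wall=19.09s cpu=265.93s parallelism=13.9 maxrss=1692MiB`; the v0.7.1 report gives `wall=20.38s cpu=223.63s parallelism=11.0 maxrss=1312MiB`. On this file v0.8.0 finishes about 6% sooner while spending more CPU time and memory.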
The static binary was built on Debian testing. Md5sum:
a47932d27f92a2639d4b228eb7847e04 /home/wrk/sambamba-0.7.1-linux-static.gz
Added support for Picard style sorting, see https://github.com/biod/sambamba/issues/369 - thanks https://github.com/TimurIs
Added a new sorting option that pulls mates together when sorting by read name, see https://github.com/biod/sambamba/pull/380 - thanks https://github.com/emi80
Many module renames to build a BioD for the future - thanks https://github.com/george-githinji
sambamba 0.6.9 by Artem Tarasov and Pjotr Prins (C) 2012-2019
LDC 1.14.0 / DMD v2.084.1 / LLVM7.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6)
2e1c46f4627a00f85a248b0941cbd37f bin/sambamba-0.6.9-linux-static.gz
Pre-release with a much faster statically compiled binary: 10-20% faster than v0.6.6 thanks to ldc and LLVM improvements. Fixes the speed regression of v0.6.7 on large files caused by singleobj compilation. See also #345 and performance
64-bit compilation should be fine on ldc 1.10+. i386 target is still a problem.
To install the image, download it and run:
md5sum sambamba-0.6.8.gz
ee61000bcb33a82013c284bac8feb91f sambamba-0.6.8.gz
gzip -d sambamba-0.6.8.gz
chmod a+x sambamba-0.6.8
./sambamba-0.6.8
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
The binary images were built on x86_64 with
~/.config/guix/current/bin/guix pull -l
Generation 3 Sep 25 2018 09:39:08
guix 932839f
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: origin/master
commit: 932839ff124ff3b0dd3070914fb1c5beec69bf32
guix environment -C guix --ad-hoc gcc gdb bash ld-wrapper ldc which python git
make clean && make -j 16 && make check
for x in `ldd bin/sambamba|cut -d ' ' -f 3` ; do realpath $x ; done
/gnu/store/l4lr0f5cjd0nbsaaf8b5dmcw1a1yypr3-glibc-2.27/lib/libpthread-2.27.so
/gnu/store/l4lr0f5cjd0nbsaaf8b5dmcw1a1yypr3-glibc-2.27/lib/libm-2.27.so
/gnu/store/l4lr0f5cjd0nbsaaf8b5dmcw1a1yypr3-glibc-2.27/lib/librt-2.27.so
/gnu/store/l4lr0f5cjd0nbsaaf8b5dmcw1a1yypr3-glibc-2.27/lib/libdl-2.27.so
/gnu/store/bmaxmigwnlbdpls20px2ipq1fll36ncd-gcc-8.2.0-lib/lib/libgcc_s.so.1
/gnu/store/l4lr0f5cjd0nbsaaf8b5dmcw1a1yypr3-glibc-2.27/lib/libc-2.27.so
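The loop above prints where each shared library resolves. If the intent is to confirm that the dynamic build links only against libraries from the Guix store (an assumption on our part), the check can be turned into a pass/fail test. The bash sketch below uses the hypothetical helpers `lib_paths` and `libs_under_prefix`; it keeps only ldd lines of the form `name => /path`, so the vdso, the dynamic loader and "not found" entries are skipped by the parser.

```bash
# Read `ldd` output on stdin and print the path after "=>" for each
# resolved shared library.
lib_paths() {
  awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Hypothetical check: read library paths on stdin and fail if any of them
# does not resolve (via realpath) to a location under PREFIX.
libs_under_prefix() {
  local prefix="$1" lib real bad=0
  while IFS= read -r lib; do
    real=$(realpath "$lib" 2>/dev/null) || real=""
    case "$real" in
      "$prefix"/*) ;;
      *) echo "outside $prefix: $lib" >&2; bad=1 ;;
    esac
  done
  return "$bad"
}
```

Example: `ldd bin/sambamba | lib_paths | libs_under_prefix /gnu/store && echo "all libraries come from /gnu/store"`. For the static image, `lib_paths` prints nothing, so the check passes trivially there.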
# build static image
make clean && make static -j 16 && make check
Git submodule versions were
git submodule status
2f0634b187e0f454809432093238cf31e9fbfee6 BioD (v0.2.0-5-g2f0634b)
2f3c3ea7b301f9b45737a793c0b2dcf0240e5ee5 htslib (0.2.0-rc10-271-g2f3c3ea)
b3692db46d2b23a7c0af2d5e69988c94f126e10a lz4 (v1.8.2)
9be93876982b5f14fcca60832563b3cd767dd84d undeaD (v1.0.1-49-g9be9387)
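To reproduce a build from these notes, the submodules need to be checked out at the listed commits. `git submodule status` prefixes each line with a space when the checkout matches the commit recorded in the superproject, `-` when the submodule is not initialised, `+` when a different commit is checked out and `U` on merge conflicts. The bash sketch below (the name `check_submodules` is hypothetical) turns that prefix into an exit status.

```bash
# Read `git submodule status` output on stdin; return non-zero if any
# submodule is not checked out at the recorded commit.
check_submodules() {
  local line bad=0
  while IFS= read -r line; do
    case "${line:0:1}" in
      " ") ;;                                                   # matches
      -) echo "not initialised: ${line:1}" >&2; bad=1 ;;
      +) echo "different commit: ${line:1}" >&2; bad=1 ;;
      U) echo "merge conflict: ${line:1}" >&2; bad=1 ;;
    esac
  done
  return "$bad"
}
```

Example: `git submodule status | check_submodules || git submodule update --init`, which resets the submodules to the recorded commits when the check fails.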
This is a pre-release of sambamba, please test.