Download the test BAM file from https://www.internationalgenome.org/
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00100/alignment/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
Processing an 841Mb BAM file.
Check output:
./bin/sambamba-1.0.0 view HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam |md5sum sambamba 1.0.0 3cc1fcc85b0e5ab516784ffbbc9c347c
Sorted
02b42b6a7e5f8654a1bbc3ee1cf15a8d HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.sorted.bam ec5b9ba95b53f0d10c7949a933e65ac7 dedup.bam
time ./bin/sambamba-1.0.0 view HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam > /dev/null
sambamba 1.0.0
by Artem Tarasov and Pjotr Prins (C) 2012-2022
LDC 1.32.0 / DMD v2.102.2 / LLVM14.0.6 / bootstrap LDC - the LLVM D compiler (1.32.0)
real 0m1.435s
user 0m14.648s
sys 0m0.336s
time ./bin/sambamba-0.8.2-linux-amd64-static view HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam > /dev/null
sambamba 0.8.2
by Artem Tarasov and Pjotr Prins (C) 2012-2021
LDC 1.27.1 / DMD v2.097.2 / LLVM11.0.0 / bootstrap LDC - the LLVM D compiler (1.27.1)
real 0m1.393s
user 0m14.114s
sys 0m0.366s
time ./bin/sambamba-0.8.1-pre1 view HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam > /dev/null
sambamba 0.8.1-pre1
by Artem Tarasov and Pjotr Prins (C) 2012-2021
LDC 1.26.0 / DMD v2.096.1 / LLVM9.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6)
real 0m1.398s
user 0m16.589s
sys 0m0.240s
--- sort
time ./bin/sambamba-1.0.0 sort HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
sambamba 1.0.1
by Artem Tarasov and Pjotr Prins (C) 2012-2023
LDC 1.32.0 / DMD v2.102.2 / LLVM14.0.6 / bootstrap LDC - the LLVM D compiler (1.32.0)
real 0m9.599s
user 2m1.039s
sys 0m3.834s
sambamba 1.0.0
by Artem Tarasov and Pjotr Prins (C) 2012-2022
LDC 1.32.0 / DMD v2.102.2 / LLVM14.0.6 / bootstrap LDC - the LLVM D compiler (1.32.0)
real 0m9.769s
user 2m1.187s
sys 0m3.830s
sambamba 0.8.2
by Artem Tarasov and Pjotr Prins (C) 2012-2021
LDC 1.27.1 / DMD v2.097.2 / LLVM11.0.0 / bootstrap LDC - the LLVM D compiler (1.27.1)
real 0m9.472s
user 1m58.529s
sys 0m3.850s
time ./bin/sambamba-0.8.1-pre1 sort HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
sambamba 0.8.1-pre1
by Artem Tarasov and Pjotr Prins (C) 2012-2021
LDC 1.26.0 / DMD v2.096.1 / LLVM9.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6)
real 0m9.151s
user 2m5.779s
sys 0m3.101s
time ./bin/sambamba-0.6.8-linux-static sort HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
real 0m10.213s
user 2m6.739s
sys 0m3.425s
--- markdup
time ./bin/sambamba-1.0.0 markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.sorted.bam dedup.bam
sambamba 1.0.0
by Artem Tarasov and Pjotr Prins (C) 2012-2022
LDC 1.32.0 / DMD v2.102.2 / LLVM14.0.6 / bootstrap LDC - the LLVM D compiler (1.32.0)
real 0m10.831s
user 1m41.497s
sys 0m3.413s
time ./bin/sambamba-0.8.2-linux-amd64-static markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.sorted.bam dedup.bam
sambamba 0.8.2
by Artem Tarasov and Pjotr Prins (C) 2012-2021
LDC 1.27.1 / DMD v2.097.2 / LLVM11.0.0 / bootstrap LDC - the LLVM D compiler (1.27.1)
real 0m10.315s
user 1m38.043s
sys 0m3.292s
time ./bin/sambamba-0.8.1-pre1 markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.sorted.bam dedup.bam
by Artem Tarasov and Pjotr Prins (C) 2012-2021
LDC 1.26.0 / DMD v2.096.1 / LLVM9.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6)
real 0m11.319s
user 1m47.719s
sys 0m4.070s
monza:~/tmp$ time ./sambamba view /gnu/data/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam.orig > /dev/null
sambamba 0.6.8-pre3 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.11.0 / DMD v2.081.2 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6)
real 0m6.930s
user 0m26.940s
sys 0m0.516s
sambamba 0.6.8-pre2 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
real 0m6.854s
user 0m26.456s
sys 0m0.584s
linux-vdso.so.1 (0x00007ffd227fc000)
librt.so.1 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/librt.so.1 (0x00007f5d31082000)
libpthread.so.0 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libpthread.so.0 (0x00007f5d30e64000)
libm.so.6 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libm.so.6 (0x00007f5d30b52000)
libdl.so.2 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libdl.so.2 (0x00007f5d3094e000)
libgcc_s.so.1 => /gnu/store/h3z6nshhdlc8zgh4mi13x1br03xipi9r-gcc-7.2.0-lib/lib/libgcc_s.so.1 (0x00007f5d30737000)
libc.so.6 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libc.so.6 (0x00007f5d30398000)
/gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/ld-linux-x86-64.so.2 (0x00007f5d3128a000)
time ./build/sambamba view /gnu/data/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam.orig > /dev/nullsambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
real 0m2.869s
user 0m21.972s
sys 0m0.356s
This version was built with:
LDC 1.1.1
using DMD v2.071.2
using LLVM 3.9.1
bootstrapped with LDC - the LLVM D compiler (1.1.1)
real 0m3.150s
user 0m24.668s
sys 0m0.320s
This version was built with:
LDC 1.7.0
using DMD v2.077.1
using LLVM 5.0.1
bootstrapped with LDC - the LLVM D compiler (1.7.0)
real 0m2.869s
user 0m22.344s
sys 0m0.344s
time ./sambamba_v0.6.6 sort -m 20615843020 -N -o /dev/null ENCFF696RLQ.bam -psambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
real 7m50.558s
user 89m10.808s
sys 2m57.188s
and with 120GB RAM
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
real 3m49.953s
user 81m16.956s
sys 1m58.332s
Wed Feb 7 03:43:14 CST 2018
sambamba 0.6.8-pre1
This version was built with:
LDC 1.7.0
using DMD v2.077.1
using LLVM 5.0.1
bootstrapped with LDC - the LLVM D compiler (1.7.0)
real 8m0.528s
user 88m44.084s
sys 2m45.888s
When sambamba is given enough RAM to hold everything in memory sambamba is twice as fast (apparently half the time goes to intermediate IO)
time ./sambamba sort -N -o /dev/null ENCFF696RLQ.bam -p -m 120Greal 3m46.856s user 81m44.524s sys 1m56.388s
with 64GB it is
real 5m36.062s user 88m43.176s sys 3m0.536s
and with 32GB it is
real 7m22.125s user 89m6.188s sys 2m51.228s
This version was built with:
LDC 1.7.0
using DMD v2.077.1
using LLVM 5.0.1
bootstrapped with LDC - the LLVM D compiler (1.7.0)
real 18m15.809s
user 158m30.148s
sys 3m15.932s
Ouch! A regression in the shipped release 0.6.7.
This version was built with:
LDC 1.1.1
using DMD v2.071.2
using LLVM 3.9.1
bootstrapped with LDC - the LLVM D compiler (1.1.1)
ldc2 -wi -I. -IBioD -IundeaD/src -g -O3 -release -enable-inlining -boundscheck=off
real 18m40.223s
user 159m34.292s
sys 3m19.300s
So, the same build is 2x slower than the previous version.
This version was built with:
LDC 1.1.1
using DMD v2.071.2
using LLVM 3.9.1
bootstrapped with LDC - the LLVM D compiler (1.1.1)
Using ldmd2 @sambamba-ldmd-release.rsp
"-g" "-O2" "-c" "-m64" "-release" "-IBioD/" "-IundeaD/src/" "-ofbuild/sambamba.o" "-odbuild" "-I."
gcc -Wl,--gc-sections -o build/sambamba build/sambamba.o -Lhtslib -Llz4/lib -Wl,-Bstatic -lhts -llz4 -Wl,-Bdynamic /home/wrk/opt/ldc2-1.1.1-linux-x86_64/lib/libphobos2-ldc.a /home/wrk/opt/ldc2-1.1.1-linux-x86_64/lib/libdruntime-ldc.a -lrt -lpthread -lm
real 9m9.465s
user 97m56.204s
sys 2m50.512s
Updated the makefile to build with -singleobj. Now LLVM kicks in!
This version was built with:
LDC 1.7.0
using DMD v2.077.1
using LLVM 5.0.1
bootstrapped with LDC - the LLVM D compiler (1.7.0)
real 8m1.978s
user 89m13.936s
sys 2m47.392s
Next I tried adding profile guided optimization. But that turned out to be slower
This version was built with:
LDC 1.7.0
using DMD v2.077.1
using LLVM 5.0.1
bootstrapped with LDC - the LLVM D compiler (1.7.0)
real 11m16.267s
user 116m15.556s
sys 2m56.244s
So, the release is reverted an after a version bump:
This version was built with:
LDC 0.17.1
using DMD v2.068.2
using LLVM 3.8.0
bootstrapped with version not available
real 10m0.932s
user 151m39.172s
sys 3m7.596s
This version was built with:
LDC 1.1.1
using DMD v2.071.2
using LLVM 3.9.1
bootstrapped with LDC - the LLVM D compiler (1.1.1)
real 9m22.501s
user 98m24.748s
sys 2m51.996s
Note, updating compiler shows a speed gain for 0.6.6.
sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)
finding positions of the duplicate reads in the file...
sorted 11286293 end pairs
and 156042 single ends (among them 0 unmatched pairs)
collecting indices of duplicate reads... done in 1325 ms
found 6603388 duplicates
collected list of positions in 0 min 16 sec
marking duplicates...
collected list of positions in 1 min 2 sec
Command being timed: "./bin/sambamba markdup /gnu/data/in_raw.sorted.bam /gnu/data/in_raw.sorted.bam t2.bam"
User time (seconds): 406.49
System time (seconds): 3.86
Percent of CPU this job got: 649%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:03.13
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1709720
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1140382
Voluntary context switches: 393213
Involuntary context switches: 8993
Swaps: 0
File system inputs: 0
File system outputs: 2663824
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Uses slightly more memory but is faster than
/usr/bin/time --verbose sambamba markdup /gnu/data/in_raw.sorted.bam /gnu/data/in_raw.sorted.bam t2.bam
finding positions of the duplicate reads in the file...
sorted 11286293 end pairs
and 156042 single ends (among them 0 unmatched pairs)
collecting indices of duplicate reads... done in 1521 ms
found 6603388 duplicates
collected list of positions in 0 min 16 sec
marking duplicates...
total time elapsed: 1 min 4 sec
Command being timed: "sambamba markdup /gnu/data/in_raw.sorted.bam /gnu/data/in_raw.sorted.bam t2.bam"
User time (seconds): 423.78
System time (seconds): 4.47
Percent of CPU this job got: 666%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:04.24
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1542764
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1839470
Voluntary context switches: 368082
Involuntary context switches: 8537
Swaps: 0
File system inputs: 0
File system outputs: 2643840
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
/usr/bin/time --verbose ./bin/sambamba-0.7.1 "--DRT-gcopt=profile:1" markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam test.bam
Commit 5f52f04aae3de1dce2d13b9e748002b4e513ded0
by Artem Tarasov and Pjotr Prins (C) 2012-2019
LDC 1.17.0 / DMD v2.087.1 / LLVM8.0.1 / bootstrap LDC - the LLVM D compiler (1.17.0)
finding positions of the duplicate reads in the file...
sorted 3969781 end pairs
and 73839 single ends (among them 22397 unmatched pairs)
collecting indices of duplicate reads... done in 372 ms
found 239673 duplicates
collected list of positions in 0 min 6 sec
marking duplicates...
collected list of positions in 0 min 22 sec
Number of collections: 107
Total GC prep time: 10 milliseconds
Total mark time: 548 milliseconds
Total sweep time: 26 milliseconds
Max Pause Time: 10 milliseconds
Grand total GC time: 585 milliseconds
GC summary: 1158 MB, 107 GC 585 ms, Pauses 558 ms < 10 ms
Command being timed: "./bin/sambamba-0.7.1 --DRT-gcopt=profile:1 markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam test2.bam"
User time (seconds): 136.00
System time (seconds): 2.39
Percent of CPU this job got: 583%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:23.70
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1282940
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 396600
Voluntary context switches: 199806
Involuntary context switches: 5017
Swaps: 0
File system inputs: 16
File system outputs: 1967376
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0