Tests compare calling the reference implementation in C against equivalent functions in this crate. No link-time optimization (LTO) is used, so the C performance numbers have additional overhead for each function call.
Compares the speed of hashing an entire buffer of data in one function call. Data sizes from 256 KiB to 4 MiB are tested. These graphs are boring flat lines, so a table is used instead.
Apple M1 Max:

| Implementation | Throughput (GiB/s) |
|---|---|
| Rust | 13.4 |
| C | 13.4 |

AMD Ryzen 9 3950X:

| Implementation | Throughput (GiB/s) |
|---|---|
| Rust | 16.7 |
| C | 16.6 |
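As a rough illustration of how a throughput figure in GiB/s can be derived, the sketch below divides the number of bytes processed by the elapsed wall-clock time, using 1 GiB = 2^30 bytes. It is not the benchmark harness behind these tables. The names `throughput_gib_per_s` and `checksum` are hypothetical, and `checksum` is only a placeholder workload standing in for a real hash function.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Converts a byte count and an elapsed time into GiB/s (1 GiB = 2^30 bytes).
/// Hypothetical helper for illustration only.
fn throughput_gib_per_s(bytes: usize, elapsed: Duration) -> f64 {
    const GIB: f64 = (1u64 << 30) as f64;
    (bytes as f64 / GIB) / elapsed.as_secs_f64()
}

/// Placeholder workload: a wrapping byte sum, NOT a real hash function.
fn checksum(data: &[u8]) -> u64 {
    data.iter().fold(0u64, |acc, &b| acc.wrapping_add(u64::from(b)))
}

fn main() {
    // A 1 MiB buffer, processed in a single call ("oneshot").
    let data = vec![0xA5u8; 1 << 20];

    let start = Instant::now();
    // black_box discourages the optimizer from removing the work.
    let sum = black_box(checksum(black_box(&data)));
    let elapsed = start.elapsed();

    println!(
        "checksum = {sum}, {:.2} GiB/s",
        throughput_gib_per_s(data.len(), elapsed)
    );
}
```

A single timed run like this is noisy; a real benchmark would repeat the measurement many times and report a statistic over the runs.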
Compares the speed of hashing a 1 MiB buffer of data split into various chunk sizes.
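To make "split into various chunk sizes" concrete, here is a minimal sketch of feeding one buffer to a streaming hasher chunk by chunk and checking that the result matches hashing it in a single call. It deliberately uses a tiny FNV-1a implementation as a stand-in, not this crate's hashers or API. `Fnv1a`, `hash_oneshot`, and `hash_chunked` are hypothetical names, and the chunk sizes are arbitrary examples rather than the ones benchmarked.

```rust
/// FNV-1a (64-bit), used here only as a small stand-in streaming hasher.
struct Fnv1a(u64);

impl Fnv1a {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    fn new() -> Self {
        Fnv1a(Self::OFFSET_BASIS)
    }

    /// Absorbs more input; may be called any number of times.
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(Self::PRIME);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hashes the whole buffer with one `write` call.
fn hash_oneshot(data: &[u8]) -> u64 {
    let mut hasher = Fnv1a::new();
    hasher.write(data);
    hasher.finish()
}

/// Hashes the buffer in pieces of `chunk_size` bytes (must be non-zero).
fn hash_chunked(data: &[u8], chunk_size: usize) -> u64 {
    let mut hasher = Fnv1a::new();
    for chunk in data.chunks(chunk_size) {
        hasher.write(chunk);
    }
    hasher.finish()
}

fn main() {
    // 1 MiB of simple patterned data.
    let data: Vec<u8> = (0..1usize << 20).map(|i| (i % 251) as u8).collect();
    let expected = hash_oneshot(&data);

    // Example chunk sizes, chosen arbitrarily for illustration.
    for chunk_size in [1, 64, 4096, 1 << 20] {
        assert_eq!(hash_chunked(&data, chunk_size), expected);
    }
    println!("all chunk sizes produced {expected:#018x}");
}
```

The hash value is the same for every chunk size; what changes is the per-call overhead, which is what a chunk-size benchmark measures.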
Compares the time taken to hash 0 to 32 bytes of data.
Compares the speed of hashing an entire buffer of data in one function call. Data sizes from 256 KiB to 4 MiB are tested. These graphs are boring flat lines, so a table is used instead.
Apple M1 Max:

| Implementation | Throughput (GiB/s) |
|---|---|
| Rust | 35.0 |
| C | 35.0 |
| C (scalar) | 21.2 |
| C (NEON) | 35.0 |

AMD Ryzen 9 3950X:

| Implementation | Throughput (GiB/s) |
|---|---|
| Rust | 58.9 |
| C | 25.1 |
| C (scalar) | 7.6 |
| C (SSE2) | 25.1 |
| C (AVX2) | 58.4 |
Compares the speed of hashing a 1 MiB buffer of data split into various chunk sizes.
Compares the time taken to hash 0 to 230 bytes of data. Where neighboring sizes take similar times, only representative samples are measured, to avoid cluttering the graph and wasting benchmarking time.
Compares the speed of hashing an entire buffer of data in one function call. Data sizes from 256 KiB to 4 MiB are tested. These graphs are boring flat lines, so a table is used instead.
Apple M1 Max:

| Implementation | Throughput (GiB/s) |
|---|---|
| Rust | 34.4 |
| C | 34.8 |
| C (scalar) | 21.3 |
| C (NEON) | 34.6 |

AMD Ryzen 9 3950X:

| Implementation | Throughput (GiB/s) |
|---|---|
| Rust | 58.3 |
| C | 25.6 |
| C (scalar) | 7.6 |
| C (SSE2) | 25.5 |
| C (AVX2) | 57.4 |
Compares the speed of hashing a 1 MiB buffer of data split into various chunk sizes.
Compares the time taken to hash 0 to 230 bytes of data. Where neighboring sizes take similar times, only representative samples are measured, to avoid cluttering the graph and wasting benchmarking time.
| CPU | Memory | C compiler |
|---|---|---|
| Apple M1 Max | 64 GiB | clang 16.0.0 |
| AMD Ryzen 9 3950X | 32 GiB | cl.exe 19.41.34120 |
Tests were run with rustc 1.82.0 (f6e511eec 2024-10-15).
| CPU | Apple M1 Max |
|---|---|
| Memory | 64 GiB |
| C compiler | Apple clang version 16.0.0 (clang-1600.0.26.3) |

| CPU | AMD Ryzen 9 3950X 16-Core Processor, 3501 MHz, 16 Core(s), 32 Logical Processor(s) |
|---|---|
| Memory | 32 GiB (3600 MT/s) |
| C compiler | Microsoft (R) C/C++ Optimizing Compiler Version 19.41.34120 for x86 |