This repository accompanies the paper Neon NTT: Faster Dilithium, Kyber, and Saber on Cortex-A72 and Apple M1 published at IACR Transactions on Cryptographic Hardware and Embedded Systems (TCHES), Issue 1, Volume 2022. The paper is also available at ePrint 2021/986.
The repository has been updated several times since publication. To reproduce the results in the paper, please use commit a96c17dbe74ac7675c785a728396e216555c432b.
Authors:
- Hanno Becker
- Vincent Hwang
- Matthias J. Kannwischer
- Bo-Yin Yang
- Shang-Yi Yang
This repository contains our source code for Dilithium, Kyber, and Saber optimized for Cortex-A72. The code also runs on other Armv8 cores and the Apple M1. However, the benchmarking code in this repository has only been tested with the Cortex-A72.
Clone the code together with all submodules:
git clone --recurse-submodules https://github.com/neon-ntt/neon-ntt
We have tested this code with a Raspberry Pi 4 running Ubuntu 21.04. A 64-bit OS is required to execute aarch64 code. The 64-bit Raspbian OS can be used, but we have not tested all code with it.
The Makefiles included in this repo assume that you are natively compiling your code using gcc. We have tested it with gcc 10.3.0.
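Before building, it can help to confirm that the system really is a 64-bit Arm environment. The following is a minimal sketch, not part of this repository: the helper name `is_arm64` is hypothetical, and it assumes that `uname -m` reports `aarch64` (Linux) or `arm64` (macOS) on such systems. `getconf LONG_BIT` additionally reports the word size of the userland, which matters when a 64-bit kernel runs a 32-bit userland.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the machine string names a 64-bit Arm system.
is_arm64() {
    case "$1" in
        aarch64|arm64) return 0 ;;
        *) return 1 ;;
    esac
}

machine=$(uname -m)
if is_arm64 "$machine"; then
    echo "kernel architecture: $machine (64-bit Arm)"
else
    echo "kernel architecture: $machine (not 64-bit Arm)"
fi

# Word size of the userland; 64 is expected for native aarch64 builds.
echo "userland word size: $(getconf LONG_BIT)"
```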
For accurate benchmarking on Cortex-A72, we make use of the performance counters. By default the access from user mode is disabled. You will need to enable it using a kernel module.
We have included one here that you can install:
cd enable_ccr
make install
Alternatively, you can get one from: https://github.com/rdolbeau/enable_arm_pmu.
You may have to install the kernel headers manually: https://www.raspberrypi.com/documentation/computers/linux_kernel.html
sudo apt install raspberrypi-kernel-headers
If the kernel headers still cannot be found, you may have to change `uname -r` to `uname -m` in the Makefile.
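Out-of-tree kernel modules are conventionally built against the headers reachable through `/lib/modules/<release>/build`, where `<release>` is the output of `uname -r`. The sketch below checks for that directory before running `make install`. The helper name `kernel_build_dir` is hypothetical, and whether the Makefile in `enable_ccr` uses exactly this path is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: conventional kernel build directory for a given release.
# Assumption: the module Makefile looks for headers under this path.
kernel_build_dir() {
    echo "/lib/modules/$1/build"
}

dir=$(kernel_build_dir "$(uname -r)")
if [ -d "$dir" ]; then
    echo "kernel headers found: $dir"
else
    echo "kernel headers missing: $dir"
fi
```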
We have tested this code on an Apple M1 Mac mini running macOS Big Sur 11.4. We used Apple clang 12.0.5.
For cycle counting we make use of https://github.com/cothan/kyber/blob/master/neon/m1cycles.c.
For each of the schemes, we provide the following folders:
- `ntt`: Code for the core polynomial arithmetic.
- `microbenchmarks`: Standalone code for benchmarking individual functions; `make` will produce a benchmarking binary (when executed on an A72). For running benchmarks, user-space access to the PMU cycle count register needs to be enabled. A kernel module that enables it can, for example, be found in pqax.
- `m1_benchmarks`: Standalone code for benchmarking on Apple M1.
- `scheme`: Contains the entire code for the scheme, ready to be placed in SUPERCOP (see below).
The following instructions allow you to reproduce the microbenchmarks in Table 4 and Table 5 of the paper.
- Copy the `microbenchmarks` folder to the Cortex-A72
- Type `make` in there
- You should see a `speed` executable. Run it with `./speed`
- If you get `Illegal Instruction`, you need to enable access to the performance counters (see the kernel module instructions above)
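On Linux, a process killed by an illegal instruction (SIGILL, signal 4) is reported by the shell with exit status 132 (128 + 4). The sketch below uses this to turn the failure into a hint. `explain_status` is a hypothetical helper, not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper: map an exit status of ./speed to a short diagnosis.
explain_status() {
    case "$1" in
        0)   echo "benchmark finished" ;;
        132) echo "illegal instruction: enable user-space access to the performance counters" ;;
        *)   echo "failed with exit status $1" ;;
    esac
}

# Run the benchmark only if it has been built in the current directory.
if [ -x ./speed ]; then
    ./speed
    explain_status "$?"
fi
```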
The following instructions should allow you to benchmark the full schemes (Table 6) using SUPERCOP on the Raspberry Pi 4 running Ubuntu 21.04:
- Download https://bench.cr.yp.to/supercop/supercop-20210604.tar.xz
- Remove every line except the first from `okcompilers/c` and `okcompilers/cpp` to speed up benchmarking.
- Remove `#include <sys/sysctl.h>` from `cpucycles/armv8.c`
- Make sure that access to the cycle counters from user mode is enabled before proceeding.
- Run `./do-part used` (this will take a couple of hours) or, alternatively, build `randombytes` as follows (this is the only dependency on SUPERCOP):
  - `./do-part init`
  - `./do-part crypto_stream chacha20`
  - `./do-part crypto_rng chacha20`
- Copy over the scheme you want
  - e.g., `cp -rL kyber768/scheme <SUPERCOP_PATH>/crypto_kem/kyber768/`
  - or, run `sh cp2supercop.sh`
- Note that for Dilithium the testvectors have changed: https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/BjfjRMIdnhM/m/W7kkVOFDBAAJ
  - You will have to copy over the updated testvectors
    - e.g., `cp dilithium2/checksum* <SUPERCOP_PATH>/crypto_sign/dilithium2`
- Benchmark using, e.g., `./do-part crypto_kem kyber768`
- Results will be in `bench/<hostname>/data`
- If you want to run more iterations, change `TIMINGS` in `crypto_{kem/sign}/measure.c` before running `./do-part crypto_{kem/sign} ...`
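The two file edits in the list above can be scripted. This is a minimal sketch meant to be run from the top of the unpacked SUPERCOP directory. The helper names `keep_first_line` and `drop_sysctl_include` are hypothetical, and the script does nothing if the expected files are not present.

```shell
#!/bin/sh
# Hypothetical helper: keep only the first line of a file
# (intended for okcompilers/c and okcompilers/cpp).
keep_first_line() {
    head -n 1 "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Hypothetical helper: delete every line that includes <sys/sysctl.h>
# (intended for cpucycles/armv8.c).
drop_sysctl_include() {
    sed '/#include <sys\/sysctl.h>/d' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Apply the edits only when run from inside an unpacked SUPERCOP tree.
if [ -f okcompilers/c ] && [ -f okcompilers/cpp ] && [ -f cpucycles/armv8.c ]; then
    keep_first_line okcompilers/c
    keep_first_line okcompilers/cpp
    drop_sysctl_include cpucycles/armv8.c
fi
```

Writing to a temporary file and renaming it avoids `sed -i`, whose syntax differs between GNU and BSD sed.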
The following instructions allow you to reproduce the benchmarks on the Apple M1:
- Copy `m1_benchmarks` to the Apple M1
- Type `make` in there
- You should see a `<scheme>_test` and a `<scheme>_speed` executable.
- Running the test, e.g., `./kyber_test`, should give you `Test successful`
- You can run the benchmarks using, e.g., `sudo ./kyber_speed`, which should reproduce the results in Table 4, Table 5, and Table 6.
- Accessing the cycle counts requires running the executable as root!
If you get `Illegal Instruction` on the Cortex-A72, access to the PMU cycle counters from user space is probably not enabled. To enable it, see https://github.com/mupq/pqax#enable-access-to-performance-counters.
On the Apple M1, using `m1cycles.c` requires root. Re-run the executable with `sudo`.
This repository includes code from other sources that is covered by the following licenses or license waivers:
- `feat.S` modified from https://github.com/bwesterb/armed-keccak: MIT
- Kyber reference code https://github.com/pq-crystals/kyber/blob/master/LICENSE: CC0 or Apache 2.0
- Saber reference code https://github.com/KULeuven-COSIC/SABER/blob/master/LICENSE: public domain
- Dilithium reference code https://github.com/pq-crystals/dilithium/blob/master/LICENSE: CC0 or Apache 2.0
- Neon-optimized Kyber: Apache 2.0 at https://github.com/GMUCERG/PQC_NEON/blob/main/neon/kyber or public domain at https://github.com/cothan/kyber/blob/master/neon
- `fips202.{c,h}` http://bench.cr.yp.to/supercop.html: public domain
- `fips202x2.{c,h}` https://github.com/cothan/kyber/blob/master/neon/fips202x2.c: CC0 or Apache 2.0
- `m1cycles.{c,h}` https://github.com/cothan/SABER/blob/master/Cortex-A_Implementation_KEM/m1cycles.c: public domain or Apache 2.0
- `gen_table/common` from https://github.com/multi-moduli-ntt-saber/multi-moduli-ntt-saber: CC0
All files in this repository are covered by CC0 by default unless stated otherwise at the beginning of the file.