Integration tests and performance benchmarking tools for implementations of S3 Active Storage servers.
There are currently two S3 active storage server implementations:
- Reductionist - a performant server implementation
- S3 Active Storage Prototype - a functional prototype
To set up the testing environment, run:

```
python -m venv venv
source ./venv/bin/activate
pip install -r requirements.txt
```
To run the compliance test suite on your own implementation of an S3 active storage server, edit the following variables in `compliance/config.py`:

- `S3_SOURCE` - The address of your S3 store (e.g. https://s3-proxy.com/). If you don't have an existing S3 store, you can set up a temporary MinIO docker container by running `scripts/run-minio.sh` in a separate terminal, in which case you should leave the S3 source as localhost.
- `AWS_ID` - The AWS access key ID used to authenticate with the S3 source (defaults to 'minioadmin' for the MinIO docker container).
- `AWS_PASSWORD` - The secret access key used to authenticate with the S3 source (defaults to 'minioadmin' for the MinIO docker container).
- `PROXY_URL` - The address of your active storage proxy implementation (e.g. https://s3-proxy.example.com/).
- `PROXY_CA_CERT` - Optional path to a CA certificate for your active storage proxy.
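Put together, a minimal `compliance/config.py` for a local MinIO setup might look like the sketch below. All values are illustrative; the real module may define additional settings.

```python
# Illustrative sketch of the variables described above. The 'minioadmin'
# credentials match the defaults of the MinIO container started by
# scripts/run-minio.sh; every value here is an example, not a requirement.
S3_SOURCE = "http://localhost:9000/"
AWS_ID = "minioadmin"
AWS_PASSWORD = "minioadmin"
PROXY_URL = "https://s3-proxy.example.com/"
PROXY_CA_CERT = None  # or a path such as "./certs/ca.crt"
```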
The compliance test suite can then be run by calling

```
pytest
```

from within the project directory.
We aim to add tests for features as they are added to the S3 active storage server.
This can cause problems when testing older versions of the server that lack support for those features. This is addressed through configuration variables in `compliance/config.py`:
- `TEST_X_ACTIVESTORAGE_COUNT_HEADER` - Whether to test for the presence of the `x-activestorage-count` header in responses.
- `COMPRESSION_ALGS` - List of names of compression algorithms to test. May be set to an empty list.
- `FILTER_ALGS` - List of names of filter algorithms to test. May be set to an empty list.
- `MISSING_DATA` - List of missing data description classes in `compliance/missing.py`. May be set to an empty list.
- `TEST_BYTE_ORDER` - Whether to test data with different byte orders (endianness).
- `TEST_PUBLIC_BUCKET` - Whether to test unauthenticated access to data stored in a public bucket.
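For example, a test can be gated on one of these flags with pytest's `skipif` marker. The sketch below inlines illustrative flag values rather than importing the real `compliance.config` module, and the test bodies are placeholders:

```python
import pytest

# Inlined stand-ins for the flags in compliance/config.py (illustrative values).
TEST_X_ACTIVESTORAGE_COUNT_HEADER = False
COMPRESSION_ALGS = ["gzip", "zlib"]

@pytest.mark.skipif(
    not TEST_X_ACTIVESTORAGE_COUNT_HEADER,
    reason="server does not support the x-activestorage-count header",
)
def test_count_header():
    # Would check the response header against the expected element count.
    pass

# Setting COMPRESSION_ALGS to an empty list simply generates no cases here.
@pytest.mark.parametrize("alg", COMPRESSION_ALGS)
def test_compression(alg):
    # Would upload data compressed with `alg` and verify the reduction result.
    pass
```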
As work on Reductionist continues, and the compliance suite evolves in parallel, we continue to allow progressive testing of new features with the following configuration variables in `compliance/config.py`:
- `TEST_API_V2` - Whether to switch to `v2` of the Reductionist API, where different object store backends are available.
- `TEST_HTTP_OBJECT_STORE` - Whether to use the newer Reductionist API to test an HTTP object store backend.
- `HTTP_SOURCE` - The URL of an HTTP object store.
- `HTTP_USERNAME` - Username for an HTTP object store protected with HTTP basic access authentication.
- `HTTP_PASSWORD` - Password for the authenticated HTTP object store.
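A hedged sketch of how these optional settings might be filled in (every value below is illustrative, not a default of the suite):

```python
# Illustrative values only; adjust to your own deployment.
TEST_API_V2 = True
TEST_HTTP_OBJECT_STORE = True
HTTP_SOURCE = "http://localhost:8080/"
HTTP_USERNAME = "user"
HTTP_PASSWORD = "password"
```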
Testing of the HTTP object storage backend is an optional addition to the compliance suite; S3 object storage remains the primary target. The accompanying scripts can start both a local MinIO S3 object store and an nginx HTTP object store, populate both with test data, and then send separate API requests to Reductionist to ensure that reductions using both object stores yield the same result.
Test data is currently generated as numpy arrays and then uploaded to the configured S3 source in binary format. Following this upload, requests are made to the active storage proxy and the proxy response is compared to the expected result based on the agreed API specification and the generated test arrays.
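The comparison can be pictured locally as follows: generate an array, serialize it the way it would be uploaded, and compute the reduction result that the proxy's response is checked against. The shape, dtype, seed and slice below are arbitrary examples:

```python
import numpy as np

# Generate a small test array (shape, dtype and seed are arbitrary examples).
rng = np.random.default_rng(seed=0)
data = rng.integers(0, 100, size=(4, 5), dtype=np.int64)

# Uploaded objects are the raw row-major bytes of the array.
payload = data.tobytes()

# The expected result of a "sum" reduction over a slice, computed locally.
expected = int(data[1:3, 0:2].sum())

# A compliant proxy reading the same bytes must produce the same value.
roundtrip = np.frombuffer(payload, dtype=np.int64).reshape(4, 5)
assert int(roundtrip[1:3, 0:2].sum()) == expected
```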
There are procedurally generated test cases covering various combinations of reduction operation, data type, data shape and data slice parameters, as well as tests of response codes and informative error messages.
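The procedural generation can be sketched as a Cartesian product over the test dimensions; the concrete lists below are illustrative, not the suite's actual parameter sets:

```python
import itertools

# Illustrative parameter sets; the real suite defines its own.
OPERATIONS = ["count", "max", "min", "sum"]
DTYPES = ["int32", "int64", "float32", "float64"]
SHAPES = [(10,), (4, 5)]
SLICES = [None, "[0:5]"]

# Each combination of the four dimensions becomes one test case.
cases = list(itertools.product(OPERATIONS, DTYPES, SHAPES, SLICES))
```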
The `scripts/run-benchmarks.py` script can be used to benchmark running instances of the active storage proxy. To do so, edit the required config parameters at the top of the script (`PROXY_URLS`, `S3_SOURCE` and `AUTH`) and then run the script.
The benchmarking process generates a few numpy arrays and uploads them to a test bucket within the configured S3 source. The sum operation is then called repeatedly on each of these arrays to collect timing statistics. For convenience, the script also collects timing stats while bypassing the active storage proxy (i.e. fetching the full array from S3 and performing the sum locally) to indicate how much of an improvement the active storage proxy provides. Once all benchmark runs are finished, a boxplot figure is generated (at the path ./benchmark--{timestamp}.png) to summarize the results. Finally, all generated test data is removed from the S3 source.
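The core of such a benchmark loop can be sketched as a small timing helper. The function name and the two callables below are hypothetical placeholders, not code from `scripts/run-benchmarks.py`:

```python
import statistics
import time

def time_repeatedly(fn, repeats=10):
    """Call fn() `repeats` times and return timing statistics in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }

# In the real script, one callable would request a sum from the proxy and the
# other would fetch the full array from S3 and sum it locally; plain sums
# stand in for both here.
proxy_stats = time_repeatedly(lambda: sum(range(10_000)))
local_stats = time_repeatedly(lambda: sum(range(100_000)))
```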
The following checks are run in CI (from the repository root):

```
black .
mypy compliance
flake8
```