Hyper-optimized CMA-ES in Rust with a first-class Python experience
SIMD β’ rayon β’ deterministic seeds β’ vectorized objectives β’ restarts β’ constraints β’ Rich TUI
Published to PyPI as fast-cmaes (module name: fastcma). Latest release: v0.1.4
- Why CMA-ES
- Architecture
- Features
- Installation
- Quickstart
- Vectorized & Constraints
- Rust Usage
- C/C++ FFI
- REST API
- Demos & TUI
- Baselines & Benchmarks
- Performance
- Feature Flags
- Testing
- License
Derivative-free β’ Adaptive β’ Parallel-friendly
CMA-ES (Covariance Matrix Adaptation Evolution Strategy) represents a paradigm shift in optimization. Invented by Nikolaus Hansen in 1996, CMA-ES emerged from decades of research into evolution strategies and has since become one of the most successful derivative-free optimization algorithms. Hansen, who continues to research and refine CMA-ES along with other evolutionary algorithms, developed the method to address fundamental limitations in existing optimization techniques. Unlike gradient-based methods that require smooth, differentiable objectives, CMA-ES thrives on black-box problems where gradients are unavailable, noisy, or meaningless. The algorithm automatically adapts its search distribution by learning the problem's structure through successful search directions, making it particularly effective for non-convex landscapes with curved valleys and deceptive local minima.
The core strength lies in its adaptive mechanisms. The step size
Key Advantages:
| Feature | Benefit |
|---|---|
| No gradients required | Works with black-box, noisy, or discontinuous objectives |
| Adaptive step size | Automatically balances exploration vs exploitation |
| Covariance adaptation | Learns problem structure and follows favorable search directions |
| Population-based | Naturally parallelizable, robust to local minima |
flowchart LR
subgraph PythonAPI["Python API"]
A[fmin/fmin_vec/CMAES]
B[Constraints/Restarts]
C[Baseline]
end
subgraph RustCore["Rust Core"]
D[Ask-Tell]
E[Covariance]
F[Sigma]
G[SIMD/Rayon]
end
subgraph TestsDemos["Tests"]
I[Benchmarks]
J[TUI]
K[Integration]
end
A --> D
B --> D
PythonAPI --> RustCore
TestsDemos --> RustCore
classDef pythonStyle fill:#b3d9ff,stroke:#7fb3d3,stroke-width:1px,color:#333
classDef rustStyle fill:#ffcccc,stroke:#ff9999,stroke-width:1px,color:#333
classDef testStyle fill:#b3e5d1,stroke:#7fb8a3,stroke-width:1px,color:#333
classDef subgraphStyle fill:#f5f5f5,stroke:#ccc,stroke-width:1px
class A,B,C pythonStyle
class D,E,F,G rustStyle
class I,J,K testStyle
class PythonAPI,RustCore,TestsDemos subgraphStyle
1. Ask-Tell Interface (CmaesState)
The optimizer follows the classic CMA-ES ask-tell pattern:
flowchart TD
Start([Start]) --> Init[Init xmean, Ο, C]
Init --> Ask[ask: Generate Ξ»]
Ask --> Eval[Evaluate fitness]
Eval --> Tell[tell: Update]
Tell --> UpdateMean[Update xmean]
UpdateMean --> UpdateCov[Update C]
UpdateCov --> UpdateSigma[Adapt Ο]
UpdateSigma --> CheckTerm{Done?}
CheckTerm -->|No| Ask
CheckTerm -->|Yes| End([Return])
classDef askStyle fill:#b3d9ff,stroke:#7fb3d3,stroke-width:1px,color:#333
classDef tellStyle fill:#b3e5d1,stroke:#7fb8a3,stroke-width:1px,color:#333
classDef decisionStyle fill:#ffe0b3,stroke:#ffcc80,stroke-width:1px,color:#333
classDef startEndStyle fill:#e1bee7,stroke:#ce93d8,stroke-width:1px,color:#333
class Ask,Eval askStyle
class Tell,UpdateMean,UpdateCov,UpdateSigma tellStyle
class CheckTerm decisionStyle
class Start,End startEndStyle
Mathematical Formulation:
-
ask(): Generates$$\lambda$$ candidate solutions by sampling:$$x_i \sim \mathcal{N}(\mathbf{x}_{\mathrm{mean}}, \sigma^2 \mathbf{C})$$ What this means: We create new candidate solutions by drawing from a multivariate normal distribution centered at the current best estimate (
$$\mathbf{x}_{\mathrm{mean}}$$ ), with spread controlled by the step size$$\sigma$$ and shape determined by the covariance matrix$$\mathbf{C}$$ . Think of this as generating "guesses" around our current best point, where the covariance matrix learns the shape of the problem landscape.Why this matters: Instead of randomly searching everywhere, CMA-ES learns the "direction" and "shape" of good solutions through the covariance matrix, making each generation of candidates smarter than the last.
-
tell(): Updates distribution parameters:-
Weighted mean:
$$\mathbf{x}_{\mathrm{mean}} \leftarrow \sum_{i=1}^{\mu} w_i \mathbf{x}_{i:\lambda}$$ What this means: We update our search center by taking a weighted average of the best
$$\mu$$ solutions from the current generation. Better solutions get higher weights ($$w_i$$ ), so the center moves toward promising regions.Why this matters: Instead of jumping to the single best solution, we take an "elite average" that reduces noise and provides a more stable search direction.
-
Evolution paths:
$$\mathbf{p}_c \leftarrow (1-c_c)\mathbf{p}_c + c_c h_{\sigma} \frac{\mathbf{x}_{\mathrm{mean}} - \mathbf{x}_{\mathrm{old}}}{\sigma}$$ $$\mathbf{p}_{\sigma} \leftarrow (1-c_{\sigma})\mathbf{p}_{\sigma} + c_{\sigma} \frac{\mathbf{C}^{-1/2}(\mathbf{x}_{\mathrm{mean}} - \mathbf{x}_{\mathrm{old}})}{\sigma}$$ What this means: Evolution paths track the direction of recent progress.
$$\mathbf{p}_c$$ remembers successful moves (for covariance adaptation), while$$\mathbf{p}_{\sigma}$$ monitors step size effectiveness. The$$h_{\sigma}$$ term prevents updates when steps are too small.Why this matters: These "memory" vectors help the algorithm distinguish between random fluctuations and genuine progress, enabling smarter adaptation of both search direction and step size.
-
Covariance update:
$$\mathbf{C} \leftarrow (1-c_1-c_{\mu})\mathbf{C} + c_1 \mathbf{p}_c \mathbf{p}_c^T + c_{\mu} \sum_{i=1}^{\mu} w_i \mathbf{y}_{i:\lambda} \mathbf{y}_{i:\lambda}^T$$ What this means: The covariance matrix evolves through three parts: (1) gradual forgetting of old information, (2) learning from the evolution path direction, and (3) incorporating successful steps from the current population.
$$\mathbf{y}_{i:\lambda}$$ are the steps taken relative to the mean.Why this matters: This is how CMA-ES "learns" the shape of the problem. If good solutions consistently lie along certain directions, the covariance matrix stretches the search distribution in those directions.
-
Step size adaptation:
$$\sigma \leftarrow \sigma \exp\left(\frac{c_{\sigma}}{d_{\sigma}}\left(\frac{|\mathbf{p}_{\sigma}|}{\mathbb{E}[|\mathcal{N}(\mathbf{0},\mathbf{I})|]} - 1\right)\right)$$ What this means: The step size increases when the evolution path is longer than expected (indicating good progress) and decreases when it's shorter (indicating stagnation). The expected length comes from a standard normal distribution.
Why this matters: This automatic step size control prevents the algorithm from taking steps that are too large (missing optima) or too small (wasting evaluations), maintaining an optimal balance between exploration and exploitation.
-
2. Covariance Representation (CovarianceMode)
Two modes optimize for different problem characteristics:
flowchart TD
Start([Mode Selection]) --> CheckDim{n < 50?}
CheckDim -->|Yes| Full["Full: nΓn, O(n^{3})"]
CheckDim -->|No| Diag["Diagonal: n, O(n)"]
Full --> FullUse["Correlations, curved valleys"]
Diag --> DiagUse["Fast, separable"]
classDef fullStyle fill:#b3d9ff,stroke:#7fb3d3,stroke-width:1px,color:#333
classDef diagStyle fill:#b3e5d1,stroke:#7fb8a3,stroke-width:1px,color:#333
classDef decisionStyle fill:#ffe0b3,stroke:#ffcc80,stroke-width:1px,color:#333
class Full,FullUse fullStyle
class Diag,DiagUse diagStyle
class CheckDim decisionStyle
Mathematical Formulation:
The choice between full and diagonal covariance represents a fundamental trade-off between modeling accuracy and computational cost. Full covariance stores the complete
Diagonal covariance takes a pragmatic approach, storing only the
Covariance Mode Comparison:
| Aspect | Full Covariance | Diagonal Covariance |
|---|---|---|
| Storage | ||
| Sampling cost |
|
|
| Correlation modeling | Captures all pairwise correlations | Assumes independence |
| Best for | Low-medium dimensions (< 50), correlated variables | High dimensions (> 50), separable problems |
| Convergence quality | Higher (adapts to problem structure) | Lower (but often sufficient) |
| Speed | Slower (eigen decomposition bottleneck) | Faster (linear scaling) |
Covariance Update Process:
The covariance matrix evolves through a combination of rank-one and rank-ΞΌ updates:
The first term decays the current covariance, preventing it from growing unbounded. The second term incorporates the evolution path
3. Parameter Adaptation
- Cumulative step-size adaptation: Evolution path
pstracks the direction of recent steps, enabling faster convergence along favorable directions - Cumulative covariance adaptation: Evolution path
pcaccumulates successful search directions, shaping the covariance matrix - Rank-ΞΌ and rank-one updates: Combines information from the current population (rank-ΞΌ) and the evolution path (rank-one) for robust covariance adaptation
Optimization Strategy Overview:
flowchart LR
Start([Optimizations]) --> SIMD[SIMD 4x]
Start --> Lazy[Lazy Eigen 5-10x]
Start --> Rayon[Rayon Linear]
Start --> Buffers[Pre-alloc]
Start --> Symmetry[Symmetry]
classDef simdStyle fill:#b3d9ff,stroke:#7fb3d3,stroke-width:1px,color:#333
classDef lazyStyle fill:#b3e5d1,stroke:#7fb8a3,stroke-width:1px,color:#333
classDef rayonStyle fill:#ffe0b3,stroke:#ffcc80,stroke-width:1px,color:#333
classDef bufferStyle fill:#e1bee7,stroke:#ce93d8,stroke-width:1px,color:#333
classDef symStyle fill:#f8bbd0,stroke:#f48fb1,stroke-width:1px,color:#333
class SIMD simdStyle
class Lazy lazyStyle
class Rayon rayonStyle
class Buffers bufferStyle
class Symmetry symStyle
1. SIMD-Accelerated Dot Products
// Uses portable_simd with 4 f64 lanes (256-bit AVX on x86_64)
fn dot_simd(a: &[f64], b: &[f64]) -> f64- Processes 4 elements per iteration using SIMD instructions
- Falls back to scalar for remainder elements
- Used in Mahalanobis norm computation, square sums, and constraint clamping
- Impact: ~3-4x speedup on dot products compared to naive scalar loops
2. Lazy Eigensystem Updates
The most expensive operation (
What this means: This formula calculates how long we can wait before recomputing the expensive eigen decomposition. The gap grows with problem dimension (c_1, c_{\mu}), but shrinks relative to the cost of the decomposition (
Why this matters: Eigen decomposition can be 90% of runtime in high dimensions, but the eigenvectors change slowly. This optimization defers the expensive computation until it's actually needed, reducing it by 5-10x in typical runs.
flowchart TD
AskCall[ask called] --> CheckGap{gap exceeded?}
CheckGap -->|No| UseCache[Use cache]
CheckGap -->|Yes| EnforceSym[Enforce symmetry]
EnforceSym --> EigenDecomp["Eigen O(n^{3})"]
EigenDecomp --> UpdateCache[Update cache]
UpdateCache --> UseCache
UseCache --> Sample[Sample]
classDef decisionStyle fill:#ffe0b3,stroke:#ffcc80,stroke-width:1px,color:#333
classDef computeStyle fill:#b3d9ff,stroke:#7fb3d3,stroke-width:1px,color:#333
classDef cacheStyle fill:#b3e5d1,stroke:#7fb8a3,stroke-width:1px,color:#333
class CheckGap decisionStyle
class EnforceSym,EigenDecomp,UpdateCache computeStyle
class UseCache,Sample cacheStyle
- Eigensystem only recomputed when
current_eval > updated_eval + lazy_gap_evals - Gap grows with dimension and learning rates, naturally reducing update frequency
- Covariance matrix updates continue (cheap rank-one/rank-ΞΌ), but sampling uses cached eigenbasis
- Impact: Reduces eigen decompositions by 5-10x in typical runs, critical for high-dimensional problems
3. Rayon Parallelization
- Fitness evaluation:
x_candidates.par_iter().map(|x| objective(x))parallelizes embarrassingly parallel objective calls - Covariance updates:
data.par_chunks_mut(n).for_each(...)parallelizes outer product accumulation - Impact: Near-linear speedup with CPU cores (e.g., 8 cores β ~7x faster on parallelizable objectives)
4. Pre-allocated Buffers
Reusable scratch buffers (z_buf, y_buf, tmp_buf) eliminate allocations in hot paths:
z_buf: Standard normal samples in eigenbasis spacey_buf: Transformed samples in original spacetmp_buf: Temporary storage for matrix-vector operations- Impact: Zero-allocation inner loops, critical for low-latency applications
5. Symmetry Enforcement Covariance matrices must be symmetric, but floating-point updates can introduce asymmetry:
fn enforce_symmetry(&mut self) // Averages A[i,j] and A[j,i]- Ensures numerical stability without expensive checks on every access
- Applied only before eigen decomposition (lazy evaluation)
Robustness in fastcma extends beyond handling difficult optimization problems to ensuring reliable behavior under diverse conditions. The implementation includes multiple layers of protection against common failure modes: numerical instability, noisy evaluations, invalid inputs, and infinite loops. These features work together to create a system that gracefully handles edge cases rather than crashing or producing incorrect results.
1. Deterministic Seeding
fn new_with_seed(..., seed: u64) -> Self {
state.rng = StdRng::seed_from_u64(seed);
}- Why: Enables reproducible benchmarks, regression testing, and debugging
- Implementation: Uses
rand::StdRngwithSeedableRngtrait - Impact: Same seed β identical optimization trajectory (critical for CI/CD and scientific reproducibility)
2. Noise Handling Detects when fitness evaluations are too similar (indicating noise or stagnation):
if spread < threshold_rel {
sigma *= sigma_expand; // Expand step size to escape
noise_cooldown = cooldown_iters; // Prevent oscillation
}- Why: Noisy objectives can trap CMA-ES with artificially small step sizes
- Detection: Compares fitness spread (
mid - best) relative to best value - Response: Temporarily expands
sigmawith cooldown to prevent thrashing - Impact: Robust performance on noisy objectives (simulations, Monte Carlo, etc.)
3. Eigenvalue Flooring Prevents numerical issues from negative or near-zero eigenvalues:
if min_ev <= 0.0 {
for ev in &mut eigs {
if *ev < eps { *ev = eps; } // Floor to 1e-20
}
}- Why: Negative eigenvalues break sampling; near-zero cause numerical instability
- Impact: Prevents panics and ensures valid covariance matrices even under numerical drift
4. Constraint Handling Pipeline
Multi-stage constraint satisfaction with fallback mechanisms:
flowchart TD
Sample[Sample] --> BoxProj[Box Projection]
BoxProj --> Mirror{mirror?}
Mirror -->|Yes| MirrorOp[Reflect]
Mirror -->|No| CheckReject{reject?}
MirrorOp --> CheckReject
CheckReject -->|Yes| RejectCheck[Check]
RejectCheck -->|Fail| Resample{retry?}
Resample -->|Yes| Sample
Resample -->|No| CheckRepair{repair?}
RejectCheck -->|Pass| CheckRepair
CheckReject -->|No| CheckRepair
CheckRepair -->|Yes| RepairOp[Repair]
CheckRepair -->|No| EvalObj[Eval]
RepairOp --> EvalObj
EvalObj --> CheckPenalty{penalty?}
CheckPenalty -->|Yes| AddPenalty[Penalty]
CheckPenalty -->|No| ReturnFit[Return]
AddPenalty --> ReturnFit
classDef sampleStyle fill:#b3d9ff,stroke:#7fb3d3,stroke-width:1px,color:#333
classDef constraintStyle fill:#b3e5d1,stroke:#7fb8a3,stroke-width:1px,color:#333
classDef decisionStyle fill:#ffe0b3,stroke:#ffcc80,stroke-width:1px,color:#333
classDef penaltyStyle fill:#f8bbd0,stroke:#f48fb1,stroke-width:1px,color:#333
class Sample,Resample sampleStyle
class BoxProj,MirrorOp,RejectCheck,RepairOp constraintStyle
class Mirror,CheckReject,Resample,CheckRepair,CheckPenalty decisionStyle
class AddPenalty penaltyStyle
class EvalObj,ReturnFit sampleStyle
Stages:
-
Box projection (SIMD-accelerated): Clamps to
$$[\mathbf{lb}, \mathbf{ub}]$$ bounds$$x_i \leftarrow \max(\min(x_i, ub_i), lb_i)$$ What this means: For each dimension, we ensure the candidate solution stays within the bounds by taking the minimum of the upper bound and maximum of the lower bound. It's like bouncing off invisible walls.
Why this matters: Simple and fast way to handle box constraints without rejecting potentially good solutions that are just slightly out of bounds.
-
Mirroring (optional): Reflects out-of-bounds points into feasible region
$$x_i \leftarrow \begin{cases} lb_i + (|x_i - lb_i| \bmod 2w_i) & \mathrm{if\ } (|x_i - lb_i| \bmod 2w_i) \leq w_i \ ub_i - ((|x_i - lb_i| \bmod 2w_i) - w_i) & \mathrm{otherwise} \end{cases}$$
where
$$w_i = ub_i - lb_i$$ What this means: Points that go outside bounds get "folded back" into the feasible region using modular arithmetic. It's like wrapping around the edges of the box, creating a toroidal search space.
Why this matters: Instead of clamping (which can create artificial boundaries), mirroring preserves the search direction and maintains exploration near boundaries.
-
Rejection/resampling: Up to
max_resamplesattempts ifreject()predicate fails -
Repair (optional): User-provided function to fix infeasible points
-
Penalty (optional): Adds penalty to fitness for remaining violations
$$f_{\mathrm{penalized}} = f(\mathbf{x}) + \mathrm{penalty}(\mathbf{x})$$ What this means: If a solution violates constraints, we add an extra penalty term to its fitness score, making it less attractive to the optimizer. The penalty function quantifies how "bad" the constraint violation is.
Why this matters: Allows CMA-ES to handle complex constraints (like nonlinear inequalities) by converting them into a modified objective function that guides search away from infeasible regions.
- Why: Flexible constraint handling accommodates diverse problem types
- Impact: Works with box constraints, nonlinear constraints (via penalty), and custom feasibility rules
5. NaN Handling Robust sorting handles NaN fitness values:
match (ai.partial_cmp(&aj), ai.is_nan(), aj.is_nan()) {
(_, true, false) => Ordering::Greater, // NaN > valid
(_, false, true) => Ordering::Less, // valid < NaN
...
}- Why: Invalid objective evaluations shouldn't crash the optimizer
- Impact: Graceful degradation when objectives return NaN/Inf
6. Termination Conditions Multiple termination criteria prevent infinite loops:
- Max evaluations: Hard budget limit
- Target fitness: Early exit on convergence
- Condition number: Stops if covariance becomes ill-conditioned (> 1e14)
- TolFun: Stops if fitness improvement stalls (< 1e-12)
- TolX: Stops if step size becomes negligible
- Impact: Reliable termination under diverse conditions
Every design decision in fastcma reflects careful consideration of trade-offs between performance, usability, and maintainability. The choices prioritize practical benefits for end users while maintaining code quality and extensibility. These decisions aren't arbitrary but represent informed choices based on profiling, benchmarking, and real-world usage patterns.
1. Rust + PyO3 vs Pure Python
The choice to implement the core in Rust with Python bindings represents a fundamental trade-off between development complexity and runtime performance. Rust provides zero-cost abstractions that enable high-level code without runtime overhead, memory safety that prevents entire classes of bugs, and direct access to SIMD instructions for vectorization. PyO3 bridges the gap between Rust's performance and Python's ecosystem, enabling seamless integration with minimal overhead.
The trade-off manifests in build complexity: users must compile Rust code rather than simply installing Python packages. However, the performance gains justify this complexity: core operations achieve 10-100x speedups compared to pure Python implementations. The inclusion of a pure Python baseline (fastcma_baseline) enables direct performance comparisons, demonstrating the value of the Rust implementation.
2. Full vs Diagonal Covariance
Rather than forcing a single covariance representation, fastcma provides user-selectable modes that optimize for different problem characteristics. Full covariance captures all pairwise variable correlations, enabling adaptation to arbitrary search distributions, but requires
The trade-off becomes clear in high dimensions: full covariance provides better convergence quality but becomes computationally prohibitive, while diagonal covariance offers acceptable quality with dramatically better performance. By making this configurable, users can choose based on their specific problem characteristics and computational constraints.
3. Lazy vs Eager Eigensystem Updates
The lazy eigensystem update strategy defers expensive
The trade-off involves slightly stale eigenbasis information between updates, but the performance gain is substantial: 5-10x fewer decompositions in typical runs. The adaptive gap calculation ensures updates occur frequently enough to maintain accuracy while avoiding unnecessary computation. This optimization becomes critical for high-dimensional problems where eigen decomposition dominates runtime.
4. Rayon vs Sequential
The choice to use Rayon for parallelization reflects the embarrassingly parallel nature of fitness evaluation. Unlike some algorithms where parallelization introduces complexity or synchronization overhead, CMA-ES naturally parallelizes: each candidate can be evaluated independently. Rayon provides a zero-cost abstraction that automatically scales to available CPU cores.
The trade-off involves slight thread pool overhead, but this is negligible compared to the linear speedup achieved on parallelizable objectives. For expensive objective functions that dominate runtime, parallelization can reduce wall-clock time by factors matching core count, making this an essential optimization for practical applications.
5. Deterministic vs Non-deterministic
Deterministic seeding by default prioritizes reproducibility over slight performance gains from faster random number generators. This choice recognizes that reproducibility is essential for debugging, regression testing, and scientific validation. The performance cost is negligible: modern RNGs are fast enough that seeding overhead doesn't impact optimization runtime.
The trade-off enables reproducible benchmarks, making it possible to verify that optimizations don't regress performance and that bug fixes actually resolve issues. This determinism is critical for CI/CD pipelines and scientific reproducibility, where identical inputs must produce identical outputs.
Production-ready β’ High-performance β’ Developer-friendly
Fastcma delivers a comprehensive optimization toolkit that balances ease of use with raw performance. The Python API provides multiple interfaces tailored to different use cases, from simple one-line optimizations to fine-grained control over the optimization process. Under the hood, a carefully optimized Rust core leverages modern CPU features like SIMD instructions and multi-core parallelism to achieve performance that rivals or exceeds hand-tuned C implementations.
The API design follows a progressive disclosure principle. New users can start with fmin for immediate results, while advanced users can leverage fmin_vec for batch processing, fmin_constrained for complex constraint handling, or the full CMAES class for complete control. The fmin_restart function implements proven IPOP and BIPOP strategies that automatically escape local minima by adaptively increasing population size.
| Function | Use Case | Complexity |
|---|---|---|
fmin |
Quick optimization with minimal setup | Low |
fmin_vec |
Batch evaluation for vectorized objectives | Medium |
fmin_constrained |
Box constraints, nonlinear constraints, custom feasibility | Medium |
fmin_restart |
Escaping local minima with adaptive restarts | Medium |
CMAES class |
Full ask-tell control for custom workflows | High |
Performance optimization follows a multi-layered strategy. SIMD acceleration targets the most frequent operations like dot products, achieving 3-4x speedups on modern CPUs. Rayon parallelization exploits the embarrassingly parallel nature of fitness evaluation, scaling linearly with core count. The lazy eigensystem update defers expensive
| Optimization | Technique | Speedup | When It Matters |
|---|---|---|---|
| SIMD | Vectorized dot products | 3-4x | High-dimensional problems |
| Rayon | Parallel fitness evaluation | Linear with cores | Expensive objectives |
| Lazy eigen | Deferred decomposition | 5-10x fewer ops | Dimensions > 20 |
| Pre-alloc buffers | Zero-allocation loops | Eliminates GC pauses | Low-latency apps |
The optimizer adapts to diverse problem characteristics through configurable modes and automatic adaptations. Users can choose between full covariance for capturing correlations or diagonal covariance for speed in high dimensions. Deterministic seeding enables reproducible research and debugging. Noise handling automatically detects and escapes stagnation caused by noisy evaluations. The constraint handling pipeline provides multiple fallback mechanisms, from simple box projection to sophisticated penalty methods.
Developer experience receives equal attention to performance. A pure-Python baseline (fastcma.cma_es) enables direct speed comparisons. The Rich TUI provides real-time visualization of optimization progress. Cross-platform wheels eliminate build complexity for end users. C/C++ FFI headers enable native integration, while the REST API example demonstrates remote optimization workflows.
Installation follows standard Python package management practices, with pre-built wheels available for most platforms. The PyPI package includes optimized binaries compiled for common architectures, eliminating the need for Rust toolchain installation for most users. For developers modifying the Rust code, source builds enable full control over compilation options and feature flags.
The simplest installation path uses PyPI's pre-built wheels, which work out of the box on Linux, macOS, and Windows for Python 3.9 through 3.12. The package installs as the fastcma module, providing immediate access to all optimization functions.
python -m pip install fast-cmaes==0.1.4 # installs module `fastcma`Source builds become necessary when modifying Rust code, enabling specific feature flags, or targeting platforms without pre-built wheels. The build process uses maturin, which handles the complexity of compiling Rust extensions and generating Python-compatible wheels. Optional features like NumPy support and LAPACK backends require additional dependencies but provide performance benefits for specific use cases.
Build locally (needed only if hacking on Rust):
python -m pip install maturin
maturin develop --release
# Optional: NumPy fast paths
maturin develop --release --features numpy_support
# Optional: LAPACK eigen backend (requires a Fortran toolchain)
maturin develop --release --features eigen_lapack
# Demo extras (Rich TUI)
python -m pip install .[demo]One-liner setup + Rich TUI demo (auto-installs nightly Rust, uv, venv, builds, runs):
./scripts/setup_and_demo.shfrom fastcma import fmin
from fastcma_baseline import benchmark_sphere
def sphere(x):
return sum(v*v for v in x)
xmin, es = fmin(sphere, [0.5, -0.2, 0.8], sigma=0.3, maxfevals=4000, ftarget=1e-12)
print("xmin", xmin)
# Pure-Python baseline
print(benchmark_sphere(dim=20, iters=120))from fastcma import fmin_vec
def sphere_vec(X):
return [sum(v*v for v in x) for x in X]
xmin, _ = fmin_vec(sphere_vec, [0.4, -0.1, 0.3], sigma=0.25, maxfevals=3000)from fastcma import fmin_constrained
def sphere(x):
return sum(v*v for v in x)
constraints = {"lower_bounds": [-1, -1, -1], "upper_bounds": [1, 1, 1]}
xmin, _ = fmin_constrained(sphere, [0.5, 0.5, 0.5], 0.3, constraints)use fastcma::{optimize_rust, CovarianceModeKind};
let (xmin, _state) = optimize_rust(
vec![0.5, -0.2, 0.8],
0.3,
None,
Some(4000),
Some(1e-12),
CovarianceModeKind::Full,
|x| x.iter().map(|v| v*v).sum()
);
println!("xmin = {:?}", xmin);Lightweight ABI for embedding is in include/fastcma.h and include/fastcma.hpp (version helper + sphere demo).
cargo build --release#include "fastcma.h"
#include <stdio.h>
int main() {
double xmin[4];
double f = fastcma_sphere(4, 0.4, 20000, 42ULL, xmin);
printf("f=%g x0=%g %g %g %g\n", f, xmin[0], xmin[1], xmin[2], xmin[3]);
}Compile (adjust path/extension per platform) β link against Python because the library is built with PyO3:
gcc demo.c -I include -L target/release -lfastcma $(python3-config --embed --ldflags)Run (ensure the library path is visible):
LD_LIBRARY_PATH=target/release ./a.outexamples/api_server.py exposes a minimal FastAPI service wrapping fastcma.fmin for remote optimization jobs.
uv pip install "fastapi[standard]" fast-cmaes
uvicorn examples.api_server:app --reloadPOST /optimize with JSON body:
{
"x0": [0.5, 0.5],
"sigma": 0.4,
"maxfevals": 5000
}Returns xmin, fbest, and evaluation counts.
- π
examples/python_quickstart.pyβ Minimal sphere + vectorized demo - π
examples/python_benchmarks.pyβ Rust vs naive Python on sphere; naive on Rastrigin - π¨
examples/rich_tui_demo.pyβ Rich TUI streaming sigma/fbest/evals while minimizing Rosenbrock - π
examples/api_server.pyβ FastAPI microservice for remote optimize calls
One-shot setup + demo runner:
./scripts/setup_and_demo.shWhat it does: Ensures nightly Rust, creates a uv venv on Python 3.13, installs maturin + demo extras, builds the extension, and launches the Rich TUI.
Run the TUI with uv + Python 3.13:
uv venv --python 3.13
uv pip install .[demo]
uv run python examples/rich_tui_demo.py- π Pure Python baseline:
fastcma_baseline.cma_es,fastcma_baseline.benchmark_sphere(seepython/fastcma_baseline/naive_cma.py) - β
Integration benchmarks (fixed seeds): sphere, Rosenbrock, Rastrigin, Ackley, Schwefel, Griewank in
tests/benchmarks.rs - π₯ Hard-suite (20 classic tough functions, seeded, higher dims): Zakharov, Levy, Dixon-Price, Powell, StyblinskiβTang, Bohachevsky, Bukin6, Dropwave, Alpine N.1, Elliptic, Salomon, Quartic, Schwefel 1.2/2.22, Bent Cigar, Rastrigin 10D, Ackley 10D, Griewank 10D, Rosenbrock 6D, Sum Squares 12D in
tests/hard_benchmarks.rs - π Very-hard (ignored by default, ~20 more high-dim/fractal/ill-conditioned cases such as Katsuura, Weierstrass, Schwefel 2.26 @ 30D, HappyCat/HGBat, Expanded Schaffer F6/F7, Discus, Different Powers, 30D Rastrigin/Ackley/Rosenbrock/Griewank/Elliptic) in the same file; run with
cargo test --test hard_benchmarks -- --ignored - π¨ Rich TUI demo for live insight
Performance optimization in fastcma follows a philosophy of targeted improvements rather than premature optimization. Each optimization targets a specific bottleneck identified through profiling and analysis. The result is a system that achieves substantial speedups without sacrificing code clarity or maintainability. The optimizations work synergistically: SIMD accelerates the hot path, lazy updates reduce expensive operations, and parallelization exploits modern multi-core architectures.
1. SIMD Acceleration
- Implementation: Uses Rust's
portable_simdwith 4 f64 lanes (256-bit AVX on x86_64) - Impact: ~3-4x speedup on dot products and vector operations
- Usage: Automatically applied to
dot_simd(),square_sum_simd(), and constraint clamping - When it matters: High-dimensional problems where dot products dominate
2. Rayon Parallelization
- Fitness evaluations:
par_iter()parallelizes objective calls across CPU cores - Covariance updates:
par_chunks_mut()parallelizes matrix operations - Impact: Near-linear speedup (e.g., 8 cores β ~7x faster) on parallelizable objectives
- Best for: Expensive objective functions that can be evaluated independently
3. Lazy Eigensystem Updates
-
Strategy: Defer expensive
$$O(n^{3})$$ eigen decomposition until necessary -
Gap calculation:
lazy_gap_evals = 0.5 * n * Ξ» / ((c_1 + c_ΞΌ) * nΒ²) - Impact: Reduces eigen decompositions by 5-10x in typical runs
-
Critical for: High-dimensional problems (
$$n > 20$$ ) where eigen decomposition dominates
4. Diagonal Covariance Mode
- When to use: High dimensions (n > 50) or separable problems
- Trade-off: Faster (
O(n)vsO(n^3)) but assumes independent variables - Speedup: 10-100x faster covariance updates, but may converge slower on correlated problems
- Access: Set
covariance_mode="diagonal"in Python API
5. Pre-allocated Buffers
- Buffers:
z_buf,y_buf,tmp_bufreused across iterations - Impact: Zero-allocation inner loops, critical for low-latency applications
- Memory: Fixed
O(n)overhead regardless of population size
6. IPOP/BIPOP Restarts
- IPOP: Increasing population size (
Ξ» = base_Ξ» * 2^restart) - BIPOP: Alternates between large (IPOP) and small populations
- Parallel helper:
test_utils::run_ipop_bipop_parallelruns multiple populations concurrently - Python API:
fmin_restart()defaults to BIPOP strategy - Use case: Escaping local minima without huge evaluation budgets
7. Noise Handling
- Detection: Monitors fitness spread relative to best value
- Response: Expands
sigmaby factor (default 1.6) when spread < threshold (default 1e-3) - Cooldown: Prevents oscillation with configurable cooldown period (default 5 iterations)
- Python API: Enable with
noise=Trueinfmin,fmin_vec, orfmin_constrained - Rust helper:
test_utils::run_seeded_mode_noisefor deterministic testing
8. Constraint Handling
- Box projection: SIMD-accelerated clamping to
[lb, ub]bounds - Mirroring: Optional reflection of out-of-bounds points into feasible region
- Rejection/resampling: Up to
max_resamplesattempts ifreject()predicate fails - Repair: Optional user-provided function to fix infeasible points
- Penalty: Optional penalty function (includes augmented-Lagrangian helper in
test_utils) - Flexibility: Supports box constraints, nonlinear constraints, and custom feasibility rules
9. Deterministic Seeding
- Implementation:
new_with_seed()usesStdRng::seed_from_u64() - Impact: Same seed β identical optimization trajectory
- Critical for: Reproducible benchmarks, regression testing, and debugging
- Test helpers:
test_utils::run_seeded,test_utils::run_multiseedfor deterministic runs
10. Restart Helpers
- Simple restarts:
test_utils::run_with_restartsperturbs start point between runs - Parallel populations:
test_utils::run_ipop_bipop_parallelmaintains multiple concurrent populations - Use case: Escaping local minima without huge evaluation budgets
- π¦
numpy_support: NumPy array support in vectorized objectives - π’
eigen_lapack: LAPACK eigen backend - π§ͺ
test_utils: Expose deterministic helpers externally - π¨
demo: Pulls inrichfor the TUI
The setup script embodies the project's commitment to user experience. Rather than requiring users to navigate complex build requirements manually, a single command handles everything from Rust toolchain installation to running the demo. The script follows a fail-fast philosophy, checking prerequisites early and providing clear error messages when requirements aren't met.
The automation covers the complete workflow: verifying Rust nightly (required for SIMD features), installing the modern uv package manager, creating an isolated Python 3.13 virtual environment, installing build dependencies, compiling the Rust extension, and finally launching the Rich TUI demo. This eliminates the common friction of "getting started" that can make the project difficult to use.
Usage:
./scripts/setup_and_demo.shThe script is idempotent and handles partial installations gracefully, making it safe to run multiple times. It's designed for users who want to explore the codebase or run a quick local demo without manual setup steps.
The test suite follows a pyramid structure, with fast unit tests forming the base and comprehensive integration tests providing confidence at higher levels. This organization enables rapid feedback during development while ensuring thorough validation before releases. Each test level serves a specific purpose: unit tests catch regressions immediately, integration benchmarks validate correctness on standard problems, and hard suites stress-test robustness under extreme conditions.
The suite's design philosophy emphasizes deterministic testing through seeded random number generation. This enables reproducible test runs, making it possible to debug failures by reproducing exact optimization trajectories. The multi-seed approach validates that good performance isn't due to fortunate random initialization, while parallel test execution keeps total runtime manageable despite comprehensive coverage.
The test suite is organized into multiple levels of difficulty:
flowchart LR
Start([Tests]) --> Core[Core 5-10s]
Start --> Integration[Integration 10-15s]
Start --> Hard[Hard ~90s]
Start --> VeryHard[Very-Hard 5-10min]
Start --> Python[Python API]
Start --> FFI[C/C++ FFI]
Start --> Stability[Stability]
classDef coreStyle fill:#b3e5d1,stroke:#7fb8a3,stroke-width:1px,color:#333
classDef intStyle fill:#b3d9ff,stroke:#7fb3d3,stroke-width:1px,color:#333
classDef hardStyle fill:#ffe0b3,stroke:#ffcc80,stroke-width:1px,color:#333
classDef veryHardStyle fill:#f8bbd0,stroke:#f48fb1,stroke-width:1px,color:#333
classDef otherStyle fill:#e1bee7,stroke:#ce93d8,stroke-width:1px,color:#333
class Core coreStyle
class Integration intStyle
class Hard hardStyle
class VeryHard veryHardStyle
class Python,FFI,Stability otherStyle
Test Category Comparison:
| Category | Functions | Runtime | Trigger | Purpose |
|---|---|---|---|---|
| Core Unit | 6 basic tests | 5-10s | Every commit | Regression detection |
| Integration | 6 classic functions | 10-15s | Every commit | Correctness validation |
| Hard Suite | 20 challenging functions | ~90s | Every commit | Stress testing |
| Very-Hard Lite | 3 extreme cases | ~30s | Nightly | Quick robustness check |
| Very-Hard Full | ~20 pathological cases | 5-10min | Weekly | Comprehensive validation |
| Python Integration | API + roundtrip | ~10s | Every commit | Python API correctness |
| FFI | C/C++ bindings | ~5s | Every commit | Native integration |
1. Core Unit Tests (cargo test)
These tests form the foundation of the test pyramid, providing rapid feedback on basic functionality. They validate core operations like sphere and Rosenbrock convergence, ensuring fundamental algorithm correctness. Additional tests cover specialized features like noise handling, constraint satisfaction through augmented Lagrangian methods, and restart mechanisms. The short runtime enables running these tests frequently during development, catching regressions before they propagate.
2. Integration Benchmarks (tests/benchmarks.rs)
Integration benchmarks validate CMA-ES correctness on six classic optimization functions that represent different problem characteristics. These tests go beyond simple convergence, exercising features like deterministic seeding, multi-seed runs that validate robustness across initializations, and parallel restart strategies. The functions span from simple convex landscapes to highly multimodal problems, ensuring the optimizer handles diverse scenarios correctly.
3. Hard Suite (tests/hard_benchmarks.rs)
The hard suite expands testing to twenty functions that stress-test the optimizer under challenging but realistic conditions. These functions introduce complications like high-order interactions, fractal landscapes, extreme condition numbers, and non-separability. Unlike unit tests that validate correctness, the hard suite validates robustness when problem characteristics deviate from ideal conditions. Functions range from low-dimensional but tricky cases like Bukin6 to high-dimensional variants of classic functions.
4. Very-Hard Suite (tests/hard_benchmarks.rs -- --ignored)
The very-hard suite represents the extreme end of testing, with approximately twenty pathological cases designed to probe algorithm limits. These include fractal functions like Katsuura and Weierstrass, high-dimensional variants up to 30D, and ill-conditioned problems that challenge numerical stability. The suite runs weekly rather than on every commit due to its computational cost, but provides essential validation that the optimizer remains robust under extreme conditions.
5. Very-Hard Lite (tests/very_hard_lite.rs)
Very-hard lite provides a compromise between thoroughness and speed, running a curated subset of three representative very-hard cases nightly. This enables regular validation of robustness without the full computational cost, catching regressions in extreme-case handling between weekly full suite runs.
6. Python Integration Tests
tests/python_smoke.py: Basic Python API smoke testtests/test_api_smoke.py: FastAPI integration testtests/test_roundtrip.py: Serialization/roundtrip validationtests/python_smoke_pypi.py: Post-PyPI-install validation- Run:
pytest tests/
7. FFI Tests (tests/ffi_smoke/)
- C FFI validation: Compiles and runs C demo program
- Purpose: Ensure C/C++ bindings work correctly
- Run:
cargo build --release && gcc tests/ffi_smoke/demo.c ...
8. Stability Tests (tests/stability.rs)
- Long-running stability checks: Prevents regressions
- Purpose: Catch numerical instabilities over many iterations
The test suite employs a carefully curated collection of optimization functions that probe different aspects of algorithm performance. These functions span from simple convex landscapes to pathological cases with fractal structure, ensuring the optimizer handles both common scenarios and edge cases. Each function targets specific challenges: multimodality tests global search capability, ill-conditioning tests numerical stability, and non-separability tests correlation learning.
The selection follows established benchmarking practices from the optimization literature, ensuring comparability with other CMA-ES implementations and evolutionary algorithms. Functions are grouped by difficulty, with classic benchmarks providing baseline validation and harder suites stress-testing robustness under extreme conditions.
These six functions form the foundation of optimization benchmarking, each exposing different algorithm characteristics. They range from the simplest convex function (Sphere) to highly multimodal landscapes (Rastrigin, Ackley) that challenge global search capabilities.
| Function | Challenge Type | Key Difficulty | Typical Dimension |
|---|---|---|---|
| Sphere | Convex, unimodal | Numerical precision | 6-20 |
| Rosenbrock | Narrow valley | Curved search path | 2-6 |
| Rastrigin | Highly multimodal | Many local minima | 4-10 |
| Ackley | Multimodal, oscillatory | Narrow global basin | 4-10 |
| Griewank | Non-separable | Product interactions | 5-10 |
| Schwefel | Deceptive | Local optima far from global | 3-10 |
Sphere
What this means: Just the sum of squares of all variables - the simplest possible optimization problem. The minimum is at the origin where all variables are zero.
Why it matters: This is the "hello world" of optimization. Every optimizer should solve this quickly. It's like testing if your calculator can add - if you can't optimize sphere, you have bigger problems.
Rosenbrock
What this means: Creates a narrow, curved valley where the optimum is at (1,1,1,...). The
Why it matters: Tests if an optimizer can follow curved paths. Gradient methods get stuck because the gradient points steeply uphill initially. CMA-ES excels here because it learns the valley's shape.
Rastrigin
What this means: Combines a sphere-like quadratic term with cosine waves that create many local minima. The cosine creates ripples around the origin, with the number of local optima growing exponentially with dimension.
Why it matters: Tests global optimization ability. Simple hill-climbers get trapped in local minima. CMA-ES can escape because its population-based search explores multiple regions simultaneously.
Ackley
What this means: Has an exponential bowl shape (first term) with cosine ripples (second term) that create many local minima. The global minimum is a small, narrow basin at the origin.
Why it matters: Combines global structure (the bowl) with local complexity (the ripples). Tests if an optimizer can find the global basin while navigating around many distracting local minima.
Griewank
- History: Named after Andreas Griewank (1981), tests product term interactions
- Why hard: Product term creates many local minima; becomes easier in high dimensions (counterintuitive)
- Challenge: Tests handling of non-separable interactions
-
Optimum:
$$\mathbf{x}^* = \mathbf{0}$$ ,$$f^* = 0$$
Schwefel
- History: Named after Hans-Paul Schwefel, designed to test robustness
- Why hard: Many local minima far from global optimum; deceptive landscape
- Challenge: Tests ability to avoid premature convergence
-
Optimum:
$$\mathbf{x}^* \approx 420.9687$$ ,$$f^* \approx 0$$
The hard suite expands testing to twenty functions that stress-test the optimizer under diverse challenging conditions. These functions introduce complications like high-order interactions, fractal landscapes, extreme condition numbers, and non-separability. Unlike the classic benchmarks which primarily test convergence on well-behaved problems, the hard suite validates robustness when problem characteristics deviate from ideal conditions.
| Function | Primary Challenge | Condition Number | Separability |
|---|---|---|---|
| Zakharov | High-order terms (quartic) | Moderate | Non-separable |
| Levy | Fractal-like structure | Moderate | Non-separable |
| Dixon-Price | Variable coupling | Moderate | Non-separable |
| Powell | Quartic narrow valleys | High | Non-separable |
| Styblinski-Tang | Multiple local minima | Moderate | Separable |
| Bent Cigar | Extreme ill-conditioning | Separable | |
| Elliptic | Exponentially increasing |
|
Separable |
| Schwefel 1.2 | Quadratic interactions | Moderate | Non-separable |
| Schwefel 2.22 | Non-differentiable | Low | Separable |
Zakharov
- Challenge: Non-separable, quartic terms create steep valleys
- Why hard: Tests handling of high-order interactions
Levy
where
- Challenge: Highly multimodal with many narrow valleys
- Why hard: Tests convergence in fractal-like landscapes
Dixon-Price
- Challenge: Non-separable, quadratic coupling between variables
- Why hard: Tests handling of variable dependencies
Powell
- Challenge: Quartic terms create narrow valleys
- Why hard: Tests convergence in ill-conditioned landscapes
Styblinski-Tang
- Challenge: Multiple local minima, non-convex
- Why hard: Tests global optimization capability
Bent Cigar
What this means: The first variable is weighted normally, but all other variables are weighted by a million. It's like a cigar shape - very narrow along most dimensions but normal along the first.
Why it matters: Tests numerical conditioning. Most optimizers need tiny steps in the high-weight directions but normal steps in the low-weight direction. Poor conditioning makes many algorithms unstable.
Elliptic
- Challenge: Exponentially increasing condition number
- Why hard: Tests scalability to high dimensions
Schwefel 1.2
- Challenge: Non-separable, quadratic interactions
- Why hard: Tests handling of variable coupling
Schwefel 2.22
- Challenge: Non-differentiable at origin, product term
- Why hard: Tests robustness to non-smooth landscapes
Katsuura
- Challenge: Fractal structure, extremely multimodal
- Why hard: Tests convergence in pathological landscapes
Weierstrass
where
- Challenge: Fractal function, nowhere differentiable
- Why hard: Tests robustness to non-smooth landscapes
HappyCat/HGBat: Complex non-separable functions with multiple terms
- Challenge: Tests handling of intricate variable interactions
Expanded Schaffer F6/F7: Composed functions with oscillatory behavior
- Challenge: Tests convergence in oscillatory landscapes
Discus: Extreme condition number variant
- Challenge: Tests numerical stability
Different Powers: Variable powers create ill-conditioning
- Challenge: Tests handling of heterogeneous scaling
The CI/CD pipeline represents a carefully orchestrated balance between thoroughness and efficiency. Rather than running all tests on every commit, the pipeline uses a tiered approach: fast tests run on every push for immediate feedback, while expensive tests run on schedules or manual triggers. This strategy keeps pull request feedback times reasonable while ensuring comprehensive validation through scheduled runs.
The pipeline's matrix strategy builds wheels for multiple operating systems and Python versions, ensuring compatibility across the supported ecosystem. Automated PyPI publishing on version tags eliminates manual deployment steps, while post-publish smoke tests validate that published packages function correctly. The inclusion of memory safety tools like Miri and address sanitizers demonstrates commitment to reliability beyond functional correctness.
The project uses GitHub Actions for comprehensive CI/CD automation. Workflow file: .github/workflows/build-wheels.yml
flowchart TD
Trigger[Trigger] --> Test[Test]
Trigger --> Build[Build]
Test --> VeryHardLite{Nightly}
Test --> VeryHard{Weekly}
Test --> PythonInt[Python]
Test --> FFIInt[FFI]
Test --> Demo[Demo]
Test --> Miri{Weekly}
Test --> ASAN{Weekly}
Build --> BuildWheels[Wheels]
BuildWheels --> UploadWheels[Upload]
UploadWheels --> TagCheck{Tag?}
TagCheck -->|Yes| Publish[PyPI]
TagCheck -->|No| End1([End])
Publish --> PyPISmoke[Smoke]
PyPISmoke --> End2([End])
VeryHardLite --> End1
VeryHard --> End1
PythonInt --> End1
FFIInt --> End1
Demo --> End1
Miri --> End1
ASAN --> End1
classDef testStyle fill:#b3e5d1,stroke:#7fb8a3,stroke-width:1px,color:#333
classDef buildStyle fill:#b3d9ff,stroke:#7fb3d3,stroke-width:1px,color:#333
classDef publishStyle fill:#ffe0b3,stroke:#ffcc80,stroke-width:1px,color:#333
classDef decisionStyle fill:#e1bee7,stroke:#ce93d8,stroke-width:1px,color:#333
class Test,VeryHardLite,VeryHard,PythonInt,FFIInt,Demo,Miri,ASAN testStyle
class Build,BuildWheels,UploadWheels buildStyle
class Publish,PyPISmoke publishStyle
class TagCheck decisionStyle
1. test (Runs on every push/PR)
- Purpose: Core functionality validation
- Runs:
cargo test --features "numpy_support test_utils" - Benchmarks: Captures timing logs for performance regression detection
- Artifacts: Uploads benchmark timings
2. very-hard-lite (Nightly + manual dispatch)
- Purpose: Quick validation of very-hard suite
- Runs:
cargo test --test very_hard_lite - Schedule: Daily at 5:30 AM UTC
- Timeout: 30 minutes
3. very-hard (Weekly + manual dispatch)
- Purpose: Full very-hard suite validation
- Runs:
cargo test --test hard_benchmarks -- --ignored - Schedule: Weekly on Sunday at 6:00 AM UTC
- Timeout: 60 minutes
- Note: Only runs on GitHub compute (expensive)
4. integration-python (Matrix: Python 3.12, 3.13)
- Purpose: Python API validation
- Runs:
pytest tests/test_api_smoke.py tests/test_roundtrip.py - Tests: FastAPI integration, roundtrip serialization
5. integration-ffi
- Purpose: C/C++ FFI validation
- Builds:
cargo build --release - Compiles: C demo program with FFI
- Runs: FFI smoke test binary
6. miri (Weekly + manual dispatch)
- Purpose: Undefined behavior detection
- Runs:
cargo miri test --lib - Schedule: Weekly on Sunday
- Note: Catches memory safety issues
7. asan (Weekly + manual dispatch)
- Purpose: Address sanitizer for memory errors
- Runs:
cargo test -Zbuild-std --target x86_64-unknown-linux-gnu --lib - Schedule: Weekly on Sunday
- Note: Catches use-after-free, buffer overflows
8. demo
- Purpose: Validates setup script and Rich TUI demo
- Runs:
./scripts/setup_and_demo.sh - Tests: End-to-end user experience
9. build (Matrix: 3 OS Γ 3 Python versions = 9 builds)
- OS: Ubuntu, Windows, macOS
- Python: 3.12, 3.13, 3.14
- Builds: Wheels with
maturin build --release - Tests: Installs wheel and runs Python smoke tests
- Artifacts: Uploads wheels for all platforms
10. publish (Triggers on git tags v*)
- Purpose: Publish wheels to PyPI
- Requires:
PYPI_API_TOKENsecret - Downloads: All wheels from build artifacts
- Publishes:
twine upload wheels/*.whl
11. pypi-smoke (After publish)
- Purpose: Validates PyPI installation
- Installs:
pip install fast-cmaesfrom PyPI - Runs:
tests/python_smoke_pypi.py - Validates: Package integrity after publish
- Every push/PR: Core tests, Python integration, FFI, demo, wheel builds
- Daily (5:30 AM UTC): Very-hard lite suite
- Weekly (Sunday 6:00 AM UTC): Full very-hard suite, Miri, ASAN
- Timing logs: Captured for benchmarks and uploaded as artifacts
- Regression detection: Timing comparisons across runs
- Performance tracking: Monitor optimization speed over time
MIT License (with OpenAI/Anthropic Rider). (c) 2025 Jeffrey Emanuel