This document provides a high-level introduction to VectorChord, a PostgreSQL extension for scalable vector similarity search. It covers the system's purpose, licensing model, architectural organization, and how the major components integrate with PostgreSQL.
For installation and getting started with VectorChord, see Quick Start Guide. For detailed information about specific features and capabilities, see Features and Capabilities. For architectural details, see Architecture.
VectorChord (also called vchord) is a PostgreSQL extension that implements high-performance vector similarity search with disk-efficient storage. It is designed as a successor to pgvecto.rs, offering improved stability and performance while maintaining compatibility with pgvector's data types and SQL interface.
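The extension provides two primary index access methods:
- vchordrq: an inverted file index combined with RaBitQ quantization
- vchordg: an HNSW-like graph index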
VectorChord supports vectors up to 60,000 dimensions and provides SIMD-optimized distance calculations across multiple CPU architectures (x86_64, AArch64, s390x, PowerPC64, RISC-V).
Sources: README.md29-60 src/lib.rs23-44
VectorChord is distributed under a dual license model, allowing users to choose between two options:
| License | Description | Use Case |
|---|---|---|
| GNU Affero General Public License v3 (AGPLv3) | Open source license requiring source code distribution for modified versions, including when used over a network | Open source projects, internal use |
| Elastic License v2 (ELv2) | Proprietary-friendly license with specific restrictions on providing the software as a managed service | Commercial products, SaaS applications |
Users may select either license based on their deployment requirements. Commercial collaboration and licensing inquiries should be directed to the contact address listed in the project README.
Sources: README.md123-131 src/lib.rs1-13
VectorChord is implemented as a Rust workspace with multiple internal crates, compiled into a PostgreSQL extension using the pgrx framework. The following diagram shows the primary code organization:
Extension Entry Point and Initialization
The main extension is defined in src/lib.rs23-63 which contains:
- pg_module_magic! macro declaring the extension name and version
- _PG_init function executed when PostgreSQL loads the extension via shared_preload_libraries
- extension_sql_file! macro invocations that include SQL files in the extension schema

Sources: src/lib.rs1-82 Cargo.toml1-122
The codebase is organized as a Cargo workspace with the following internal crates:
| Crate | Path | Purpose |
|---|---|---|
| vchord | src/ | Main extension entry point, SQL interface |
| vchordrq | crates/vchordrq/ | Inverted file index implementation |
| vchordg | crates/vchordg/ | Graph index implementation |
| index | crates/index/ | Shared index algorithms and utilities |
| k_means | crates/k_means/ | K-Means clustering (flat, quick, rabitq, hierarchical) |
| simd | crates/simd/ | SIMD optimizations and multiversion dispatch |
| rabitq | crates/rabitq/ | RaBitQ quantization implementation |
| vector | crates/vector/ | Vector type definitions |
| distance | crates/distance/ | Distance metric implementations |
| feistel | crates/feistel/ | Feistel cipher for CTID generation |
| always_equal | crates/always_equal/ | Utility types |
| small_iter | crates/small_iter/ | Small vector iterators |
Sources: Cargo.toml24-35 Cargo.toml61-63
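To make the feistel entry in the table more concrete, here is a minimal sketch of a balanced Feistel network used as a keyed permutation of fixed-width integers. The half width, round function, and key values are illustrative assumptions; how the crate actually applies the construction to CTIDs is not described on this page.

```python
# Toy balanced Feistel network: a keyed bijection on 2*HALF_BITS-bit integers.
# The round function and key schedule are illustrative assumptions.
HALF_BITS = 8
MASK = (1 << HALF_BITS) - 1


def round_fn(half: int, key: int) -> int:
    """Arbitrary mixing function; it does not need to be invertible."""
    return ((half * 0x9E37 + key) ^ (half >> 3)) & MASK


def permute(value: int, keys: list[int]) -> int:
    """Map value to another value in the same range, one round per key."""
    left, right = value >> HALF_BITS, value & MASK
    for key in keys:
        left, right = right, left ^ round_fn(right, key)
    return (left << HALF_BITS) | right


def unpermute(value: int, keys: list[int]) -> int:
    """Invert permute by undoing the rounds in reverse order."""
    left, right = value >> HALF_BITS, value & MASK
    for key in reversed(keys):
        left, right = right ^ round_fn(left, key), left
    return (left << HALF_BITS) | right
```

Because every round can be undone, permute maps each value in the range to a distinct value, which is what makes the construction suitable for producing identifiers in a scrambled but collision-free order.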
VectorChord must be loaded via PostgreSQL's shared_preload_libraries configuration parameter. The extension validates this requirement in _PG_init:
The initialization process:
- Calls index::init() to register access method handlers for vchordrq and vchordg
- Calls recorder::init() to set up the query sampling background worker
- Reserves the vchord GUC prefix with PostgreSQL (PG15+) or emits placeholder warnings (PG13-14)

Sources: src/lib.rs48-63
VectorChord implements PostgreSQL's Index Access Method (AM) interface for both index types. Each access method provides function pointers for operations like index build, scan, insert, and delete:
Sources: src/index/1-100 (specific line numbers depend on index module implementation)
VectorChord provides two index types with different performance characteristics:
- vchordrq (Inverted File + RaBitQ)
- vchordg (HNSW-like Graph)
Sources: README.md96-106
The SIMD subsystem (crates/simd/) implements runtime CPU feature detection and multiversion dispatch to select optimal implementations:
The multiversion macro system generates dispatcher functions at compile time, enabling the codebase to ship a single binary that automatically selects the best implementation for the runtime CPU. Detailed architecture: SIMD Optimization System
Sources: crates/simd/Cargo.toml1-34
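The dispatch idea can be sketched in a few lines: keep several implementations of the same kernel, each tagged with the CPU features it requires, and pick the most specialized one the running machine supports. This is only a conceptual illustration in Python; the feature names, the candidate list, and both kernels are hypothetical and do not mirror the simd crate's macros.

```python
from typing import Callable, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def dot_scalar(a, b):
    """Portable fallback that needs no special CPU features."""
    return sum(x * y for x, y in zip(a, b))


def dot_wide(a, b):
    """Stand-in for a vectorized kernel: processes four lanes per step."""
    total = 0.0
    for i in range(0, len(a), 4):
        total += sum(x * y for x, y in zip(a[i:i + 4], b[i:i + 4]))
    return total


# Candidates ordered from most to least specialized (feature names hypothetical).
CANDIDATES: list[tuple[frozenset[str], Kernel]] = [
    (frozenset({"avx2", "fma"}), dot_wide),
    (frozenset(), dot_scalar),  # empty requirement set always matches
]


def select_kernel(cpu_features: set[str]) -> Kernel:
    """Return the first candidate whose required features are all present."""
    for required, kernel in CANDIDATES:
        if required <= cpu_features:
            return kernel
    raise RuntimeError("no kernel available")
```

A caller would run select_kernel once with the detected feature set and reuse the returned function, so the feature check is not repeated on every distance computation.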
The k_means crate provides multiple clustering algorithms for index construction:
| Algorithm | Use Case | Spherical Support |
|---|---|---|
| flat | General purpose, balanced accuracy/speed | Yes |
| quick | Fast clustering for smaller datasets | Yes |
| rabitq | Clustering optimized for quantization | Yes |
| hierarchical | Two-level clustering for large datasets | Yes |
Spherical K-Means is used for cosine and dot product metrics, where centroids are normalized to unit length. Detailed documentation: Build Process and K-Means Clustering
Sources: Implied from high-level diagrams and README
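As a rough illustration of the spherical variant, the sketch below performs one Lloyd-style iteration in which points are assigned by largest dot product and each updated centroid is rescaled to unit length. It is a plain-Python approximation of the general technique, not the k_means crate's code; the function names and the empty-cluster policy are assumptions.

```python
import math


def normalize(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def spherical_kmeans_step(points, centroids):
    """One iteration: assign by largest dot product, then re-normalize the means."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in map(normalize, points):
        best = max(range(len(centroids)),
                   key=lambda c: sum(a * b for a, b in zip(p, centroids[c])))
        counts[best] += 1
        sums[best] = [s + x for s, x in zip(sums[best], p)]
    # Assumption: empty clusters simply keep their previous centroid.
    return [normalize(s) if n else list(old)
            for s, n, old in zip(sums, counts, centroids)]
```

Normalizing after every update keeps all centroids on the unit sphere, so comparing a query to centroids by dot product is equivalent to comparing by cosine similarity.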
VectorChord includes Python scripts for external index precomputation and benchmarking:
This pipeline enables users to leverage GPU acceleration for K-Means training on large datasets (>5M vectors) before importing the precomputed centroids into PostgreSQL. Detailed usage: External Data Pipeline
Sources: scripts/README.md1-59 scripts/dump.py1-82 scripts/train.py1-219 scripts/index.py1-248 scripts/bench.py1-267
VectorChord extends pgvector's type system with quantized representations:
| Type | Size per element | Use Case |
|---|---|---|
| vector | 4 bytes (f32) | Original pgvector type, full precision |
| halfvec | 2 bytes (f16) | Half-precision, reduced memory |
| scalar8 | 1 byte (8-bit scalar quantization) | Memory-efficient storage |
| rabitq8 | 1 byte (8-bit RaBitQ) | Optimized for residual quantization |
| rabitq4 | 0.5 bytes (4-bit RaBitQ) | Maximum compression |
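The per-element sizes translate directly into per-vector payload sizes. The short calculation below assumes the payload is simply dimensions times bits per element, rounded up to whole bytes; it deliberately ignores any headers or per-vector metadata the types may carry, which this page does not specify.

```python
# Bits per element, taken from the table above.
BITS_PER_ELEMENT = {
    "vector": 32,
    "halfvec": 16,
    "scalar8": 8,
    "rabitq8": 8,
    "rabitq4": 4,
}


def payload_bytes(type_name: str, dims: int) -> int:
    """Bytes for the element data only; headers and metadata are excluded."""
    bits = BITS_PER_ELEMENT[type_name] * dims
    return (bits + 7) // 8  # round up to whole bytes


for name in BITS_PER_ELEMENT:
    print(f"{name:8s} {payload_bytes(name, 768):5d} bytes")
```

For a 768-dimensional embedding this works out to 3,072 bytes for vector, 1,536 for halfvec, 768 for scalar8 and rabitq8, and 384 for rabitq4.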
Three primary distance operators are supported:
| Metric | Operator Class | SQL Operator |
|---|---|---|
| L2 (Euclidean) | vector_l2_ops | <-> |
| Cosine | vector_cosine_ops | <=> |
| Inner Product (Dot) | vector_ip_ops | <#> |
The system includes a dispatch layer that routes operations based on VectorKind × DistanceKind combinations, selecting appropriate SIMD implementations. Detailed documentation: Type Dispatch System and Distance Metrics and Operators
Sources: src/datatype/1-100 (implied from structure)
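The three operators and the idea of dispatching on a (vector kind, distance kind) pair can be sketched as follows. The operator semantics follow pgvector's definitions (Euclidean distance, cosine distance, and negative inner product); the enum, the dispatch table, and the string used for the vector kind are illustrative assumptions rather than VectorChord's actual types.

```python
import math
from enum import Enum


class DistanceKind(Enum):
    L2 = "<->"
    COSINE = "<=>"
    DOT = "<#>"


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms  # cosine distance, not similarity


def negative_dot(a, b):
    # Negated so that "smaller is closer" holds for ORDER BY, as in pgvector.
    return -sum(x * y for x, y in zip(a, b))


# One entry per (vector kind, distance kind); only "vector" is filled in here.
DISPATCH = {
    ("vector", DistanceKind.L2): l2,
    ("vector", DistanceKind.COSINE): cosine,
    ("vector", DistanceKind.DOT): negative_dot,
}


def distance(vector_kind, distance_kind, a, b):
    """Look up the kernel for this combination and apply it."""
    return DISPATCH[(vector_kind, distance_kind)](a, b)
```

Adding a new storage type in this scheme means adding one table entry per metric, while callers keep using the same distance entry point.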
VectorChord exposes PostgreSQL GUC (Grand Unified Configuration) parameters for runtime tuning:
vchordrq parameters:
- vchordrq.probes: Number of lists to probe during search
- vchordrq.epsilon: Reranking expansion factor
- vchordrq.io_search: I/O prefetching strategy (plain/simple/stream)

vchordg parameters:
- vchordg.ef_search: Size of dynamic candidate list during search

Configuration reference: Configuration Parameters (GUCs)
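To show what a probe count trades off, here is a toy inverted-file search in which only the lists belonging to the nearest centroids are scanned. It illustrates the generic technique behind a setting like vchordrq.probes and is not VectorChord's search path: there is no quantization or reranking, and the data and function names are made up for the example.

```python
import math


def nearest(query, candidates, k):
    """Indices of the k candidates closest to query by Euclidean distance."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(query, candidates[i]))
    return order[:k]


def ivf_search(query, centroids, lists, probes, k):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    scanned = []
    for list_id in nearest(query, centroids, probes):
        scanned.extend(lists[list_id])
    return [scanned[i] for i in nearest(query, scanned, k)]


centroids = [(0.0, 0.0), (10.0, 10.0)]
lists = [
    [(0.0, 1.0), (1.0, 0.0), (4.0, 4.0)],     # vectors assigned to centroid 0
    [(7.0, 7.0), (10.0, 9.0), (11.0, 11.0)],  # vectors assigned to centroid 1
]
query = (5.2, 5.2)
print(ivf_search(query, centroids, lists, probes=1, k=1))
print(ivf_search(query, centroids, lists, probes=2, k=1))
```

With one probe the search returns (7.0, 7.0) because only the second list is scanned; with two probes it finds the true nearest neighbor (4.0, 4.0). Larger probe counts improve recall at the cost of scanning more lists.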
The recorder subsystem implements background query sampling for recall evaluation:
The recorder worker runs as a PostgreSQL background worker, periodically sampling queries and storing them in an SQLite database for later recall evaluation. Detailed documentation: Query Sampling and Recall Evaluation
Sources: src/recorder/1-100 (implied from initialization)
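The evaluation side of this can be sketched with Python's built-in sqlite3 module: sampled queries are stored in a table, and recall is later computed by comparing approximate results against an exact scan. The table schema, the text encoding of the query, and the function names are hypothetical; the recorder's actual storage layout is not described on this page.

```python
import sqlite3


def recall_at_k(approximate_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate search also returned."""
    exact = set(exact_ids[:k])
    return len(exact & set(approximate_ids[:k])) / len(exact) if exact else 1.0


# Hypothetical schema: one row per sampled query vector, stored as text.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, query TEXT NOT NULL)")


def record(query_vector):
    """Store a sampled query so it can be replayed against an exact scan later."""
    db.execute("INSERT INTO samples (query) VALUES (?)", (repr(list(query_vector)),))


record([0.1, 0.2, 0.3])
stored = db.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
```

For example, if the index returns ids [1, 2, 3, 9] and the exact top four are [1, 2, 3, 4], recall_at_k reports 0.75.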
VectorChord uses Cargo feature flags for PostgreSQL version compatibility:
| Feature | PostgreSQL Version |
|---|---|
| pg13 | PostgreSQL 13 |
| pg14 | PostgreSQL 14 |
| pg15 | PostgreSQL 15 |
| pg16 | PostgreSQL 16 |
| pg17 | PostgreSQL 17 |
| pg18 | PostgreSQL 18 |
The build process integrates with pgrx to generate PostgreSQL extension files. For SIMD operations, a build.rs script compiles C shims for architectures requiring special intrinsics (FP16, SVE). Detailed documentation: Build System Architecture
Sources: Cargo.toml15-22 crates/simd/Cargo.toml28-30
On Linux and macOS with x86_64 or AArch64, VectorChord uses mimalloc as the global allocator for improved memory management performance:
Sources: src/lib.rs77-81 Cargo.toml55-56
The extension version is embedded at compile time via the VCHORD_VERSION environment variable. The upgrade module (src/upgrade/) handles schema migrations between versions when users run ALTER EXTENSION vchord UPDATE.
Sources: src/lib.rs23-44
This overview provides the foundation for understanding VectorChord's architecture. For detailed information about specific subsystems, refer to the linked sections throughout this document, or see the Architecture section for a deeper dive into component interactions.