Cold-data loading optimization

Hi, after importing 200 M vectors into 15 shards, cold start is painfully slow. We enabled binary quantization and keep the HNSW index in RAM, while the raw vectors stay on disk. Whenever we migrate to new hosts and restart the cluster, startup takes up to one hour. My first guess is that the code reads each shard's index files sequentially: it loads the quantized index first and only then the HNSW index. At this data size the serial load becomes a bottleneck, and in an emergency the long recovery time can break our SLA.
I’d like to prefetch all required files concurrently before the service starts, so that later index access hits the OS page cache.
Users could set a concurrency parameter to match the maximum IOPS their disks can deliver. It’s admittedly ugly, but it’s the quickest way to buy us time. In the long run we still need to make the index-loading code itself parallel.
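To make the proposal concrete, here is a rough sketch of the kind of prefetch step I have in mind, written as a standalone Python script rather than as a change to the service itself. The function names (`warm_file`, `prefetch`), the `pattern` filter, and the default concurrency are all hypothetical and only for illustration. It simply reads the files once with a thread pool so that the kernel keeps them in the page cache. Which files actually count as "required" is left to the caller, since warming the raw vector files as well may not be desirable.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

CHUNK_SIZE = 4 * 1024 * 1024  # read in 4 MiB chunks (arbitrary choice)


def warm_file(path: Path) -> int:
    """Read one file front to back and discard the data.

    The read itself is what pulls the file into the OS page cache.
    Returns the number of bytes read.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            total += len(chunk)
    return total


def prefetch(storage_dir: str, concurrency: int = 16, pattern: str = "*") -> int:
    """Warm every regular file under storage_dir whose name matches pattern.

    `concurrency` is the hypothetical user-facing knob: the number of files
    read in parallel, to be tuned against the IOPS the disk can deliver.
    Returns the total number of bytes read.
    """
    files = [p for p in Path(storage_dir).rglob(pattern) if p.is_file()]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sum(pool.map(warm_file, files))


# Example (path taken from the logs in this issue, run before the service starts):
#   prefetch("./storage/collections/t_scalar", concurrency=32)
```

Threads are enough here because the work is I/O-bound and the interpreter releases its lock during blocking reads. This only helps if the warmed files fit in free memory, otherwise the cache will evict them again before the service reads them.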
Below are the startup logs:
2026-01-23T02:38:44.426125Z  INFO storage::content_manager::consensus::persistent: Loading raft state from ./storage/raft_state.json
2026-01-23T02:38:44.435406Z  INFO storage::content_manager::toc: Loading collection: t_scalar
2026-01-23T03:24:14.428535Z  INFO collection::shards::local_shard: Recovering shard ./storage/collections/t_scalar/1: 0/1 (0%)
2026-01-23T03:24:14.487988Z  INFO collection::shards::local_shard: Recovered collection t_scalar: 1/1 (100%)
2026-01-23T03:26:30.083950Z  INFO collection::shards::local_shard: Recovering shard ./storage/collections/t_scalar/3: 0/1 (0%)
2026-01-23T03:26:30.190590Z  INFO collection::shards::local_shard: Recovered collection t_scalar: 1/1 (100%)
2026-01-23T03:28:48.091480Z  INFO collection::shards::local_shard: Recovering shard ./storage/collections/t_scalar/5: 0/1 (0%)
2026-01-23T03:28:48.201508Z  INFO collection::shards::local_shard: Recovered collection t_scalar: 1/1 (100%)
2026-01-23T03:30:50.456139Z  INFO collection::shards::local_shard: Recovering shard ./storage/collections/t_scalar/7: 0/1 (0%)
2026-01-23T03:30:50.568487Z  INFO collection::shards::local_shard: Recovered collection t_scalar: 1/1 (100%)
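
For reference, the gaps between consecutive log lines can be computed with a few lines of Python. This is only a helper for reading the timestamps above, and the function names are made up for illustration. In the excerpt, about 45 minutes pass between "Loading collection: t_scalar" and the first "Recovering shard" line, and each following shard line appears roughly two minutes after the previous "Recovered" line.

```python
from datetime import datetime


def parse_ts(line: str) -> datetime:
    """Parse the leading timestamp of a log line, e.g. 2026-01-23T02:38:44.435406Z."""
    return datetime.strptime(line.split()[0], "%Y-%m-%dT%H:%M:%S.%fZ")


def gaps_seconds(lines: list[str]) -> list[float]:
    """Return the time in seconds between each pair of consecutive log lines."""
    stamps = [parse_ts(line) for line in lines if line.strip()]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```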

We’re running on AWS r7g.8xlarge (32 vCPU, 256 GiB) with gp3 volumes configured at 16 000 IOPS.

Cold-data loading optimization #7975
