
Efficient way to increment on-disk merged index #2876

@fferroni

Description


Hello,

Great library. I had two questions I couldn't find answers to in past issues or the READMEs:

  1. Is it possible to add additional shards to an already-built on-disk index that was created from a set of shards (faiss.contrib.ondisk.merge_ondisk()), or does one need to run merge_ondisk again from scratch? I'm dealing with very large indices...
  2. In a distributed setting, indexing shards requires knowing the global ID offset (https://github.com/facebookresearch/faiss/blob/main/demos/demo_ondisk_ivf.py#L53). Is there an efficient way to merge index shards whose IDs all start from zero? For example, I may have 10 machines, each creating an IVF index shard of 100 elements with local IDs 0-99. In the merge step, can I offset the IDs of each shard "on the fly" so that they are globally consistent, perhaps based on the order of the list provided to merge_ondisk?
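For context on question 2, the global-offset scheme used in demo_ondisk_ivf.py amounts to a prefix sum over shard sizes: shard k's local ID i becomes offset[k] + i. A minimal, faiss-free sketch of that remapping (function names here are illustrative, not part of the faiss API):

```python
from itertools import accumulate

def global_offsets(shard_sizes):
    """Prefix-sum the shard sizes: shard k's local ID i maps to offsets[k] + i."""
    return [0] + list(accumulate(shard_sizes))[:-1]

def to_global_id(shard_index, local_id, offsets):
    """Map a shard-local ID (always starting at 0) to a globally unique ID."""
    return offsets[shard_index] + local_id

# Example: 10 shards of 100 vectors each, every shard numbered 0-99 locally.
sizes = [100] * 10
offsets = global_offsets(sizes)        # [0, 100, 200, ..., 900]
print(to_global_id(3, 42, offsets))    # shard 3, local ID 42 -> 342
```

In the demo each worker applies this offset itself via add_with_ids before writing its shard; whether merge_ondisk can apply it lazily at merge time, based on list order, is exactly what the question asks.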

Thank you!
