
Efficient way to increment on-disk merged index #2876

@fferroni

Description


Hello,

Great library. I had two questions I couldn't find answers to in past issues or the READMEs:

  1. Is it possible to add additional shards to an already-built on-disk index that was created from a set of shards (faiss.contrib.ondisk.merge_ondisk()), or does one need to run merge_ondisk again from scratch? I'm dealing with very large indices...
  2. In a distributed setting, indexing shards requires knowing the global ID offset (https://github.com/facebookresearch/faiss/blob/main/demos/demo_ondisk_ivf.py#L53). Is there an efficient way to merge index shards whose IDs all start from zero? For example, I may have 10 machines, each creating an IVF index shard of 100 elements with local IDs 0-99. In the merge step, can I offset the IDs of each shard "on the fly" so that they are globally consistent, perhaps based on the order of the list provided to merge_ondisk?
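For context on question 2, the global-offset scheme used in demo_ondisk_ivf.py amounts to a prefix sum over shard sizes: shard k's local ID i becomes offset[k] + i. A minimal, faiss-free sketch of that remapping (function names here are illustrative, not part of the faiss API):

```python
from itertools import accumulate

def global_offsets(shard_sizes):
    """Prefix-sum the shard sizes: shard k's local ID i maps to offsets[k] + i."""
    return [0] + list(accumulate(shard_sizes))[:-1]

def to_global_id(shard_index, local_id, offsets):
    """Map a shard-local ID (always starting at 0) to a globally unique ID."""
    return offsets[shard_index] + local_id

# Example: 10 shards of 100 vectors each, every shard numbered 0-99 locally.
sizes = [100] * 10
offsets = global_offsets(sizes)        # [0, 100, 200, ..., 900]
print(to_global_id(3, 42, offsets))    # shard 3, local ID 42 -> 342
```

In the demo each worker applies this offset itself via add_with_ids before writing its shard; whether merge_ondisk can apply it lazily at merge time, based on list order, is exactly what the question asks.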

Thank you!
