[Bug] Integer Overflow in NSG Index Construction Causes vector::_M_default_append Error with Large Datasets #4295

@andylizf

Description

When constructing an NSG index with large datasets (60M+ vectors) and M=64, the construction fails with a vector::_M_default_append error. The root cause is an integer overflow when two signed integers (ntotal and M) are multiplied, and the overflowed result is implicitly converted to uint64_t for vector resizing.

To Reproduce

Create an NSG index with:

  • Dataset size: ~60M vectors (60,450,220 x 768 dimensional vectors)
  • M (graph degree): 64

Error Message

Iter: 5, recall@64: 0.106563
Iter: 6, recall@64: 0.288906
Iter: 7, recall@64: 0.512656
Iter: 8, recall@64: 0.691719
Iter: 9, recall@64: 0.799688
Traceback (most recent call last):
  File "/home/ubuntu/Power-RAG/./demo/build_nsg.py", line 192, in <module>
    index_nsg.add(embeddings)
  File "/home/ubuntu/Power-RAG/.venv/lib/python3.10/site-packages/faiss/class_wrappers.py", line 230, in replacement_add
    self.add_c(n, swig_ptr(x))
  File "/home/ubuntu/Power-RAG/.venv/lib/python3.10/site-packages/faiss/swigfaiss_avx512.py", line 7304, in add
    return _swigfaiss_avx512.IndexNSG_add(self, n, x)
RuntimeError: C++ exception vector::_M_default_append
Error details: C++ exception vector::_M_default_append
Embeddings shape: (60450220, 768)

Root Cause Analysis

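Two signed integers (ntotal ≈ 60M and M = 64) are multiplied. Assuming a 32-bit int, the product 60,450,220 × 64 = 3,868,814,080 exceeds INT_MAX (2,147,483,647), so the multiplication overflows. When the overflowed value is implicitly converted to size_t (uint64_t here) for the vector resize, it becomes an extremely large value, and the resize fails with the vector::_M_default_append error.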

Fix by explicitly casting to uint64_t before multiplication:

final_graph.resize(uint64_t(ntotal) * K);
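
For anyone who wants to check the arithmetic without building a 60M-vector index, here is a minimal standalone sketch. The helper names (wrapped_int32_product, buggy_resize_arg, fixed_resize_arg, resize_throws_length_error) are hypothetical and not part of FAISS. Signed overflow is undefined behavior in C++, so the sketch emulates the two's-complement wraparound that typically happens in practice rather than triggering it directly, and it assumes a 64-bit size_t.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical helper: emulate a wrapping 32-bit signed multiply without
// invoking undefined behavior. Multiply in 64 bits, keep the low 32 bits,
// then reinterpret them as a two's-complement signed value.
inline int64_t wrapped_int32_product(int32_t a, int32_t b) {
    const uint64_t low32 =
        static_cast<uint64_t>(static_cast<int64_t>(a) * b) & 0xFFFFFFFFull;
    return low32 >= 0x80000000ull
        ? static_cast<int64_t>(low32) - 0x100000000ll
        : static_cast<int64_t>(low32);
}

// What resize() would receive when the wrapped (negative) int product is
// implicitly converted to uint64_t.
inline uint64_t buggy_resize_arg(int32_t ntotal, int32_t K) {
    return static_cast<uint64_t>(wrapped_int32_product(ntotal, K));
}

// The proposed fix: widen one operand before multiplying, so the whole
// multiplication is carried out in 64 bits.
inline uint64_t fixed_resize_arg(int32_t ntotal, int32_t K) {
    return uint64_t(ntotal) * K;
}

// Returns true if resizing a vector<int> to n elements is rejected with
// std::length_error (the exception type behind the reported message).
// Only call this with sizes above max_size(); a smaller n would really
// attempt the allocation.
inline bool resize_throws_length_error(uint64_t n) {
    std::vector<int> v;
    try {
        v.resize(static_cast<std::size_t>(n));
    } catch (const std::length_error&) {
        return true;
    } catch (const std::bad_alloc&) {
        return false;
    }
    return false;
}
```

With ntotal = 60450220 and K = 64, the wrapped product is -426,153,216, which converts to 18,446,744,073,283,398,400 as a uint64_t, far above what a vector can hold. The widened multiplication yields the intended 3,868,814,080 elements.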

System Information

  • RAM: 900GB (not a memory constraint issue)
  • FAISS Version: 1.10 and master branch
  • OS: Linux
