
hnsw_stats.ndis seems wrong: it includes visited HNSW neighbors, for which dis() is not actually called #3819

@ssk01


Summary

hnsw_stats.ndis seems wrong: it also counts already-visited HNSW neighbors, even though dis() is never called for them.
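A minimal Python sketch of the two counting policies follows. It runs a best-first search over a single graph layer and keeps two counters: `examined` counts every neighbor slot inspected, including already-visited ones, and `computed` counts only neighbors whose distance is actually computed. This is a toy model, not the Faiss `HNSW` search code. `search_layer`, `examined` and `computed` are hypothetical names used only for illustration. The report argues that `ndis` should behave like `computed`.

```python
import heapq
import math


def search_layer(graph, vectors, query, entry, ef):
    """Toy best-first search over one graph layer (illustration only, not Faiss code).

    Returns (results, examined, computed):
      examined: neighbor slots inspected, including already-visited ones
      computed: neighbors whose distance to the query was actually computed
    """
    def dist(i):
        return math.dist(vectors[i], query)

    visited = {entry}
    d0 = dist(entry)
    computed = 1   # the entry point's distance
    examined = 0
    candidates = [(d0, entry)]  # min-heap of nodes still to expand
    results = [(-d0, entry)]    # max-heap (negated distances) of the best ef nodes
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # closest candidate is worse than the worst kept result
        for nb in graph[node]:
            examined += 1
            if nb in visited:
                continue  # already visited: no distance is computed
            visited.add(nb)
            dnb = dist(nb)
            computed += 1
            if len(results) < ef or dnb < -results[0][0]:
                heapq.heappush(candidates, (dnb, nb))
                heapq.heappush(results, (-dnb, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    return sorted((-nd, i) for nd, i in results), examined, computed


# Usage: a fully connected triangle, so every expanded node revisits the others.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0)}
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
res, examined, computed = search_layer(graph, vectors, (0.0, 0.0), entry=0, ef=3)
print(examined, computed)  # 6 3
```

Here half of the inspected neighbor slots are revisits. A counter that is bumped for every slot therefore reports twice as many "distance computations" as were really performed.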

Reproduction instructions

  1. In older versions of Faiss, before "HNSW speedup + Distance 4 points" (#2841), ndis did not include visited neighbors.

  2. On the current main branch, ndis includes visited neighbors.
  3. I used bench_hnsw.py as a demo. At efSearch 256, the current hnsw_stats.ndis ("ndis + visited", 94231860) is about 43% larger than the actual ndis measured with my fix (65960703). I expect this ratio to vary across vectors generated by different models.
// main branch (ndis includes visited neighbors)
(base) 1984MacBook-Air benchs % python3 bench_hnsw.py 1 hnsw_sq
load data
10000 128
Testing HNSW with a scalar quantizer
training
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
  max_level = 5
Adding 1 elements at level 5
Adding 15 elements at level 4
Adding 194 elements at level 3
Adding 3693 elements at level 2
Adding 58500 elements at level 1
Adding 937597 elements at level 0
Done in 25994.155 ms
search
efSearch 16        0.012 ms per query, R@1 0.7797, missing rate 0.0000, ndis 4782698
efSearch 32        0.014 ms per query, R@1 0.8731, missing rate 0.0000, ndis 12252905
efSearch 64        0.023 ms per query, R@1 0.9285, missing rate 0.0000, ndis 25107377
efSearch 128       0.043 ms per query, R@1 0.9583, missing rate 0.0000, ndis 48825286
efSearch 256       0.082 ms per query, R@1 0.9714, missing rate 0.0000, ndis 94231860
// with my attempted fix (visited neighbors no longer counted in ndis)
(base) 1984MacBook-Air benchs % python3 bench_hnsw.py 1 hnsw_sq
load data
10000 128
Testing HNSW with a scalar quantizer
training
add
hnsw_add_vertices: adding 1000000 elements on top of 0 (preset_levels=0)
  max_level = 5
Adding 1 elements at level 5
Adding 15 elements at level 4
Adding 194 elements at level 3
Adding 3693 elements at level 2
Adding 58500 elements at level 1
Adding 937597 elements at level 0
Done in 21349.235 ms
search
efSearch 16        0.012 ms per query, R@1 0.7775, missing rate 0.0000, ndis 4190685
efSearch 32        0.014 ms per query, R@1 0.8706, missing rate 0.0000, ndis 10376491
efSearch 64        0.025 ms per query, R@1 0.9284, missing rate 0.0000, ndis 20265123
efSearch 128       0.045 ms per query, R@1 0.9590, missing rate 0.0000, ndis 36973128
efSearch 256       0.084 ms per query, R@1 0.9707, missing rate 0.0000, ndis 65960703
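The overcount can be computed for every efSearch value with a few lines of Python over the ndis columns of the two logs. The numbers below are copied from the runs above. `inflation_pct` is a hypothetical helper name.

```python
# ndis values copied from the two bench_hnsw.py runs above
ef_search = [16, 32, 64, 128, 256]
ndis_main = [4782698, 12252905, 25107377, 48825286, 94231860]
ndis_fixed = [4190685, 10376491, 20265123, 36973128, 65960703]


def inflation_pct(main, fixed):
    """Percent by which the main-branch ndis exceeds the patched ndis."""
    return 100.0 * (main - fixed) / fixed


for ef, m, f in zip(ef_search, ndis_main, ndis_fixed):
    print(f"efSearch {ef:4d}: main/fixed = {m / f:.3f}  (+{inflation_pct(m, f):.1f}%)")
```

With these numbers the gap grows with efSearch, from roughly 14% at efSearch 16 to roughly 43% at efSearch 256. The two runs also report slightly different R@1 (for example 0.7797 vs 0.7775 at efSearch 16), so their search paths are not identical. The ratio is therefore an approximation of the overcount, not an exact measure.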

@alexanderguzhva, should I open a pull request to fix it?
