Avoid obtaining several identical dimensions between two LSH sub-vectors by choosing orthogonal sub-vectors.#2071
Conversation
|
@vpisarev Vadim, one more Flann contribution. Please review it or redirect to some Flann-responsible person:) |
|
@pemmanuelviel, terribly sorry for the delay, it's been a terribly busy time. The patch probably makes sense algorithmically (I'm not familiar with the LSH algorithm), but technically, using a static non-constant variable is very, very bad practice. It makes the function completely non-reentrant, i.e. non-usable from different threads. Please use a normal vector or some different solution |
|
OK, I will check that. |
|
@pemmanuelviel any updates? |
|
@pemmanuelviel Are you still working on it? |
|
I still have it in mind and I hope to switch to this task in two weeks.
|
|
Great to hear it! |
|
Done! |
|
👍 |
The LSH algorithm accelerates vector retrieval in a database by hashing with several sub-vectors of smaller dimension (e.g. 22 dimensions by default instead of the usual 512). These sub-vectors are built by picking dimensions at random among the 512.
But the more sub-vectors there are and the higher their dimension, the more dimensions any two sub-vectors will have in common. As a result, the gain in quality when retrieving the closest vector in the database does not grow steadily with the number of sub-vectors (e.g. when increasing the number of sub-vectors from 16 to 24 with multi_probe_level set to 1).
To ensure a better spread across dimensions, the best option is to prevent any two sub-vectors from sharing a dimension, i.e. to select orthogonal sub-vectors.
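A minimal sketch of this selection strategy (hypothetical names, not FLANN's actual API): shuffle all dimensions once, then slice the permutation into disjoint chunks, one per hash table, so no dimension is reused across sub-vectors. This only works while `num_tables * key_size <= dim`; the real patch must also handle the overflow case, which is omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <set>
#include <vector>

// Hypothetical sketch: build `num_tables` index sets of size `key_size`
// out of `dim` dimensions such that no dimension appears in two
// different sub-vectors ("orthogonal" sub-vectors).
std::vector<std::vector<int>> pickOrthogonalSubvectors(int dim, int num_tables,
                                                       int key_size,
                                                       unsigned seed = 42) {
    assert(num_tables * key_size <= dim);  // overflow handling omitted

    // One global random permutation of all dimensions...
    std::vector<int> perm(dim);
    std::iota(perm.begin(), perm.end(), 0);
    std::mt19937 rng(seed);
    std::shuffle(perm.begin(), perm.end(), rng);

    // ...sliced into disjoint chunks, one per hash table.
    std::vector<std::vector<int>> tables;
    for (int t = 0; t < num_tables; ++t) {
        tables.emplace_back(perm.begin() + t * key_size,
                            perm.begin() + (t + 1) * key_size);
    }
    return tables;
}
```

With the default parameters from the discussion (512 dimensions, 22-dimensional keys), up to 23 tables can be built with fully disjoint dimension sets.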