-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Open
Labels
Description
Description
Spinoff from this cool comment, thanks to hashing guru @bruno-roustant:
Instead, we should multiply with the gold constant BitMixer#PHI_C64 (make it public).
This really makes a difference in the evenness of the value distribution. This is one of the secrets of the HPPC hashing. By applying this, we get multiple advantages:
* lookup should be improved (less hash collision)
* we can try to rehash at 3/4 occupancy because the performance should not be impacted until this point.
* in case of hash collision, we can lookup linearly with a pos = pos + 1 instead of quadratic probe (lines 95 and 327); this may avoid some mem cache miss.
*
(same for the other hash method)
This is a simple change, we just need to test on some real FST building cases to confirm good mixing "in practice". The new IndexToFST tool in luceneutil is helpful for this.