Skip to content

Improve hash mixing in FST's double-barrel LRU hash #12704

@mikemccand

Description

@mikemccand

Description

Spinoff from this cool comment, thanks to hashing guru @bruno-roustant:

Instead, we should multiply with the gold constant BitMixer#PHI_C64 (make it public).
This really makes a difference in the evenness of the value distribution. This is one of the secrets of the HPPC hashing. By applying this, we get multiple advantages:

  * lookup should be improved (less hash collision)
  * we can try to rehash at 3/4 occupancy because the performance should not be impacted until this point.
  * in case of hash collision, we can lookup linearly with a pos = pos + 1 instead of quadratic probe (lines 95 and 327); this may avoid some mem cache miss.
  * 
(same for the other hash method)

This is a simple change, we just need to test on some real FST building cases to confirm good mixing "in practice". The new IndexToFST tool in luceneutil is helpful for this.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions