Switch to the SipHash hash function#24
Switch to the SipHash hash function#24nicoonoclaste wants to merge 7 commits intoocaml:trunkfrom nicoonoclaste:hash
Conversation
There was a problem hiding this comment.
there is a discrepancy between this comment and the uint64 declaration above; wouldn't it be correct to keep them int32 in this function?
There was a problem hiding this comment.
Yes, they could be casted to 64-bits just before shifting.
If this is clearer, I will change it.
|
Some high-level comments:
|
|
I think this patch would really benefit from having a dedicated testsuite. I would expect at least:
|
|
@gashe As mentioned in the caveats, I haven't written/fixed the testsuite yet. @xavierleroy I used Also, regarding the use of |
|
@bobot The trivial collision occur before the integers are even fed to the hash function. Here is some quick-and-dirty code (for 64-bits machines) to generate universal collisions with a small integer: PS: The collisions are universal in the sense that, using the keyed hash function |
Perhaps I'm missing something obvious but when I looked at the patch yesterday, I was thinking to myself: "Why doesn't he just use a |
There was a problem hiding this comment.
I would suggest appending ULL to the integer constants.
Is that really an issue? Are there systems out there that ocaml supports that have neither a uint64_t or unsigned long long type? While purely anecdotal, I don't think I've seen any such system in the last 10 years, if not longer. I would be more worried on the performance impact on 32 bits architectures. 32 bits ARM has instructions that let you do 64 bits arithmetic reasonably efficient but x86 probably has to go about it the long and slow way. Then again, most operations here are very basic. I wouldn't be surprised if gcc at -m32 -O3 manages to crank out something that's close to optimal. |
I don't think so, and I'm willing to stop supporting them by removing the I would be more worried on the performance impact on 32 bits architectures.
|
|
A lot of effort are put into having the best hash functions for the polymorphic hash function of ocaml. Perhaps these hash functions should be accessible from ocaml (using external or better written in ocaml) in order to help the implementation of hash function in ocaml. Often people use combination function that must appear childish for specialist [0][1]: People can use an external library for that, but since the work is done in byterun/, it may be shared in stdlib/. |
|
I second @bobot, it would be very interesting to have efficient, well-behaved built-in hash combinators for hashconsing-lovers (or other people who write their own hash functions). |
|
In the spirit of @bobot's
However, it is probably an orthogonal issue (though the refactoring of the API exposed in |
|
Yes, I should have said that it is orthogonal and shouldn't go in this merge request. But if someone want to write the code to make a new one, I expect it to be well received. |
|
Here are some benchmarks I performed a few weeks ago; I still haven't benchmarked the impact on As shown, the implementation needs some more work. The good news is that it was still fairly naive, and taking advantage of vectorization (either using hand-coded PS: The benchmark was ran on my laptop (Core i5, amd64, Linux) |
This preserves platform-invariance for integers < 2³¹ at a slight cost in performance
Hand-unroll loops (gcc -O misoptimises) and force inlining. Halves the running-time on strings here (amd64 (Core i5), Linux, gcc 4.8.2)
This is in preparation of adding a variant of caml_hash for cases where the initial state of the hash function are known
Also, add some string test vectors to check against the C reference implementation.
|
This PR has been sleeping for a long time, sorry about that. In the meantime, a few things happened:
|
|
Some thoughts:
|
|
Is there any way to have some global in Hashtbl that will set which hash function we want to use? Having hash_secure and hash_fast, one of which will alias to 'hash' based on a global, seems like a decent solution to me. I'd rather not incur a performance penalty on every hashtable, especially when this concerns only a small subset of applications. |
Please don't. |
|
Fine. But should the default then be fast or secure? I could make the argument that if you're writing for domains where security is critical, you should be using Hashtbl.Make for that. And can we still have access to the old, fast hash, since I use Hashtbl.hash in many places that don't have to do with hash tables? I would like to maintain that fast hash as Hashtbl.hash_fast |
|
@funny-falcon: thanks a lot for those pointers. It could very well be that your 32-bit "Saphash" variant is fast enough and strong enough for our purposes. When I find the time I'll redo my measurements using Saphash and Siphash13 instead of Siphash24. @bluddy: OCaml's historic hash function (the one we used before Murmur3) isn't that fast -- Murmur3 is faster on strings, if I remember correctly -- and really bad distribution-wise. I'd rather have a single hash function that is good enough and fast enough for most uses. |
Fixes to eliminate mutable vars
SipHash-1-3 is a seeded hash function that, unlike MurmurHash 3, has no known seed-independent collisions. This PR updates the Hashtbl implementation to use SipHash-1-3 as the hash function for randomized hash tables, making them truly resistant to hash flooding attacks. Non-randomized hash tables still use MurmurHash 3, as it is faster than SipHash-1-3, especially on 32-bit platforms, yet good enough, statistically speaking, for a classic hash table. Closes: ocaml#24
|
The resolution of this PR is near, see #9764 . |
SipHash-1-3 is a seeded hash function that, unlike MurmurHash 3, has no known seed-independent collisions. This PR updates the Hashtbl implementation to use SipHash-1-3 as the hash function for randomized hash tables, making them truly resistant to hash flooding attacks. Non-randomized hash tables still use MurmurHash 3, as it is faster than SipHash-1-3, especially on 32-bit platforms, yet good enough, statistically speaking, for a classic hash table. Closes: ocaml#24
SipHash-1-3 is a seeded hash function that, unlike MurmurHash 3, has no known seed-independent collisions. This PR updates the Hashtbl implementation to use SipHash-1-3 as the hash function for randomized hash tables, making them truly resistant to hash flooding attacks. Non-randomized hash tables still use MurmurHash 3, as it is faster than SipHash-1-3, especially on 32-bit platforms, yet good enough, statistically speaking, for a classic hash table. This PR also adds an API for randomized hashing of custom blocks. A "hash_ext" function is added to the operations that can be attached to custom blocks. Unlike the existing "hash" operation, which takes a 32-bit hash state as argument and returns an updated state as result, "hash_ext" takes a pointer to an opaque "caml_hash_state" struct, which is updated in-place. Closes: ocaml#24
SipHash-1-3 is a seeded hash function that, unlike MurmurHash 3, has no known seed-independent collisions. This PR updates the Hashtbl implementation to use SipHash-1-3 as the hash function for randomized hash tables, making them truly resistant to hash flooding attacks. Non-randomized hash tables still use MurmurHash 3, as it is faster than SipHash-1-3, especially on 32-bit platforms, yet good enough, statistically speaking, for a classic hash table. This PR also adds an API for randomized hashing of custom blocks. A "hash_ext" function is added to the operations that can be attached to custom blocks. Unlike the existing "hash" operation, which takes a 32-bit hash state as argument and returns an updated state as result, "hash_ext" takes a pointer to an opaque "caml_hash_state" struct, which is updated in-place. Closes: ocaml#24
Smg write barrier optim
* Comballoc bugfix for local types. When Comballoc does an actual allocation, the state of the other allocation must must be preserved, as there are two independent allocation combining sequences. * Bugfix for type_function: check return modes of curried lambdas * Print total local allocations under v=0x400 * Rudimentary GC scanning for local types
* Comballoc bugfix for local types. When Comballoc does an actual allocation, the state of the other allocation must must be preserved, as there are two independent allocation combining sequences. * Bugfix for type_function: check return modes of curried lambdas * Print total local allocations under v=0x400 * Rudimentary GC scanning for local types
173842c Merge flambda-backend changes ed7eba2 Remove leading space from LINE. (oxcaml/oxcaml#484) bd61170 Bump magic numbers (ocaml#5) c50c47d Add CI builds with local allocations enabled 1412792 Move local allocations support behind '-extension local' 6d8e42a Better tail call behaviour in caml_applyN c7dac3d Typemod: toplevel bindings escape even if no variables are bound 82d6c3e Several fixes for partial application and currying d05c70c Pprintast support for new local syntax e0e62fc Typecheck x |> f y as (f y x), not ((f y) x) d7e34ce Remove autogeneration of @ocaml.curry b9a0593 Port oxcaml/oxcaml#493 0a872d9 Code review fixes from oxcaml/oxcaml#491 6c168bb Remove local allocation counting 3c6e7f0 Code review fixes from oxcaml/oxcaml#478 bb97207 Rename Lambda.apply_position a7cb650 Quieten Makefile when runtime dep files are not present c656dc9 Merge flambda-backend changes 11b5424 Avoid printing double spaces in function argument lists 7751faa Restore locations to Typedtree.{pat,let}_bound_idents_full e450b6c add build_ocaml_compiler.sexp 0403bb3 Revert PR 9895 to continue installing VERSION b3447db Ensure new local attributes are namespaced properly 7f213fc Allow empty functions again 8f22ad8 Bugfix: ensure local domain state is initialised 80f54dd Bugfix for Selectgen with regions e8133a1 Fix external-external signature inclusion 9840051 Bootstrap d879f23 Merge remote-tracking branch 'jane/local-reviewed' into local-merge 94454f5 Use Local_store for the local allocations ref 54a164c Create fewer regions, according to typechecking (ocaml#59) 1c2479b Merge flambda-backend changes ce34678 Fix printing of modes in return types 91f2281 Hook mode variable solving into Btype.snapshot/backtrack 54e4b09 Move Alloc_mode and Value_mode to Btype ff4611e Merge flambda-backend changes ce62e45 Ensure allocations are initialised, even dead ones 6b6ec5a Fix the alloc.ml test on 32-bit builds 81e9879 Merge flambda-backend changes 40a7f89 Update repo URL for ocaml-jst, and rename script. 0454ee7 Add some new locally-allocating primitives (ocaml#57) 8acdda1 Reset the local stack pointer in exception handlers (ocaml#56) 8dafa98 Improve typing for (||) and (&&) (ocaml#55) 8c64754 Fix make_check_all_arches (ocaml#54) b50cd45 Allow arguments to primitives to be local even in tail position (ocaml#53) cad125d Fix modes from or-patterns (ocaml#50) 4efdb72 Fix tailcalls tests with inlining (ocaml#52) 4a795cb Flambda support (ocaml#49) 74722cb Add [@ocaml.principal] and [@ocaml.noprincipal] attributes, and use in oo.mli 6d7d3b8 Ensure that functions are evaluated after their arguments (flambda-backend ocaml#353) 89bda6b Keep Sys.opaque_identity in Cmm and Mach (port upstream PR 9412) a39126a Fix tailcalls within regions (ocaml#48) 4ac4cfd Fix stdlib manpages build 3a95f5e Merge flambda-backend changes efe80c9 Add jane/pull-flambda-patches script fca94c4 Register allocations for Omitted parameter closures (ocaml#47) 103b139 Remove various FIXMEs (ocaml#46) 62ba2c1 Bootstrap a0062ad Allow local allocations for various primitives (ocaml#43) 7a2165e Allow primitives to be poly-moded (ocaml#43) 2af3f55 Fix a flaky test by refactoring TypePairs (ocaml#10638) 58dd807 Bootstrap ee3be10 Fix modes in build_apply for partial applications fe73656 Tweak for evaluation order of labelled partial applications (ocaml#10653) 0527570 Fix caml_modify on local allocations (ocaml#40) e657e99 Relax modes for `as` patterns (ocaml#42) f815bf2 Add special mode handling for tuples in matches and let bindings (ocaml#38) 39f1211 Only take the upper bounds of modes associated with allocations (ocaml#37) aec6fde Interpret arrow types in "local positions" differently c4f3319 Bootstrap ff6fdad Add some missing regions 40d586d Bootstrap 66d8110 Switch to a system with 3 modes for values f2c5a85 Bugfix for Comballoc with local allocations. (ocaml#41) 83bcd09 Fix bug with root scanning during compaction (ocaml#39) 1b5ec83 Track modes in Lambda.lfunction and onwards (ocaml#33) f1e2e97 Port ocaml#10728 56703cd Port ocaml#10081 eb66785 Support local allocations in i386 and fix amd64 bug (ocaml#31) c936b19 Disallow local recursive non-functions (ocaml#30) c7a193a GC support for local allocations (ocaml#29) 8dd7270 Nonlocal fields (ocaml#28) e19a2f0 Bootstrap 694b9ac Add syntax to the parser for local allocations (ocaml#26) f183008 Lower initial stack size 918226f Allow local closure allocations (ocaml#27) 2552e7d Introduce mode variables (ocaml#25) bc41c99 Minor fixes for local allocations (ocaml#24) a2a4e60 Runtime and compiler support for more local allocations (ocaml#23) d030554 Typechecking for local allocations (ocaml#21) 9ee2332 Bugfix missing from ocaml#20 02c4cef Retain block-structured local regions until Mach. 86dbe1c amd64: Move stack realloc calls out-of-line 324d218 More typing modes and locking of environments a4080b8 Initial version of local allocation (unsafe) git-subtree-dir: ocaml git-subtree-split: 173842c
0b0aefb Turn some partial application warnings into hints (ocaml#11338) (ocaml#30) 2caa9ee Add [@tail] and [@nontail] annotations on applications to control tailcalls (ocaml#31) 9fb218a Update `promote` target to use the `one` machinery (ocaml#28) b5ea912 Make empty types immediate bc08236 Add failing test of an empty type being immediate f2d439f Propagate escaping_context to Env locks to hint about errors (ocaml#25) 35569e1 Allow warning 68 to be controlled by attributes (ocaml#16) 28a6243 Allow type_argument to weaken return modes of expected function types (ocaml#24) cdc728f Fix 'make alldepend' in otherlibs/dynlink 7807d18 make alldepend 2d6af2f Merge flambda-backend changes git-subtree-dir: ocaml git-subtree-split: 0b0aefb
|
Seems that this can be subsumed by #9764? |
The patchset changes the C API exposed in
hash.hto make it digest-agnostic(so as to make future changes simpler) and switches the hash function to
SipHash-2-4.
SipHash is a familly of cryptographically-strong hash functions which are
competitive (performance-wise) with MurmurHash, the hash function currently used.
SipHash-2-4 is the recommended version of SipHash.
Preliminary benchmarks look good. However, they were done using the functorized
version of Hashtbl, and this specific proposal hasn't been benchmarked yet.
Caveats:
Commits expected to change the behaviour of the hash function are
https://github.com/nbraud/ocaml/commit/28bde97b87021e6bab9f74f49d24905611f2d62b and
https://github.com/nbraud/ocaml/commit/7dde207fe838558abbbef4fe0607229111780191
This weakens the hash function, compared to SipHash-2-4 as specified by its
authors, but isn't fixable without breaking the API.