Skip to content

Conversation

@sipa
Copy link
Member

@sipa sipa commented May 6, 2016

Use SipHash-2-4 for:

  • CCoinsViewCache hashmap (instead of a custom Lookup3/XOR scheme)
  • CTxMempool::mapTx txid index (converting from an ordered map)
  • Address relay peer selection (instead of SHA256)

Computing a hash for a txid using this takes around 52 CPU cycles in benchmarks (the Lookup3/XOR based mechanism used for CCoinsViewCache took 31 cycles). I think that's negligible still, while using a much more standard construct designed for such purposes.

@gmaxwell
Copy link
Contributor

gmaxwell commented May 6, 2016

Concept ACK.

@pstratem
Copy link
Contributor

pstratem commented May 6, 2016

Concept ACK
ad00e3af31df0655c1cdc95b8d29dcc1ec413e5b

src/main.cpp Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if pnode is 32-bit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. I've replaced it by using pnode->id.

@laanwj
Copy link
Member

laanwj commented May 7, 2016

Concept ACK
Python took the same step for 3.4 to convert their hash algorithms to SipHash, and they have similar motivations of mitigating collision attacks, so we're in good company: https://www.python.org/dev/peps/pep-0456/

@sipa
Copy link
Member Author

sipa commented May 8, 2016

Here is a comment about using SipHash-1-3 instead of SipHash-2-4: rust-lang/rust#29754 (comment)

It would be approximately twice as fast for hashing txids, and the distinguisher mentioned in that comment is not relevant for data that is a multiple of 8 bytes (which is the case for us), as 8 bytes of padding are added in that case. Opinions?

@laanwj
Copy link
Member

laanwj commented May 9, 2016

Siphash 2-4 seems to be the defacto 'safe' choice. Siphash 1-3 happens to be secure for our use case right now, but that sounds a tad brittle: as if a small change in our use case, or a small advancement in cryptoanalysis of the function, could mean that the known weakness does suddenly affect us?

Not sure how much hashing the txids is a bottleneck in the whole scheme of things around CCoinsCache, it's only 32 bytes after all. Conservatively I'd say only consider 'downgrading' if performance is seriously affected by going from 29 cycles to 49 cycles.

On the other hand it's pretty nice if using a hash with better security properties could be faster than the Lookup3/XOR scheme, so I see the attraction in that.

@sipa
Copy link
Member Author

sipa commented May 9, 2016

Did some better benchmarks:

  • Current Lookup3-based approach: 31 cycles
  • SipHash-2-4: 52 cycles
  • SipHash-1-3: 32 cycles

@dcousens
Copy link
Contributor

dcousens commented May 11, 2016

Ha, I thought SipHash was something home grown by @sipa... TIL.
LGTM, for the record, do we have a comparison with cycle counts of the other hash functions? (SHA1 say?)

@sipa
Copy link
Member Author

sipa commented May 11, 2016

@dcousens Based on numbers I get from #8039:

  • SHA1: 577 cycles (1 block, 64 bytes)
  • SHA256: 1504 cycles (1 block, 64 bytes)
  • SHA512: 1988 cycles (1 block, 128 bytes)
  • RIPEMD160: 751 cycles (1 block, 64 bytes)

@dcousens
Copy link
Contributor

Thanks @sipa 👍

src/hash.cpp Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to not break aliasing rules here? Shouldnt the compiler optimize out a manual-shift-or?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@sipa
Copy link
Member Author

sipa commented May 16, 2016

Going to stick to SipHash-2-4 for now. We can always switch to something else later.

Ready for merge, I think.

@pstratem
Copy link
Contributor

utACK 658c6481616f12a3466d1ddeb23eb91788ab2e66

@gmaxwell
Copy link
Contributor

ACK

src/uint256.h Outdated
Copy link
Contributor

@TheBlueMatt TheBlueMatt May 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this more readable?

Something like

                                 (uint64_t(ptr[0]) <<  0) |
                                 (uint64_t(ptr[1]) <<  8) |
                                 (uint64_t(ptr[2]) << 16) |
                                 (uint64_t(ptr[3]) << 24) |
                                 (uint64_t(ptr[4]) << 32) |
                                 (uint64_t(ptr[5]) << 40) |
                                 (uint64_t(ptr[6]) << 48) |
                                 (uint64_t(ptr[7]) << 56))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

sipa added 4 commits May 17, 2016 20:04
This is ~1.7x slower than the Lookup3-of-Xor-with-salt construct we were
using before, but it is a primitive designed for exactly this.
@TheBlueMatt
Copy link
Contributor

ACK a68ec21

@TheBlueMatt TheBlueMatt mentioned this pull request May 18, 2016
@laanwj laanwj merged commit a68ec21 into bitcoin:master May 18, 2016
laanwj added a commit that referenced this pull request May 18, 2016
a68ec21 Use SipHash-2-4 for address relay selection (Pieter Wuille)
8cc9cfe Switch CTxMempool::mapTx to use a hash index for txids (Pieter Wuille)
382c871 Use SipHash-2-4 for CCoinsCache index (Pieter Wuille)
0b1295b Add SipHash-2-4 primitives to hash (Pieter Wuille)
@rebroad
Copy link
Contributor

rebroad commented Aug 9, 2016

I'm trying to work out what problem these siphashes fix. What is the risk that this solves please?

@laanwj
Copy link
Member

laanwj commented Aug 13, 2016

It's fast and cryptographically solid - which reduces DoS risk by attackers remotely predicting the behavior of the hash function.

As I quoted above already, pretty much same rationale as for Python: https://www.python.org/dev/peps/pep-0456/

codablock pushed a commit to codablock/dash that referenced this pull request Dec 21, 2017
a68ec21 Use SipHash-2-4 for address relay selection (Pieter Wuille)
8cc9cfe Switch CTxMempool::mapTx to use a hash index for txids (Pieter Wuille)
382c871 Use SipHash-2-4 for CCoinsCache index (Pieter Wuille)
0b1295b Add SipHash-2-4 primitives to hash (Pieter Wuille)
@elichai
Copy link
Contributor

elichai commented Jan 30, 2020

Here is a comment about using SipHash-1-3 instead of SipHash-2-4: rust-lang/rust#29754 (comment)

It would be approximately twice as fast for hashing txids, and the distinguisher mentioned in that comment is not relevant for data that is a multiple of 8 bytes (which is the case for us), as 8 bytes of padding are added in that case. Opinions?

For future reference.
Because of all the interesting results @jamesob got from the hashmap modifications, I decided to try benchmarking siphash2-4 vs 1-3 by replacing SaltedOutpointHasher and SaltedTxidHasher with SipHash1-3 and this is the result:

Command: /home/profiler/bitcoin/src/bitcoind -datadir="/home/profiler/.bitcoin" -dbcache=12288 -debug=0 -nodebuglogfile -printtoconsole=0 -reindex-chainstate -stopatheight=600000

Before:

real    119m16.133s
user    137m38.092s
sys     2m30.990s

After:

real    115m6.045s
user    133m53.661s
sys     2m30.177s

Spec: i7-8700 CPU @ 3.20GHz, 32GB RAM, NVMe SSD.

furszy added a commit to PIVX-Project/PIVX that referenced this pull request Aug 3, 2020
cbdd0e1 Use SipHash-2-4 for address relay selection (Pieter Wuille)
cf27341 Switch CTxMempool::mapTx to use a hash index for txids (Pieter Wuille)
7e636b2 Use SipHash-2-4 for CCoinsCache index (random-zebra)
15ee581 [Core] implement GetUint64 on uint256 (random-zebra)
b2e5c4a Add SipHash-2-4 primitives to hash (Pieter Wuille)

Pull request description:

  Backports bitcoin#8020

  > Use SipHash-2-4 for:
  >
  > - CCoinsViewCache hashmap (instead of a custom Lookup3/XOR scheme)
  > - CTxMempool::mapTx txid index (converting from an ordered map)
  > - Address relay peer selection (instead of SHA256)
  >
  > Computing a hash for a txid using this takes around 52 CPU cycles in benchmarks (the Lookup3/XOR based mechanism used for CCoinsViewCache took 31 cycles). I think that's negligible still, while using a much more standard construct designed for such purposes.

ACKs for top commit:
  Fuzzbawls:
    ACK cbdd0e1
  furszy:
    ACK cbdd0e1

Tree-SHA512: 223f2f119c94991f1e3594afbacc2dd2476a689b9127860c7e8e244052e5e2cddc486ee44f5345370185f1cd337dfd67c2ec9a44e9dcc8f6cc391a011364ce8f
zkbot added a commit to zcash/zcash that referenced this pull request Feb 15, 2021
Backport of bitcoin/bitcoin#8020

Cherry-picked from bitcoin/bitcoin#8020

- cherry-pick bitcoin/bitcoin@a68ec21f7e
  - conflict with bitcoin/bitcoin@957e5d2
- cherry-pick bitcoin/bitcoin@8cc9cfe160
- cherry-pick bitcoin/bitcoin@382c871d28
  - replace CCoinsKeyHasher with SaltedTxidHasher
- cherry-pick bitcoin/bitcoin@0b1295b066
  - self-conflict with previously-applied bitcoin/bitcoin@b8a657936 (#3180)
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants