Skip to content

Conversation

@arowser
Copy link
Contributor

@arowser arowser commented Nov 23, 2015

Tested on PPC/MIPS :
Before change:

test_bitcoin --run_test=addrman_tests
Running 5 test cases...
test/addrman_tests.cpp(141): error in "addrman_new_collisions": check addrman.size() == 3 failed
test/addrman_tests.cpp(145): error in "addrman_new_collisions": check addrman.size() == 4 failed
test/addrman_tests.cpp(166): error in "addrman_tried_collisions": check addrman.size() == i failed
test/addrman_tests.cpp(166): error in "addrman_tried_collisions": check addrman.size() == i failed
test/addrman_tests.cpp(166): error in "addrman_tried_collisions": check addrman.size() == i failed

*** 5 failures detected in test suite "Bitcoin Test Suite"

After:

test_bitcoin --run_test=addrman_tests
Running 5 test cases...

*** No errors detected

@jgarzik
Copy link
Contributor

jgarzik commented Nov 23, 2015

eh, use WriteLE64()?

@dcousens
Copy link
Contributor

LGTM, utACK

edit: See @laanwj's comment

edit2: Conflicted, utACK on the code

@laanwj
Copy link
Member

laanwj commented Nov 24, 2015

This is on purpose! The "hash" is meant to be as fast as possible by using endian-dependent operations. E.g. within one program run it is stable and that is all that matters - it is not supposed to be used with anything that leaves the program.
This is mentioned, literally, in the comment:
https://github.com/bitcoin/bitcoin/blob/master/src/uint256.h#L118

Edit: as for the test failure, have you looked into why this causes a test failure? In principle a different hash function should not break the tests (or at least the correctness of the code), so this error may lie deeper.

@laanwj laanwj added the Tests label Nov 24, 2015
@arowser
Copy link
Contributor Author

arowser commented Nov 25, 2015

The failed test case is addrman_new_collisions and addrman_tried_collisions: https://github.com/bitcoin/bitcoin/blob/master/src/test/addrman_tests.cpp#L141

The testcase make a hash collision, The value of "hash1 % ADDRMAN_BUCKET_SIZE" https://github.com/bitcoin/bitcoin/blob/master/src/addrman.cpp#L29 is always 28 when addr1 is 250.1.1.1 or addr1 is 250.1.1.4 when little endian, but its different in big endian, its the reason that test failed.

addrman.Add(CAddress(addr1), source)
    {
    ...
    int nUBucketPos = infoOld.GetBucketPosition(nKey, true, nUBucket);
        --> {
            uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << (fNew ? 'N' : 'K') << nBucket << GetKey()).GetHash().GetCheapHash();
            return hash1 % ADDRMAN_BUCKET_SIZE; //this is same when little endian when addr1 is 250.1.1.1 or 250.1.1.4
            }
        ...
    ClearNew(int nUBucket, int nUBucketPos)
         -->{
            if (vvNew[nUBucket][nUBucketPos] != -1)
            ...
            vvNew[nUBucket][nUBucketPos] = -1;
            }
    }

The test result in big endian should be consistent with the little endian.

@laanwj
Copy link
Member

laanwj commented Nov 25, 2015

Bleh, so the test relies very strongly on the exact outcome of the hash function (as the algorithm is "randomized" by that), even though both results are correct inthemselves?

@luke-jr
Copy link
Member

luke-jr commented Nov 25, 2015

These addrman "tests" look pretty crappy in general... lots of checking internal implementation details rather than expected external behaviour. :/

@arowser
Copy link
Contributor Author

arowser commented Nov 25, 2015

The addrman_new_collisions first call addrman.MakeDeterministic(), it set addrman.nKey is NULL, then the addrman addr placement in new table will be deterministic, it seems try to simulate hash collisions when addrman.add() .

@laanwj
Copy link
Member

laanwj commented Nov 25, 2015

The primary reason I'm protesting here is because GetCheapHash is used in CSignatureCacheHasher, which is a performance critical path, and also the only use beside addrman.

It is possible that adding a (conditional) byte swap has hardly impact on performance, but that would have to be benchmarked.

src/uint256.h Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: what you want here is ReadLE64, not WriteLE64:

return ReadLE64(data);

@sipa
Copy link
Member

sipa commented Nov 26, 2015

I doubt the performance penalty of ReadLE64 is going to matter really on any system.

I liked the idea of having an unportable fast function, though, but being able to rely on behaviour does make testing easier too.

Unsure.

@laanwj
Copy link
Member

laanwj commented Nov 27, 2015

I liked the idea of having an unportable fast function, though, but being able to rely on behaviour does make testing easier too.

Right - in the case of addrman, the overhead is nothing compared to the double-SHA256, in all its uses:

src/addrman.cpp:    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << GetKey()).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash2 = (CHashWriter(SER_GETHASH, 0) << nKey << GetGroup() << (hash1 % ADDRMAN_TRIED_BUCKETS_PER_GROUP)).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << GetGroup() << vchSourceGroupKey).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash2 = (CHashWriter(SER_GETHASH, 0) << nKey << vchSourceGroupKey << (hash1 % ADDRMAN_NEW_BUCKETS_PER_SOURCE_GROUP)).GetHash().GetCheapHash();
src/addrman.cpp:    uint64_t hash1 = (CHashWriter(SER_GETHASH, 0) << nKey << (fNew ? 'N' : 'K') << nBucket << GetKey()).GetHash().GetCheapHash();

So the two usecases of GetCheapHash are wildly different. In the case of sigcache every cycle counts, in the case of addrman we may just as well be using something else.

E.g. could also remove the function from uint256 class and have tailored implementations at the call sites

@sipa
Copy link
Member

sipa commented Nov 27, 2015

@laanwj Even the signature cache now uses a SHA256 (though that should at some point be replaced with a fast non-cryptographic but highly collision-resistant hash function). A single byteswap (and only on BE platforms...) is not going to be measurable.

@laanwj
Copy link
Member

laanwj commented Nov 27, 2015

OK, just going ahead and merging this then

@laanwj laanwj merged commit c434940 into bitcoin:master Nov 27, 2015
laanwj added a commit that referenced this pull request Nov 27, 2015
c434940 uint256::GetCheapHash bigendian compatibility (daniel)
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants