-
Notifications
You must be signed in to change notification settings - Fork 7
Description
Hi Lei,
I just came across your RIBLT paper and for the most part I thought it was compelling and clear, and the scheme described has tremendous potential. However, I believe it contains a misunderstanding about the adversarial threat model:
This setting may create an "adversarial workload," where the hash of the symbol representing the user's input does not distribute uniformly. If the user injects into Bob's set a source symbol that hashes to the same value as another source symbol that Alice has, then Bob will never be able to reconcile its set with Alice. This is because Bob will XOR the malicious symbol into the coded symbol stream it receives from Alice, but it will only cancel out the hash of Alice's colliding symbol from the checksum field, and will corrupt the sum field.
If this was the threat in question then it would be trivially solvable by using a cryptographically secure hash function, or at least detectable via the corrupted sum field. Instead, the primary threat we are worried about is a malicious user selecting a set of non-colliding values chosen such that combined they cancel out one or more victim elements, silently censoring them by preventing their reconciliation. These malicious values can be selected at negligible cost using gaussian elimination.
In case it is useful to you, I made a test-case to demonstrate this. In example_test.go, replace the sets to be reconciled with the following:
alice := []item{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32}
bob := []item{3,7,9,13,15,19,21,25,27,29,33,35,43,51,57,59,63,67,69,83,91,95,97,99,101,105,109,111,117,123,127,131}
None of these values result in a SipHash collision, and the sum field is never corrupted. Your algorithm incorrectly signals that these sets are equivalent and that no items need to be transferred. Note that this same attack would be possible even if you replaced SipHash with a secure 256-bit hash function.
Your suggestion to use a keyed hash function and for Alice and Bob to coordinate a session-specific secret-key is valid and does prevent this attack. However, this comes at the cost of breaking universality. As you mention several times, the universal applicability of coded symbols is highly desirable because for large data-sets, re-keying every session would have unacceptable CPU, memory, and I/O costs.
In my opinion, your paper is not clear enough about this trade-off and a casual reader is likely to come away with the impression that RIBLT currently provides both universality and censorship-resistance, when it's actually one or the other.
Last thing: Is the code for your Ethereum synchronisation evaluation available anywhere? I didn't notice a link in the paper and couldn't find it on your github.
Thank you!
Doug