consensus_encoding: Implement additional decoders#5057
Conversation
|
|
||
| // `cast_to_usize_if_valid` asserts length < 4,000,000, so no DoS vector here. | ||
| self.buffer = Vec::with_capacity(length as usize); | ||
| } | ||
|
|
There was a problem hiding this comment.
In Core they enforce incremental memory allocation to prevent an attacker from claiming a large vector size while providing minimal data. It's set to 5Mb through MAX_VECTOR_ALLOCATE, so instead of allocating everything with Vec::with_capacity(length as usize) we can allocate memory in chunks.
Not sure if we need to bother with that here since we already hard cap things at 4Mb with MAX_VEC_SIZE
There was a problem hiding this comment.
Also I noticed we have MAX_ENCODABLE_VALUE=0x02000000 in the code, but it doesn’t seem to be used anywhere. I’m not sure yet why this constant is set to 32 MB, I tried to trace it back in Core and didn’t find a clear reason
There was a problem hiding this comment.
It's just an arbitrary anti-DoS limit in Core.
There was a problem hiding this comment.
Re your first point I don't think we need more memory protection other than the 4,000,000 byte limit.
Re the second point, thanks for noticing. Its discussed in #3264.
There was a problem hiding this comment.
Re your first point I don't think we need more memory protection other than the 4,000,000 byte limit.
you're right
| check_decode_one_byte_at_a_time! { | ||
| ByteVecDecoder | ||
| decode_byte_vec, alloc::vec![0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef], | ||
| [0x08, 0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef]; |
There was a problem hiding this comment.
In 7e5ccd2:
Can you add a test that uses a multi-byte length prefix (annoyingly it'll need to have like 260 bytes)? The logic around assigning self.prefix_decoder is a little subtle and currently we don't test the Ok(true) path.
There was a problem hiding this comment.
Yeah I had that thought myself yesterday ... can do.
|
|
||
| // 4,000,000 needs more than 16 bits. | ||
| if mem::size_of::<usize>() <= 2 { | ||
| return Err(LengthPrefixExceedsMaxError { value: n }); |
There was a problem hiding this comment.
In 7e5ccd2:
See my repeated comments in #4998 about this. (This PR is not in draft but that one is; should we move discussion here?)
Just use usize::try_from(n).map_err(|_| LengthPrefixExceedsMaxError { value: n }) and be done with it. There is no need to special-case 16-bit machines. You can add a comment if you want saying that this error path will only be hit on 16-bit machines (or, I suppose, 18-bit machines, which the current code handles incorrectly). But the logic is the same regardless of machine size.
There was a problem hiding this comment.
Ouch, sucks failing in public. Now I've made the change (it belongs in this PR).
There was a problem hiding this comment.
Sorry for my tone BTW -- I'm not actually annoyed, I'm just writing fire-and-forget review comments and moving onto the next Github notification.
There was a problem hiding this comment.
No stress bro, fail is a fail. You don't have to pull punches with me. Write fast.
| /// The encoding is expected to start with the number of encoded bytes (length prefix). | ||
| #[cfg(feature = "alloc")] | ||
| pub struct ByteVecDecoder { | ||
| prefix_decoder: Option<CompactSizeDecoder>, |
There was a problem hiding this comment.
In 7e5ccd2 is there a downside to eagerly initializing prefix_decoder in the ByteVecDecoder::new constructor and using it to represent the state of the ByteVecDecoder? If present, decoding the length prefix, else decoding the bytes. Then we could simplify the internal state and drop the prefix_read flag and perhaps simplify logic in the min_bytes_needed method.
There was a problem hiding this comment.
This may be possible, and other refactors. I'd like to get this merged and iterate on the internals (as well as adding more tests). I'm hoping to only fix grievous issues, it would be really nice to get this in so we can rc.0 by Nashville.
There was a problem hiding this comment.
Ah, since I was doing the PhantomData I did this too. Thanks for pushing me.
There was a problem hiding this comment.
While doing so I forgot to re-set it after taking it out of the option and the tests did not fail (because single byte compact size). Goes with @apoelstra's comment above, we need tests that use multi-byte compact size encodings.
There was a problem hiding this comment.
fwiw I still see the prefix_decoder in ByteVecDecoder, but looks like VecDecoder<T: Decodable> was updated to the new logic.
There was a problem hiding this comment.
Geez, hard to get good help huh. Thanks for being so patient with me.
| buffer: Vec<T>, | ||
|
|
||
| decoder: Option<<T as Decodable>::Decoder>, | ||
| _marker: PhantomData<T>, |
There was a problem hiding this comment.
In 501b350 is the PhantomData required? I am new to the pattern, but the type is being used in the buffer field of VecDecoder<T> so I think the compiler is able to track it without any marker?
There was a problem hiding this comment.
Oh I rekon you are right, I'll have a go at that.
|
Still todo: a bunch more tests, including multi-byte compact size length prefixes. Will come back. |
62391f4 to
d72b8f7
Compare
df8d68c to
5809b2d
Compare
Found with clippy.
This is Kix's work shamelessly stolen from his draft PR rust-bitcoin#2184 Implement a decoder for decoding compact size integer. Will be used to decode the length prefix when implementing vec decoders. Co-authored-by: Martin Habovstiak <[email protected]>
More terse, no loss of clarity. Import `PhantomData` and `mem`.
5809b2d to
e22b6a9
Compare
| cast_to_usize_if_valid(length).map_err(|e| E(Inner::LengthPrefixInvalid(e)))?; | ||
|
|
||
| // `cast_to_usize_if_valid` asserts length < 4,000,000, so no DoS vector here. | ||
| self.buffer = Vec::with_capacity(length as usize); |
There was a problem hiding this comment.
In 4c3e8da:
Could you eliminate this cast, for example, by using self.bytes_expected which you obtained two lines above by calling cast_to_usize_if_valid(length)?
| } else { | ||
| let items_left_to_decode = self.length - self.buffer.len(); | ||
| let decoder = T::decoder(); | ||
| // This could be inaccurate (eg 1 for a `ByteVecDecoder`) but its the best we can do. |
There was a problem hiding this comment.
In e22b6a9:
The best we can do is to return 1. What you are returning here (querying a dummy encoder and then multiplying its result by items_left_to_decode) might wildly overestimate, causing Reads to extract unnecessary data from their underlying source and then losing it.
There was a problem hiding this comment.
How will it overestimate? If its a vector types with fixed size eg Txids this will be accurate. If its a vector of ByteVecDecoders it will be 1 byte for each left to decode, which is not an overread. Do you have a case in mind that would overread?
There was a problem hiding this comment.
I added this test, does it convince you its ok?
#[cfg(feature = "std")]
#[test]
fn decode_vec_from_read_unbuffered_success() {
let encoded = [0x01, 0xEF, 0xBE, 0xAD, 0xDE, 0xff, 0xff, 0xff, 0xff];
let mut cursor = Cursor::new(&encoded);
let got = crate::decode_from_read_unbuffered::<Test, _>(&mut cursor).unwrap();
assert_eq!(cursor.position(), 5);
let want = Test(vec![Inner(0xDEAD_BEEF)]);
assert_eq!(got, want);
}There was a problem hiding this comment.
Oh, that is not much context. You'll need this too
#[cfg(feature = "alloc")]
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Test(Vec<Inner>);
/// The decoder for the [`Test`] type.
#[cfg(feature = "alloc")]
#[derive(Default)]
pub struct TestDecoder(VecDecoder<Inner>);
```There was a problem hiding this comment.
I raised #5067, can we merge and make sure that is closed before final release? We still have the transaction decoders to go and its way harder to review that this one. It also should prove this PR further. I'd like to avoid too many more 24 turn arounds for chats. Nashville is coming and I'm starting to feel like we can get the first primitives RC release out the door.
There was a problem hiding this comment.
What if any of them return zero? What if whatever T::encoder().min_bytes_needed() returns is lower from what an actual encoder would return?
There was a problem hiding this comment.
I think this is okay, actually, because the initial value of T::decoder().min_bytes_needed() will give an accurate bound on the number of bytes initially consumable by each of the following Ts.
Add a decoder for decoding a byte vector. Include a single unit test that does multiple calls to `push_bytes`.
e22b6a9 to
a2ce3db
Compare
a2ce3db to
736c7c1
Compare
|
cc @nyonson I'll go ahead and merge this once you ack. |
|
ACK 736c7c1 I think we still want to refactor the internal state logic of |
|
My bad, I did intend to do that but just forgot to because that's my MO this week. Will hack it up now and push a followup PR. Thanks again for you patience, I get there in the end ... |
|
Done in #5079 |
|
Bad merge bot, I was hoping for a Saturday morning dopamine hit. |
|
Lol! I think if I'd just merged things in a different order you'd have gotten your hit. |
|
Got it now!!! |
Implement
CompactSizeDecoder,ByteVecDecoder, andVecDecoder<T>.Pulled out of #4998 because I was too optimistic about doing the whole thing at once.