Skip to content

consensus_encoding: Implement additional decoders#5057

Merged
apoelstra merged 6 commits into
rust-bitcoin:masterfrom
tcharding:push-rpwqsvvtwvlu
Oct 3, 2025
Merged

consensus_encoding: Implement additional decoders#5057
apoelstra merged 6 commits into
rust-bitcoin:masterfrom
tcharding:push-rpwqsvvtwvlu

Conversation

@tcharding

Copy link
Copy Markdown
Member

Implement CompactSizeDecoder, ByteVecDecoder, and VecDecoder<T>.

Pulled out of #4998 because I was too optimistic about doing the whole thing at once.

Comment on lines +74 to +78

// `cast_to_usize_if_valid` asserts length < 4,000,000, so no DoS vector here.
self.buffer = Vec::with_capacity(length as usize);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Core they enforce incremental memory allocation to prevent an attacker from claiming a large vector size while providing minimal data. It's set to 5Mb through MAX_VECTOR_ALLOCATE, so instead of allocating everything with Vec::with_capacity(length as usize) we can allocate memory in chunks.

Not sure if we need to bother with that here since we already hard cap things at 4Mb with MAX_VEC_SIZE

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I noticed we have MAX_ENCODABLE_VALUE=0x02000000 in the code, but it doesn’t seem to be used anywhere. I’m not sure yet why this constant is set to 32 MB, I tried to trace it back in Core and didn’t find a clear reason

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just an arbitrary anti-DoS limit in Core.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re your first point I don't think we need more memory protection other than the 4,000,000 byte limit.

Re the second point, thanks for noticing. Its discussed in #3264.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re your first point I don't think we need more memory protection other than the 4,000,000 byte limit.

you're right

check_decode_one_byte_at_a_time! {
ByteVecDecoder
decode_byte_vec, alloc::vec![0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef],
[0x08, 0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef];

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 7e5ccd2:

Can you add a test that uses a multi-byte length prefix (annoyingly it'll need to have like 260 bytes)? The logic around assigning self.prefix_decoder is a little subtle and currently we don't test the Ok(true) path.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I had that thought myself yesterday ... can do.


// 4,000,000 needs more than 16 bits.
if mem::size_of::<usize>() <= 2 {
return Err(LengthPrefixExceedsMaxError { value: n });

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 7e5ccd2:

See my repeated comments in #4998 about this. (This PR is not in draft but that one is; should we move discussion here?)

Just use usize::try_from(n).map_err(|_| LengthPrefixExceedsMaxError { value: n }) and be done with it. There is no need to special-case 16-bit machines. You can add a comment if you want saying that this error path will only be hit on 16-bit machines (or, I suppose, 18-bit machines, which the current code handles incorrectly). But the logic is the same regardless of machine size.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, sucks failing in public. Now I've made the change (it belongs in this PR).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my tone BTW -- I'm not actually annoyed, I'm just writing fire-and-forget review comments and moving onto the next Github notification.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No stress bro, fail is a fail. You don't have to pull punches with me. Write fast.

/// The encoding is expected to start with the number of encoded bytes (length prefix).
#[cfg(feature = "alloc")]
pub struct ByteVecDecoder {
prefix_decoder: Option<CompactSizeDecoder>,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 7e5ccd2 is there a downside to eagerly initializing prefix_decoder in the ByteVecDecoder::new constructor and using it to represent the state of the ByteVecDecoder? If present, decoding the length prefix, else decoding the bytes. Then we could simplify the internal state and drop the prefix_read flag and perhaps simplify logic in the min_bytes_needed method.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be possible, and other refactors. I'd like to get this merged and iterate on the internals (as well as adding more tests). I'm hoping to only fix grievous issues, it would be really nice to get this in so we can rc.0 by Nashville.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, since I was doing the PhantomData I did this too. Thanks for pushing me.

@tcharding tcharding Sep 30, 2025

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While doing so I forgot to re-set it after taking it out of the option and the tests did not fail (because single byte compact size). Goes with @apoelstra's comment above, we need tests that use multi-byte compact size encodings.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw I still see the prefix_decoder in ByteVecDecoder, but looks like VecDecoder<T: Decodable> was updated to the new logic.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Geez, hard to get good help huh. Thanks for being so patient with me.

buffer: Vec<T>,

decoder: Option<<T as Decodable>::Decoder>,
_marker: PhantomData<T>,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 501b350 is the PhantomData required? I am new to the pattern, but the type is being used in the buffer field of VecDecoder<T> so I think the compiler is able to track it without any marker?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I rekon you are right, I'll have a go at that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, works. Thanks.

@tcharding

Copy link
Copy Markdown
Member Author

Still todo: a bunch more tests, including multi-byte compact size length prefixes. Will come back.

@tcharding tcharding force-pushed the push-rpwqsvvtwvlu branch 2 times, most recently from 62391f4 to d72b8f7 Compare October 1, 2025 03:18
@github-actions github-actions Bot added the C-internals PRs modifying the internals crate label Oct 1, 2025
@tcharding tcharding force-pushed the push-rpwqsvvtwvlu branch 3 times, most recently from df8d68c to 5809b2d Compare October 1, 2025 04:59
tcharding and others added 3 commits October 1, 2025 15:11
This is Kix's work shamelessly stolen from his draft PR rust-bitcoin#2184

Implement a decoder for decoding compact size integer. Will be
used to decode the length prefix when implementing vec decoders.

Co-authored-by: Martin Habovstiak <[email protected]>
More terse, no loss of clarity. Import `PhantomData` and `mem`.
cast_to_usize_if_valid(length).map_err(|e| E(Inner::LengthPrefixInvalid(e)))?;

// `cast_to_usize_if_valid` asserts length < 4,000,000, so no DoS vector here.
self.buffer = Vec::with_capacity(length as usize);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 4c3e8da:

Could you eliminate this cast, for example, by using self.bytes_expected which you obtained two lines above by calling cast_to_usize_if_valid(length)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

} else {
let items_left_to_decode = self.length - self.buffer.len();
let decoder = T::decoder();
// This could be inaccurate (eg 1 for a `ByteVecDecoder`) but its the best we can do.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In e22b6a9:

The best we can do is to return 1. What you are returning here (querying a dummy encoder and then multiplying its result by items_left_to_decode) might wildly overestimate, causing Reads to extract unnecessary data from their underlying source and then losing it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will it overestimate? If its a vector types with fixed size eg Txids this will be accurate. If its a vector of ByteVecDecoders it will be 1 byte for each left to decode, which is not an overread. Do you have a case in mind that would overread?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this test, does it convince you its ok?

    #[cfg(feature = "std")]
    #[test]
    fn decode_vec_from_read_unbuffered_success() {
        let encoded = [0x01, 0xEF, 0xBE, 0xAD, 0xDE, 0xff, 0xff, 0xff, 0xff];
        let mut cursor = Cursor::new(&encoded);

        let got = crate::decode_from_read_unbuffered::<Test, _>(&mut cursor).unwrap();
        assert_eq!(cursor.position(), 5);

        let want = Test(vec![Inner(0xDEAD_BEEF)]);
        assert_eq!(got, want);
    }

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that is not much context. You'll need this too

    #[cfg(feature = "alloc")]
    #[derive(Clone, Debug, PartialEq, Eq)]
    pub struct Test(Vec<Inner>);

    /// The decoder for the [`Test`] type.
    #[cfg(feature = "alloc")]
    #[derive(Default)]
    pub struct TestDecoder(VecDecoder<Inner>);
```

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I raised #5067, can we merge and make sure that is closed before final release? We still have the transaction decoders to go and its way harder to review that this one. It also should prove this PR further. I'd like to avoid too many more 24 turn arounds for chats. Nashville is coming and I'm starting to feel like we can get the first primitives RC release out the door.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if any of them return zero? What if whatever T::encoder().min_bytes_needed() returns is lower from what an actual encoder would return?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is okay, actually, because the initial value of T::decoder().min_bytes_needed() will give an accurate bound on the number of bytes initially consumable by each of the following Ts.

Add a decoder for decoding a byte vector. Include a single unit test
that does multiple calls to `push_bytes`.

@apoelstra apoelstra left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 736c7c1; successfully ran local tests

@apoelstra

Copy link
Copy Markdown
Member

cc @nyonson I'll go ahead and merge this once you ack.

@nyonson

nyonson commented Oct 2, 2025

Copy link
Copy Markdown
Contributor

ACK 736c7c1

I think we still want to refactor the internal state logic of ByteVecDecoder, but can be done in a follow up.

@tcharding

Copy link
Copy Markdown
Member Author

My bad, I did intend to do that but just forgot to because that's my MO this week. Will hack it up now and push a followup PR. Thanks again for you patience, I get there in the end ...

@tcharding

Copy link
Copy Markdown
Member Author

Done in #5079

@tcharding

Copy link
Copy Markdown
Member Author

Bad merge bot, I was hoping for a Saturday morning dopamine hit.

@apoelstra apoelstra merged commit f686521 into rust-bitcoin:master Oct 3, 2025
25 checks passed
@apoelstra

Copy link
Copy Markdown
Member

Lol! I think if I'd just merged things in a different order you'd have gotten your hit.

@tcharding

Copy link
Copy Markdown
Member Author

Got it now!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-internals PRs modifying the internals crate test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants