consensus_encoding: Implement additional decoders by tcharding · Pull Request #5057 · rust-bitcoin/rust-bitcoin

tcharding · 2025-09-30T08:18:16Z

Implement CompactSizeDecoder, ByteVecDecoder, and VecDecoder<T>.

Pulled out of #4998 because I was too optimistic about doing the whole thing at once.

jrakibi · 2025-09-30T14:32:22Z

+
+            // `cast_to_usize_if_valid` asserts length < 4,000,000, so no DoS vector here.
+            self.buffer = Vec::with_capacity(length as usize);
+        }
+


In Core they enforce incremental memory allocation to prevent an attacker from claiming a large vector size while providing minimal data. It's set to 5Mb through MAX_VECTOR_ALLOCATE, so instead of allocating everything with Vec::with_capacity(length as usize) we can allocate memory in chunks.

Not sure if we need to bother with that here since we already hard cap things at 4Mb with MAX_VEC_SIZE

Also I noticed we have MAX_ENCODABLE_VALUE=0x02000000 in the code, but it doesn’t seem to be used anywhere. I’m not sure yet why this constant is set to 32 MB, I tried to trace it back in Core and didn’t find a clear reason

It's just an arbitrary anti-DoS limit in Core.

Re your first point I don't think we need more memory protection other than the 4,000,000 byte limit.

Re the second point, thanks for noticing. Its discussed in #3264.

Re your first point I don't think we need more memory protection other than the 4,000,000 byte limit.

you're right

apoelstra · 2025-09-30T15:12:47Z

+    check_decode_one_byte_at_a_time! {
+        ByteVecDecoder
+            decode_byte_vec, alloc::vec![0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef],
+        [0x08, 0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef];


In 7e5ccd2:

Can you add a test that uses a multi-byte length prefix (annoyingly it'll need to have like 260 bytes)? The logic around assigning self.prefix_decoder is a little subtle and currently we don't test the Ok(true) path.

Yeah I had that thought myself yesterday ... can do.

apoelstra · 2025-09-30T15:16:12Z

+
+    // 4,000,000 needs more than 16 bits.
+    if mem::size_of::<usize>() <= 2 {
+        return Err(LengthPrefixExceedsMaxError { value: n });


In 7e5ccd2:

See my repeated comments in #4998 about this. (This PR is not in draft but that one is; should we move discussion here?)

Just use usize::try_from(n).map_err(|_| LengthPrefixExceedsMaxError { value: n }) and be done with it. There is no need to special-case 16-bit machines. You can add a comment if you want saying that this error path will only be hit on 16-bit machines (or, I suppose, 18-bit machines, which the current code handles incorrectly). But the logic is the same regardless of machine size.

Ouch, sucks failing in public. Now I've made the change (it belongs in this PR).

Sorry for my tone BTW -- I'm not actually annoyed, I'm just writing fire-and-forget review comments and moving onto the next Github notification.

No stress bro, fail is a fail. You don't have to pull punches with me. Write fast.

nyonson · 2025-09-30T16:20:25Z

+/// The encoding is expected to start with the number of encoded bytes (length prefix).
+#[cfg(feature = "alloc")]
+pub struct ByteVecDecoder {
+    prefix_decoder: Option<CompactSizeDecoder>,


In 7e5ccd2 is there a downside to eagerly initializing prefix_decoder in the ByteVecDecoder::new constructor and using it to represent the state of the ByteVecDecoder? If present, decoding the length prefix, else decoding the bytes. Then we could simplify the internal state and drop the prefix_read flag and perhaps simplify logic in the min_bytes_needed method.

This may be possible, and other refactors. I'd like to get this merged and iterate on the internals (as well as adding more tests). I'm hoping to only fix grievous issues, it would be really nice to get this in so we can rc.0 by Nashville.

Ah, since I was doing the PhantomData I did this too. Thanks for pushing me.

While doing so I forgot to re-set it after taking it out of the option and the tests did not fail (because single byte compact size). Goes with @apoelstra's comment above, we need tests that use multi-byte compact size encodings.

fwiw I still see the prefix_decoder in ByteVecDecoder, but looks like VecDecoder<T: Decodable> was updated to the new logic.

Geez, hard to get good help huh. Thanks for being so patient with me.

nyonson · 2025-09-30T16:25:07Z

+    buffer: Vec<T>,
+
+    decoder: Option<<T as Decodable>::Decoder>,
+    _marker: PhantomData<T>,


In 501b350 is the PhantomData required? I am new to the pattern, but the type is being used in the buffer field of VecDecoder<T> so I think the compiler is able to track it without any marker?

Oh I rekon you are right, I'll have a go at that.

Yep, works. Thanks.

tcharding · 2025-09-30T21:41:33Z

Still todo: a bunch more tests, including multi-byte compact size length prefixes. Will come back.

Found with clippy.

This is Kix's work shamelessly stolen from his draft PR rust-bitcoin#2184 Implement a decoder for decoding compact size integer. Will be used to decode the length prefix when implementing vec decoders. Co-authored-by: Martin Habovstiak <[email protected]>

More terse, no loss of clarity. Import `PhantomData` and `mem`.

apoelstra · 2025-10-01T19:24:14Z

+                cast_to_usize_if_valid(length).map_err(|e| E(Inner::LengthPrefixInvalid(e)))?;
+
+            // `cast_to_usize_if_valid` asserts length < 4,000,000, so no DoS vector here.
+            self.buffer = Vec::with_capacity(length as usize);


In 4c3e8da:

Could you eliminate this cast, for example, by using self.bytes_expected which you obtained two lines above by calling cast_to_usize_if_valid(length)?

apoelstra · 2025-10-01T19:32:16Z

+        } else {
+            let items_left_to_decode = self.length - self.buffer.len();
+            let decoder = T::decoder();
+            // This could be inaccurate (eg 1 for a `ByteVecDecoder`) but its the best we can do.


In e22b6a9:

The best we can do is to return 1. What you are returning here (querying a dummy encoder and then multiplying its result by items_left_to_decode) might wildly overestimate, causing Reads to extract unnecessary data from their underlying source and then losing it.

How will it overestimate? If its a vector types with fixed size eg Txids this will be accurate. If its a vector of ByteVecDecoders it will be 1 byte for each left to decode, which is not an overread. Do you have a case in mind that would overread?

I added this test, does it convince you its ok?

#[cfg(feature = "std")] #[test] fn decode_vec_from_read_unbuffered_success() { let encoded = [0x01, 0xEF, 0xBE, 0xAD, 0xDE, 0xff, 0xff, 0xff, 0xff]; let mut cursor = Cursor::new(&encoded); let got = crate::decode_from_read_unbuffered::<Test, _>(&mut cursor).unwrap(); assert_eq!(cursor.position(), 5); let want = Test(vec![Inner(0xDEAD_BEEF)]); assert_eq!(got, want); }

Oh, that is not much context. You'll need this too

#[cfg(feature = "alloc")] #[derive(Clone, Debug, PartialEq, Eq)] pub struct Test(Vec<Inner>); /// The decoder for the [`Test`] type. #[cfg(feature = "alloc")] #[derive(Default)] pub struct TestDecoder(VecDecoder<Inner>); ```

I raised #5067, can we merge and make sure that is closed before final release? We still have the transaction decoders to go and its way harder to review that this one. It also should prove this PR further. I'd like to avoid too many more 24 turn arounds for chats. Nashville is coming and I'm starting to feel like we can get the first primitives RC release out the door.

What if any of them return zero? What if whatever T::encoder().min_bytes_needed() returns is lower from what an actual encoder would return?

I think this is okay, actually, because the initial value of T::decoder().min_bytes_needed() will give an accurate bound on the number of bytes initially consumable by each of the following Ts.

Add a decoder for decoding a byte vector. Include a single unit test that does multiple calls to `push_bytes`.

apoelstra

ACK 736c7c1; successfully ran local tests

apoelstra · 2025-10-02T17:14:15Z

cc @nyonson I'll go ahead and merge this once you ack.

nyonson · 2025-10-02T17:19:30Z

ACK 736c7c1

I think we still want to refactor the internal state logic of ByteVecDecoder, but can be done in a follow up.

tcharding · 2025-10-03T02:53:21Z

My bad, I did intend to do that but just forgot to because that's my MO this week. Will hack it up now and push a followup PR. Thanks again for you patience, I get there in the end ...

tcharding · 2025-10-03T03:13:36Z

Done in #5079

tcharding · 2025-10-03T21:56:33Z

Bad merge bot, I was hoping for a Saturday morning dopamine hit.

apoelstra · 2025-10-03T23:04:21Z

Lol! I think if I'd just merged things in a different order you'd have gotten your hit.

tcharding · 2025-10-04T03:07:45Z

Got it now!!!

Run the formatter

213d6ef

github-actions Bot added test C-consensus_encoding labels Sep 30, 2025

tcharding mentioned this pull request Sep 30, 2025

primitives: Implement decoding traits #4998

Merged

jrakibi reviewed Sep 30, 2025

View reviewed changes

apoelstra reviewed Sep 30, 2025

View reviewed changes

nyonson reviewed Sep 30, 2025

View reviewed changes

tcharding force-pushed the push-rpwqsvvtwvlu branch 2 times, most recently from 62391f4 to d72b8f7 Compare October 1, 2025 03:18

github-actions Bot added the C-internals PRs modifying the internals crate label Oct 1, 2025

tcharding force-pushed the push-rpwqsvvtwvlu branch 3 times, most recently from df8d68c to 5809b2d Compare October 1, 2025 04:59

tcharding and others added 3 commits October 1, 2025 15:11

consensus_encoding: Fix test import feature gate

cc0ceaa

Found with clippy.

consensus_encoding: Import core types

d5488bf

More terse, no loss of clarity. Import `PhantomData` and `mem`.

tcharding force-pushed the push-rpwqsvvtwvlu branch from 5809b2d to e22b6a9 Compare October 1, 2025 05:11

apoelstra reviewed Oct 1, 2025

View reviewed changes

consenus_encoding: Implement ByteVecDecoder

ea720ba

Add a decoder for decoding a byte vector. Include a single unit test that does multiple calls to `push_bytes`.

tcharding force-pushed the push-rpwqsvvtwvlu branch from e22b6a9 to a2ce3db Compare October 2, 2025 01:26

consensus_encoding: Implement VecDecoder<T>

736c7c1

tcharding force-pushed the push-rpwqsvvtwvlu branch from a2ce3db to 736c7c1 Compare October 2, 2025 01:39

tcharding mentioned this pull request Oct 2, 2025

Further prove that read_limit is correct for VecDecoder<T> #5067

Closed

apoelstra approved these changes Oct 2, 2025

View reviewed changes

tcharding mentioned this pull request Oct 3, 2025

consensus_encoding: Remove prefix_read field #5079

Closed

tcharding mentioned this pull request Oct 3, 2025

Add a new constructor to the CompactSizeDecoder #5080

Closed

apoelstra merged commit f686521 into rust-bitcoin:master Oct 3, 2025
25 checks passed

jrakibi mentioned this pull request Oct 25, 2025

consensus_encoding: implement batched allocation for vector decoders #5177

Merged

jrakibi mentioned this pull request Feb 19, 2026

consensus: Add CompactSize range check to deserialization #5697

Merged

Uh oh!

Conversation

tcharding commented Sep 30, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tcharding Sep 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tcharding commented Sep 30, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

apoelstra left a comment

Choose a reason for hiding this comment

Uh oh!

apoelstra commented Oct 2, 2025

Uh oh!

nyonson commented Oct 2, 2025

Uh oh!

tcharding commented Oct 3, 2025

Uh oh!

tcharding commented Oct 3, 2025

Uh oh!

tcharding commented Oct 3, 2025

Uh oh!

Uh oh!

tcharding Sep 30, 2025 •

edited

Loading