Skip to content

Conversation

@ryanofsky
Copy link
Contributor

This is a documentation-only change following up on suggestions made in the #30526 review.

Motivation for this change is that I was recently reviewing #31583, which reminded me how confusing the arithmetic blob code was and made me want to write better comments.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jan 2, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31596.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK TheCharlatan, Sjors, BrandonOdiwuor, achow101
Concept ACK l0rinc

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

@DrahtBot DrahtBot added the Docs label Jan 2, 2025
@Sjors
Copy link
Member

Sjors commented Jan 3, 2025

ACK 13464c0

Copy link
Contributor

@l0rinc l0rinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for improving these!
Since we can’t usually change the behavior, it’s crucial to share tribal knowledge.
I have a few suggestions for reducing duplication, simplifying sentence structure, and applying these changes to similar places.
However, I understand if you prefer to limit the scope

{RPCResult::Type::STR_HEX, "data", "transaction data encoded in hexadecimal (byte-for-byte)"},
{RPCResult::Type::STR_HEX, "txid", "transaction id encoded in little-endian hexadecimal"},
{RPCResult::Type::STR_HEX, "hash", "hash encoded in little-endian hexadecimal (including witness data)"},
{RPCResult::Type::STR_HEX, "txid", "txid hash (excluding witness data) with hash bytes reversed, encoded as hex"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

documenting txid as "it's the txid hash" is a bit redundant and confusing.
In https://github.com/bitcoin/bitcoin/blob/master/src/rpc/mempool.cpp#L135 (and a few other places) the description is "The transaction hash in hex".
Could we unify the rest as well to something simple like

Suggested change
{RPCResult::Type::STR_HEX, "txid", "txid hash (excluding witness data) with hash bytes reversed, encoded as hex"},
{RPCResult::Type::STR_HEX, "txid", "transaction id hash (without witness) shown in byte-reversed hex"},

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31596 (comment)

Thanks, I like "shown in byte-reversed hex" as a shorter alternative, and tried to incorporate your other ideas while not sounding ambiguous. If reviewers are happy with the new phrasing I'll add new commit changing other RPCs to use it. The other RPCs do seem ok as-is because they don't mention "little-endian hexadecimal", but it's good to be consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31596 (comment)

Could we unify the rest as well

To follow up, I looked at occurrences of "txid", "wtxid", and "hash" (quoted strings) in the RPC code and found dozens of fields that could be updated to be consistent. It seems more challenging than I expected to update all the documentation, and now I'm also wondering if might actually be harmful to mention reversed bytes in some cases but not others, because this might give the wrong impression that bytes are not always reversed for every hash field.

I wonder if maybe a better approach to provide more consistency could be to introduce new RPC types like STR_TXID, and STR_WTXID and use them instead of the more generic STR_HEX type, so specific documentation about the meanings of these hashes could be incorporated into the type documentation and not repeated in field descriptions.

This is a documentation-only change following up on suggestions made in the
bitcoin#30526 review.

Motivation for this change is that I was recently reviewing bitcoin#31583, which
reminded me how confusing the arithmetic blob code was and made me want to
write better comments.
Copy link
Contributor Author

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 13464c0 -> 3e0a992 (pr/docblob.1 -> pr/docblob.2, compare) with suggestions

Thanks for the reviews!

{RPCResult::Type::STR_HEX, "data", "transaction data encoded in hexadecimal (byte-for-byte)"},
{RPCResult::Type::STR_HEX, "txid", "transaction id encoded in little-endian hexadecimal"},
{RPCResult::Type::STR_HEX, "hash", "hash encoded in little-endian hexadecimal (including witness data)"},
{RPCResult::Type::STR_HEX, "txid", "txid hash (excluding witness data) with hash bytes reversed, encoded as hex"},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31596 (comment)

Thanks, I like "shown in byte-reversed hex" as a shorter alternative, and tried to incorporate your other ideas while not sounding ambiguous. If reviewers are happy with the new phrasing I'll add new commit changing other RPCs to use it. The other RPCs do seem ok as-is because they don't mention "little-endian hexadecimal", but it's good to be consistent.

/** Get the hash of these transactions' wtxids, concatenated in lexicographical order (treating the
* wtxids as little endian encoded uint256, smallest to largest). */
/** Get the hash of the concatenated wtxids of transactions, with wtxids
* treated as a little-endian numbers and sorted in ascending numeric order.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* treated as a little-endian numbers and sorted in ascending numeric order.
* treated as little-endian numbers and sorted in ascending numeric order.

Copy link
Contributor Author

@ryanofsky ryanofsky Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31596 (comment)

Thanks, fixed in a followup branch.

Copy link
Contributor

@sedited sedited left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 3e0a992

{RPCResult::Type::STR_HEX, "txid", "transaction id encoded in little-endian hexadecimal"},
{RPCResult::Type::STR_HEX, "hash", "hash encoded in little-endian hexadecimal (including witness data)"},
{RPCResult::Type::STR_HEX, "txid", "transaction hash excluding witness data, shown in byte-reversed hex"},
{RPCResult::Type::STR_HEX, "hash", "transaction hash including witness data, shown in byte-reversed hex"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Call it hexadecimal instead of hex like done in the "data" field for better consistency?

Copy link
Contributor Author

@ryanofsky ryanofsky Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31596 (comment)

Nit: Call it hexadecimal instead of hex like done in the "data" field for better consistency?

I would probably prefer to go the other way since the term "hex" occurs more frequently than "hexadecimal" in RPC code and documentation (115 vs 11 times). I'm hoping we might be able to improve consistency by moving this information from descriptions of RPC fields to descriptions of RPC types and left another comment about this.

@DrahtBot DrahtBot requested a review from Sjors January 6, 2025 09:34
Copy link
Member

@Sjors Sjors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 3e0a992

@DrahtBot DrahtBot removed the CI failed label Jan 6, 2025
Copy link
Contributor

@BrandonOdiwuor BrandonOdiwuor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ACK 3e0a992

@achow101
Copy link
Member

achow101 commented Jan 6, 2025

ACK 3e0a992

@l0rinc
Copy link
Contributor

l0rinc commented Jan 6, 2025

There's still a typo there, could we fix it before merging? Otherwise I'd ACK as well.

@achow101 achow101 merged commit 433412f into bitcoin:master Jan 6, 2025
18 checks passed
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jan 7, 2025
Followups from bitcoin#31596

- Fix typo in packages.h that makes comment harder to understand
- Refer explicitly to "wtxid" in description of relevant hash field so readers
  can look up the term, and update "txid" description as well to be consistent.
- Uses dashes consistently when writing "little-endian", "most-significant",
  etc.
Copy link
Contributor Author

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews. Since there are more followups that could be made here, I created a branch to keep track of them. I don't want to create too much churn with documentation PRs so I won't open another one right away, but I can implement any suggestions made here, open a pr if requested, or wait for a related PR to be opened to attach these changes to.

/** Get the hash of these transactions' wtxids, concatenated in lexicographical order (treating the
* wtxids as little endian encoded uint256, smallest to largest). */
/** Get the hash of the concatenated wtxids of transactions, with wtxids
* treated as a little-endian numbers and sorted in ascending numeric order.
Copy link
Contributor Author

@ryanofsky ryanofsky Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31596 (comment)

Thanks, fixed in a followup branch.

{RPCResult::Type::STR_HEX, "txid", "transaction id encoded in little-endian hexadecimal"},
{RPCResult::Type::STR_HEX, "hash", "hash encoded in little-endian hexadecimal (including witness data)"},
{RPCResult::Type::STR_HEX, "txid", "transaction hash excluding witness data, shown in byte-reversed hex"},
{RPCResult::Type::STR_HEX, "hash", "transaction hash including witness data, shown in byte-reversed hex"},
Copy link
Contributor Author

@ryanofsky ryanofsky Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31596 (comment)

Nit: Call it hexadecimal instead of hex like done in the "data" field for better consistency?

I would probably prefer to go the other way since the term "hex" occurs more frequently than "hexadecimal" in RPC code and documentation (115 vs 11 times). I'm hoping we might be able to improve consistency by moving this information from descriptions of RPC fields to descriptions of RPC types and left another comment about this.

{RPCResult::Type::STR_HEX, "data", "transaction data encoded in hexadecimal (byte-for-byte)"},
{RPCResult::Type::STR_HEX, "txid", "transaction id encoded in little-endian hexadecimal"},
{RPCResult::Type::STR_HEX, "hash", "hash encoded in little-endian hexadecimal (including witness data)"},
{RPCResult::Type::STR_HEX, "txid", "txid hash (excluding witness data) with hash bytes reversed, encoded as hex"},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31596 (comment)

Could we unify the rest as well

To follow up, I looked at occurrences of "txid", "wtxid", and "hash" (quoted strings) in the RPC code and found dozens of fields that could be updated to be consistent. It seems more challenging than I expected to update all the documentation, and now I'm also wondering if might actually be harmful to mention reversed bytes in some cases but not others, because this might give the wrong impression that bytes are not always reversed for every hash field.

I wonder if maybe a better approach to provide more consistency could be to introduce new RPC types like STR_TXID, and STR_WTXID and use them instead of the more generic STR_HEX type, so specific documentation about the meanings of these hashes could be incorporated into the type documentation and not repeated in field descriptions.