Skip to content

Conversation

@apoelstra
Copy link
Member

Thanks to @roconnor-blockstream for writing the first draft of this document.

@apoelstra apoelstra force-pushed the 2020-12--sighash-md branch 4 times, most recently from b4792d3 to a7de102 Compare December 2, 2020 18:12
*** ''sha_single_output'' (32): the SHA256 of the corresponding output in <code>CTxOut</code> format.
*** '''''NEW''''' ''sha_single_output_witness'' (32): the SHA256 of the serialization of the corresponding output witnesses (rangeproof and surjection proof) in <code>CTxWitness</code> format.

The total length of ''SigMsg()'' is at most ''491'' bytes. Note that this does not include the size of sub-hashes such as ''sha_prevouts'', which may be cached across signatures of the same transaction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One general point for discussion. (Fixed length sighash with positions for fields)

  • We observed that having a fixed length sighash with a fixed location for which field occurs where is super helpful in creating complex scripts easily.
  • Even though we will new opcodes for getting these directly, it might still be useful to have a structured sighash.
  • The only fields with variable length are assetIssuance and nValue( assetIssuance internally uses confidential value serialization). So, everything can be addressed if we fix the serialization of explicit in nValue.

Note that we don't need to change the way serialization is done for these fields in p2p network, but only when we do sighash computation.
Downsides:

  • more code complexity
  • more divergence from core

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add zeros to the hash after nNonce nAmount and nAsset such that they'd always total out to 33 bytes. And same for the fields in the issuance.

I don't think this is too much added complexity and it would give you, as you say, fixed-offset fields in the sighash. I can certainly imagine that being useful for a covenant which is trying to fix one piece of the sighash while not touching anything else.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would still have a variable-length encoding based on the sighash flag, and I don't think there's any reasonable way to avoid that (the different flags mean you're hashing different data). But it's easy to fix/check this flag since it appears on the siganture itself, not only inside the sighash.

Do you think we should 0-pad so that the issuance data field is always 130 bytes even if there is no issuance? This adds two full sha2 compressions (at about 200ns/shot on a desktop CPU) and will rarely be nonzero.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should compare this overhead to normal signature validation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we still want to maintain length index prefixing, we could change the order in which we do the sighash. We could all the input specific things at the end of the sighash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normal signature validation is 50us. So an added .5us is a 1% performance hit.

I'd rather not change the order of the sighash too much to minimize the cognitive load for people coming from Bitcoin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also variable length introduced by the presence/absence of the annex. If you are interested in resolving that, then you should pad the annex hash in the case that there is no annex.

*** '''''NEW''''' ''nValue'' (9--33): (possibly confidential) amount of the previous output spent by this input.
*** ''scriptPubKey'' (35): ''scriptPubKey'' of the previous output spent by this input, serialized as script inside <code>CTxOut</code>. Its size is always 35 bytes.
*** ''nSequence'' (4): ''nSequence'' of this input.
*** '''''NEW''''' ''asset_issuance'' (0 or 74--130): if ''outpoint_flag & 0x80 == 0x80'': asset issuance data of this input.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be (82--130) instead of (74--130)? The minimum size is 2 explicit values + 2 32 bytes = 18 + 62 = 82

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inflationKeys field can be null, in which case it is one byte.

Copy link
Contributor

@sanket1729 sanket1729 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We support sidechains to sidechains? I was not aware of this. If not, what is parent_pegin_asset and how is that supposed to be used?

@apoelstra
Copy link
Member Author

Yes, we support sidechains to sidechains, and I believe some of our functional tests depend on this fact.

@apoelstra
Copy link
Member Author

@sanket1729 updated doc to make confidential fields all fixed-length, including issuance which is now always 130 bytes.

I don't think we care about 1% perf hit when validating ANYONECANPAY signatures. If we can get a taproot sig in a blockheader, that saves us 10 signatures' worth of validation which means that even if the block has 100 such inputs (unlikely, nobody uses ACP :)) Taproot will still be a net win.

@roconnor-blockstream
Copy link
Contributor

the nValue is in big endian, so it should probably get 0 byte prefixes instead of suffixes.

@apoelstra
Copy link
Member Author

Like, between the 01 byte indicating that it is explict, and the 8 value bytes? I think that would complicate the mapping between the normal format and the extended one.

@roconnor-blockstream
Copy link
Contributor

Good point.

@apoelstra apoelstra force-pushed the 2020-12--sighash-md branch from 3b960ba to 169af2d Compare May 20, 2021 17:46
@apoelstra
Copy link
Member Author

Rebased. What needs to be done to get this in?

@sanket1729
Copy link
Contributor

ACK 169af2d

@apoelstra
Copy link
Member Author

Removed the input nNonce field and cleaned up the ambiguity that @roconnor-blockstream identified. Good to merge now?

@apoelstra apoelstra force-pushed the 2020-12--sighash-md branch from d49aab3 to f18fa5c Compare June 8, 2021 01:03
** ''nVersion'' (4): the ''nVersion'' of the transaction.
** ''nLockTime'' (4): the ''nLockTime'' of the transaction.
** If the ''hash_type & 0x80'' does not equal <code>SIGHASH_ANYONECANPAY</code>:
*** '''''NEW''''' ''sha_outpoint_flags'' (32): the SHA256 of the serialization of the concatenation of one byte per input of the input's outpoint flags shifted right by 24 bits. (The byte for an pegin input would be 0x40. The byte for an issuance input would be 0x80. The byte for both a pegin and issuance would be 0xc0.)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should replace this "shifted by 24 bits" language with an explicit list of cases. In both Elements Core and rust-elements, the prevout stored internally does not actually have these bits set; they are extracted during deserialization and recorded separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However you want is fine. I phrased it this way to be forward compatible with new flags that might be added in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that sounds reasonable. Let's keep the existing text.

The function ''SigMsg(hash_type, ext_flag)'' computes the message being signed as a byte array. It is implicitly also a function of the spending transaction and the outputs it spends, but these are not listed to keep notation simple.

The parameter ''hash_type'' is an 8-bit unsigned value. The <code>SIGHASH</code> encodings from the legacy script system are reused, including <code>SIGHASH_ALL</code>, <code>SIGHASH_NONE</code>, <code>SIGHASH_SINGLE</code>, and <code>SIGHASH_ANYONECANPAY</code>, plus the default ''hash_type'' value ''0x00'' which results in signing over the whole transaction just as for <code>SIGHASH_ALL</code>. The following restrictions apply, which cause validation failure if violated:
* Using any undefined ''hash_type'' (not ''0x00'', ''0x01'', ''0x02'', ''0x03'', ''0x81'', ''0x82'', or ''0x83''<ref>'''Why reject unknown ''hash_type'' values?''' By doing so, it is easier to reason about the worst case amount of signature hashing an implementation with adequate caching must perform.</ref>).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove the <ref>...<ref> here and below. Alternative we could add a reference section below, but I don't think that is preferable.

* Using any undefined ''hash_type'' (not ''0x00'', ''0x01'', ''0x02'', ''0x03'', ''0x81'', ''0x82'', or ''0x83''<ref>'''Why reject unknown ''hash_type'' values?''' By doing so, it is easier to reason about the worst case amount of signature hashing an implementation with adequate caching must perform.</ref>).
* Using <code>SIGHASH_SINGLE</code> without a "corresponding output" (an output with the same index as the input being verified).

'''''NEW''''' If the input under consideration is a pegin input, the fields ''nAsset'', ''nValue'' and ''scriptPubKey'', where they appear, are taken from the pegin witness data. The asset used is the asset ID on the sidechain, not that on the parent chain (if any).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is preferable for all these NEW statements to be their own paragraph.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I'm sure that was my intent but I guess I didn't look at the rendered version too carefully.

** ''nLockTime'' (4): the ''nLockTime'' of the transaction.
** If the ''hash_type & 0x80'' does not equal <code>SIGHASH_ANYONECANPAY</code>:
*** '''''NEW''''' ''sha_outpoint_flags'' (32): the SHA256 of the serialization of the concatenation of one byte per input of the input's outpoint flags shifted right by 24 bits. (The byte for an pegin input would be 0x40. The byte for an issuance input would be 0x80. The byte for both a pegin and issuance would be 0xc0.)
*** ''sha_prevouts'' (32): the SHA256 of the serialization of all input outpoints. Each prevout output index is the masked version containing outpoint flags
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

masked version excluding outpoint flags.

** ''spend_type'' (1): equal to ''(ext_flag * 2) + annex_present'', where ''annex_present'' is 0 if no annex is present, or 1 otherwise (the original witness stack has two or more witness elements, and the first byte of the last element is ''0x50'')
** If ''hash_type & 0x80'' equals <code>SIGHASH_ANYONECANPAY</code>:
*** '''''NEW''''' ''outpoint_flag'' (1): the input's outpoint flags shifted right by 24 bits. (Compare ''sha_outpoint_flags'' above.)
*** ''outpoint'' (36): the <code>COutPoint</code> of this input (32-byte hash + 4-byte little-endian) where the output index includes the outpoint flags.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output index excludes the outpoint flags.

*** ''sha_annex'' (32): the SHA256 of ''(compact_size(size of annex) || annex)'', where ''annex'' includes the mandatory ''0x50'' prefix.
* Data about this output:
** If ''hash_type & 3'' equals <code>SIGHASH_SINGLE</code>:
*** ''sha_single_output'' (32): the SHA256 of the corresponding output in extended <code>CTxOut</code> format.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a (NEW) before the word extended.

*** ''sha_sequences'' (32): the SHA256 of the serialization of all input ''nSequence''.
*** '''''NEW''''' ''sha_issuances'' (32): the SHA256 of the serialization of the concatenation of all input ''assetIssuance'' or 130 '0x00' bytes for inputs with no issuance
** If ''hash_type & 3'' does not equal <code>SIGHASH_NONE</code> or <code>SIGHASH_SINGLE</code>:
*** ''sha_outputs'' (32): the SHA256 of the serialization of all outputs in extended <code>CTxOut</code> format.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a (NEW) before the word extended.


'''''NEW''''' If the input under consideration is a pegin input, the fields ''nAsset'', ''nValue'' and ''scriptPubKey'', where they appear, are taken from the pegin witness data. The asset used is the asset ID on the sidechain, not that on the parent chain (if any).
'''''NEW''''' The epoch field prepended before signature hash is completely dropped. If there are new updates to the taproot signature hashes in elements, they will use new tagged hashes instead of incrementing epochs
'''''NEW''''' The fields ''nNonce'', ''nAsset'' and ''nValue'' are serialized in a fixed-length format, which consists of their ordinary 1/9/33 byte encoding followed by sufficiently many 0 to pad the length out to 33. When we refer to ''extended <code>CTxOut</code> format'' below, we mean the fields ''nAsset'', ''nValue'', ''nNonce'' serialized in that order in fixed-length format, followed by the ordinary length-prefixed ''scriptPubKey''.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might prefer a term like "padded CTxOut format" or something rather than "extended". To me "extended" conveys that it has more fields. However it isn't a big deal.

@apoelstra
Copy link
Member Author

Added a commit to address all of @roconnor-blockstream comments

** If ''hash_type & 0x80'' equals <code>SIGHASH_ANYONECANPAY</code>:
*** '''''NEW''''' ''outpoint_flag'' (1): the input's outpoint flags shifted right by 24 bits. (Compare ''sha_outpoint_flags'' above.)
*** ''outpoint'' (36): the <code>COutPoint</code> of this input (32-byte hash + 4-byte little-endian) where the output index excludes the outpoint flags.
*** '''''NEW''''' ''nAsset'' (33): (possibly confidential) assetID of the previous output spent by this input, in fixed-length format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slight preference for this to read ", in padded format". And below too.

*** ''sha_single_output'' (32): the SHA256 of the corresponding output in '''NEW''' padded <code>CTxOut</code> format.
*** '''''NEW''''' ''sha_single_output_witness'' (32): the SHA256 of the serialization of the corresponding output witnesses (rangeproof and surjection proof) in <code>CTxOutWitness</code> format.

The total length of ''SigMsg()'' is ''443'' bytes for <code>ANYONECANPAY</code> sighashes, ''366'' bytes for non-<code>ANYONECANPAY</code> sighashes, and both numbers are reduced by 64 bytes for <code>SIGHASH_NONE</code> sighashes. Note that this does not include the size of sub-hashes such as ''sha_prevouts'', which may be cached across signatures of the same transaction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I counted 442 bytes. Anyone want to count again?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not updated with the latest numbers after we removed nNonce. Recalculating the numbers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's 410/334 for ACP/non-ACP according to my calculations/implementation. Are you taking into consideration the 32-byte annex too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was including the annex.

@apoelstra
Copy link
Member Author

As per offline conversation, added two commits to

  • clarify the story with pegin inputs
  • remove the padded fields, since we don't really need them in light of our planned transaction introspection opcodes

I haven't updated the total length arithmetic. Would probably be easiest to do this after it's implemented by poking at the Python implementation. @sanket1729 ?

@apoelstra
Copy link
Member Author

(fuzz test failure in PSBT module which will be shortly replaced; just kicked the CI job)

@apoelstra apoelstra force-pushed the 2020-12--sighash-md branch from ec4b426 to cfe5ebb Compare June 18, 2021 20:53
@apoelstra apoelstra force-pushed the 2020-12--sighash-md branch from cfe5ebb to 7ca7983 Compare July 1, 2021 17:00
@apoelstra
Copy link
Member Author

Rebased. CI should pass now.

cc @roconnor-blockstream can I have an ack?

@sanket1729
Copy link
Contributor

haven't updated the total length arithmetic. Would probably be easiest to do this after it's implemented by poking at the Python implementation. @sanket1729 ?

@apoelstra
https://github.com/sanket1729/elements/blob/ea3f5eda5caacc1e4a86908728119eac61a8187f/test/functional/test_framework/script.py#L865-L872

@apoelstra
Copy link
Member Author

Fixed the doc, but @sanket1729 I notice that your Python (and C++) impl completely ignore SIGHASH_NONE!

@sanket1729
Copy link
Contributor

sanket1729 commented Jul 1, 2021

(out_type != SIGHASH_ALL and out_type != SIGHASH_SINGLE)

is the same as out_type == SIGHHASH_NONE

@roconnor-blockstream
Copy link
Contributor

LGTM

Copy link
Contributor

@sanket1729 sanket1729 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK cca8224

@apoelstra apoelstra merged commit e0532f6 into ElementsProject:master Jul 7, 2021
@apoelstra apoelstra deleted the 2020-12--sighash-md branch July 7, 2021 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants