Follow-ups to Bech32 error detection #23577

meshcollider · 2021-11-23T04:07:06Z

A number of follow-ups and improvements to the bech32 error location code, introduced in #16807.

Notably, this removes the hardcoded GF1024 tables in favour of constexpr table generation.

meshcollider · 2021-11-23T04:08:00Z

@ryanofsky and @sipa, this addresses most of your review comments in #16807 so you may like to take a look, thanks!

laanwj · 2021-11-24T10:15:30Z

Notably, this removes the hardcoded GF1024 tables in favour of constexpr table generation.

Wow, this makes the compiler generate the table? Agree this is better for understandability, just wonder, does this have any noticeable impact on compilation time?

meshcollider · 2021-11-24T21:26:50Z

@laanwj Wow, this makes the compiler generate the table? Agree this is better for understandability, just wonder, does this have any noticeable impact on compilation time?

It shouldn't do, each element of the table only takes a few bitwise operations on 16 bit integers and the table is only 1024 elements.

EDIT: a quick test compiling that file using clang takes 0.24 on master and 0.28 with this PR so it is pretty negligible overall.

laanwj · 2021-11-29T13:42:23Z

I checked the assembly output from clang 13.0.0 and gcc 9.3.0 that the tables are indeed a) generated constants in .rodata and b) different from before (as expected and mentioned in the commit description) c) match between compilers.

	.type	_ZN6bech3212_GLOBAL__N_110GF1024_LOGE,@object # @_ZN6bech3212_GLOBAL__N_110GF1024_LOGE
	.section	.rodata,"a",@progbits
	.p2align	1
_ZN6bech3212_GLOBAL__N_110GF1024_LOGE:
	.short	65535                           # 0xffff
	.short	0                               # 0x0
	.short	99                              # 0x63
	.short	363                             # 0x16b
	.short	198                             # 0xc6
⋮
	.type	_ZN6bech3212_GLOBAL__N_110GF1024_EXPE,@object # @_ZN6bech3212_GLOBAL__N_110GF1024_EXPE
	.p2align	1
_ZN6bech3212_GLOBAL__N_110GF1024_EXPE:
	.short	1                               # 0x1
	.short	32                              # 0x20
	.short	311                             # 0x137
	.short	139                             # 0x8b
	.short	206                             # 0xce
⋮

Full values for both tables can be found here: https://0bin.net/paste/Lp6hKmVD#EwTuc1KFIDzy57g0g3EvtQiLIj3HWD4e+OGPWUJTwQ+ (I did not have anything to check them against).

maflcko · 2021-11-29T13:46:27Z

(I did not have anything to check them against).

Looks like the GF1024_EXP you posted differs from the one that is removed in this pull?

sipa · 2021-11-29T13:55:34Z

I will review in detail shortly, but I suspect that @meshcollider perhaps included the table changes corresponding to sipa/bech32#64 here? There is Sage code there that generates the tables too, if someone wants something to compare against.

laanwj · 2021-11-29T13:56:33Z

Looks like the GF1024_EXP you posted differs from the one that is removed in this pull?

Both tables are. From the commit:

Note that the tables generated by this code are different to the previous
hardcoded tables, because we simplify to encoding (e) as 1 || 0 rather
than 9 || 15 (as was done in PR 64 of the bech32 repo).

I began this test with the intent to compare if they were the same, then noticed they were not. But it's expected (they use a different encoding).

Edit: I'll check against @sipa's sage output.
Edit2: All values match. The only difference is that the size of GF1024_EXP printed by the Sage script is 1024, while the structure here is 1023 long.

meshcollider · 2021-11-29T20:52:25Z

I'm happy to split that commit into two, the first deriving the existing table and then the second modifying to the new table, if that would help review?

The length difference (1024 vs 1023) is just because the sage code also includes (a)^1023 = 1 = (a)^0.

src/bech32.cpp

doc/release-notes-16807.md

src/bech32.cpp

sipa · 2021-11-29T22:48:54Z

ACK 98124a63946d6f7d665864adbef35a863423ca47, with or without nits.

src/bech32.cpp

meshcollider · 2021-11-30T00:41:41Z

Updated to address all of @sipa's nits, and split the GF1024 constexpr commit into two:

the first keeps the table the same, and generates it at compile time.
the second modifies the generation to use the simpler encoding from Canonically encode powers of primitive element e sipa/bech32#64

laanwj · 2021-11-30T15:19:37Z

src/bech32.cpp

After the recent push I cannot find GF1024_EXP and GF1024_LOG as symbols anymore in the compiled output. I think that makes sense as they're references to tables now, which is fine, just need to adapt my comparison script.

laanwj · 2021-11-30T15:43:32Z

Thanks, re-checked:

gcc and clang output for tables of 1e24b1d385d64892389985afc058c83410c44311 matches the output of the sage script
gcc and clang output for tables of 83dc0b6392a99ea99e2d67d6893d715e48340023 matches GF1024_EXP + GF1024_LOG of the commit before it

Code review ACK 1e24b1d385d64892389985afc058c83410c44311

gruve-p · 2021-11-30T15:51:18Z

ACK 1e24b1d

src/bech32.cpp

sipa · 2021-11-30T15:59:21Z

Verified the generated tables by adding the assertions generated by this Sage code:

# Binary field
B.<b> = GF(2)
# Polynomials over the binary field
BP.<bp> = B[]
# Encoder field GF(32)
F.<f> = GF(32, modulus=bp^5 + bp^3 + 1, repr='int')
# Polynomials over the encoder field
FP.<fp> = F[]
# Decoder field GF(1024)
E.<e> = F.extension(modulus=fp^2 + F.fetch_int(9)*fp + F.fetch_int(23))

# Convert an E field element to an integer 0..1023.
def gf_to_int(v):
    return v[0].integer_representation() + 32*v[1].integer_representation()

# Convert an integer 0..1023 to an E field element.
def int_to_gf(v):
    return F.fetch_int(v & 31) + e*F.fetch_int(v >> 5)

GF1024_EXP = [gf_to_int(e**i) for i in range(1023)]
GF1024_LOG = [-1] + [discrete_log(int_to_gf(i), e, ord=1023, operation="*") for i in range(1, 1024)]

for i in range(1023):
    print("static_assert(GF1024_EXP[%i] == %i);" % (i, GF1024_EXP[i]))
for i in range(1024):
    print("static_assert(GF1024_LOG[%i] == %i);" % (i, GF1024_LOG[i]))

SYNDROME_CONSTS = [None for _ in range(25)]
a1, a2, a3 = e**997, e**998, e**999
for k in range(1, 6):
    for b in range(5):
        c = (gf_to_int(a1**k * int_to_gf(2)**b) |
             gf_to_int(a2**k * int_to_gf(2)**b) << 10 |
             gf_to_int(a3**k * int_to_gf(2)**b) << 20)
        SYNDROME_CONSTS[(k-1)*5 + b] = c

for i in range(25):
    print("static_assert(SYNDROME_CONSTS[%i] == %i);" % (i, SYNDROME_CONSTS[i]))

…ted constexprs.

This follows PR 64 of the sipa/bech32 repo.

meshcollider · 2021-11-30T22:03:41Z

Only change:

- constexpr const std::array<uint32_t, 25>& SYNDROME_CONSTS = GenerateSyndromeConstants();
+ constexpr std::array<uint32_t, 25> SYNDROME_CONSTS = GenerateSyndromeConstants();

as per @sipa's comment.

sipa · 2021-11-30T23:12:09Z

ACK 2fa4fd1, verified the tables by inserting assertions generated by a Sage script.

ryanofsky

Code review ACK 2fa4fd1 with caveat that I didn't try to understand or verify the math here, I just checked that all comments describing the math matched up with the code, and that all the changes seemed like improvements.

src/bech32.cpp

gruve-p · 2021-12-05T14:07:24Z

ACK 353b463

laanwj · 2021-12-06T09:41:04Z

Re-ACK a4fe701

maflcko

left a nit

maflcko · 2021-12-06T13:09:25Z

src/bech32.cpp


 /** Find index of an incorrect character in a Bech32 string. */
-std::string LocateErrors(const std::string& str, std::vector<int>& error_locations) {
+std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) {


The docstring reads odd in doxygen:

"Find index of an incorrect character in a Bech32 string. Return the positions of errors in a Bech32 string. "

See https://doxygen.bitcoincore.org/namespacebech32.html#ad3d7336317a6e6d359b54b2f60b53acc

Also, std::make_pair is not needed with C++11/17

diff --git a/src/bech32.cpp b/src/bech32.cpp index 3cda1dfff5..abc617467d 100644 --- a/src/bech32.cpp +++ b/src/bech32.cpp @@ -111,7 +111,7 @@ constexpr std::pair<std::array<int16_t, 1023>, std::array<int16_t, 1024>> Genera GF1024_LOG[v] = i; } - return std::make_pair(GF1024_EXP, GF1024_LOG); + return {GF1024_EXP, GF1024_LOG}; } constexpr auto tables = GenerateGFTables(); @@ -395,27 +395,27 @@ DecodeResult Decode(const std::string& str) { return {result, std::move(hrp), data(values.begin(), values.end() - 6)}; } -/** Find index of an incorrect character in a Bech32 string. */ -std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) { +std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) +{ std::vector<int> error_locations{}; if (str.size() > 90) { error_locations.resize(str.size() - 90); std::iota(error_locations.begin(), error_locations.end(), 90); - return std::make_pair("Bech32 string too long", std::move(error_locations)); + return {"Bech32 string too long", std::move(error_locations)}; } if (!CheckCharacters(str, error_locations)){ - return std::make_pair("Invalid character or mixed case", std::move(error_locations)); + return {"Invalid character or mixed case", std::move(error_locations)}; } size_t pos = str.rfind('1'); if (pos == str.npos) { - return std::make_pair("Missing separator", std::vector<int>{}); + return {"Missing separator", std::vector<int>{}}; } if (pos == 0 || pos + 7 > str.size()) { error_locations.push_back(pos); - return std::make_pair("Invalid separator position", std::move(error_locations)); + return {"Invalid separator position", std::move(error_locations)}; } std::string hrp; @@ -430,7 +430,7 @@ std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) { int8_t rev = CHARSET_REV[c]; if (rev == -1) { error_locations.push_back(i); - return std::make_pair("Invalid Base 32 character", std::move(error_locations)); + return {"Invalid Base 32 character", std::move(error_locations)}; } values[i - pos - 1] = rev; } @@ -550,7 +550,7 @@ std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) { } } else { // No errors - return std::make_pair("", std::vector<int>{}); + return {"", std::vector<int>{}}; } if (error_locations.empty() || (!possible_errors.empty() && possible_errors.size() < error_locations.size())) { @@ -562,7 +562,7 @@ std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) { : error_encoding == Encoding::BECH32 ? "Invalid Bech32 checksum" : "Invalid checksum"; - return std::make_pair(error_message, std::move(error_locations)); + return {error_message, std::move(error_locations)}; } } // namespace bech32 diff --git a/src/bech32.h b/src/bech32.h index 5e89e6efda..707299f19e 100644 --- a/src/bech32.h +++ b/src/bech32.h @@ -45,7 +45,7 @@ struct DecodeResult /** Decode a Bech32 or Bech32m string. */ DecodeResult Decode(const std::string& str); -/** Return the positions of errors in a Bech32 string. */ +/** Return the positions of errors in a Bech32 string. (Empty string if no errors can be found). */ std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str); } // namespace bech32

maflcko · 2021-12-07T13:56:12Z

More nits:

There are two lines missing test coverage: https://marcofalke.github.io/btc_cov/total.coverage/src/bech32.cpp.gcov.html#542
Maybe update the DecodeDestination fuzz test with error locations?

a4fe701 Make Bech32 LocateErrors return error list rather than using out-arg (Samuel Dobson) 2fa4fd1 Use std::iota instead of manually pushing range (Samuel Dobson) 405c96f Use bounds-checked array lookups in Bech32 error detection code (Samuel Dobson) 28d9c28 Simplify encoding of e in GF(1024) tables to (1,0) (Samuel Dobson) 14358a0 Replace GF1024 tables and syndrome constants with compile-time generated constexprs. (Samuel Dobson) 63f7b69 Update release note for bech32 error detection (Samuel Dobson) c8b9a22 Report encoding type in bech32 error message (Samuel Dobson) 92f0caf Improve Bech32 boost tests (Samuel Dobson) bb4d3e9 Address review comments for Bech32 error validation (Samuel Dobson) Pull request description: A number of follow-ups and improvements to the bech32 error location code, introduced in bitcoin#16807. Notably, this removes the hardcoded GF1024 tables in favour of constexpr table generation. ACKs for top commit: laanwj: Re-ACK a4fe701 Tree-SHA512: 6312373c20ebd6636f5797304876fa0d70fa777de2f6c507245f51a652b3d1224ebc55b236c9e11e6956c1e88e65faadab51d53587078efccb451455aa2e2276

f5e22de Make Bech32 LocateErrors return error list rather than using out-arg (Samuel Dobson) a94d838 Use std::iota instead of manually pushing range (Samuel Dobson) e32e83d Use bounds-checked array lookups in Bech32 error detection code (Samuel Dobson) af1c914 Simplify encoding of e in GF(1024) tables to (1,0) (Samuel Dobson) a85037e Replace GF1024 tables and syndrome constants with compile-time generated constexprs. (Samuel Dobson) 9ac54d3 Update release note for bech32 error detection (Samuel Dobson) 5fb7801 Report encoding type in bech32 error message (Samuel Dobson) 1fc2945 Improve Bech32 boost tests (Samuel Dobson) e242dd3 Address review comments for Bech32 error validation (Samuel Dobson) Pull request description: A number of follow-ups and improvements to the bech32 error location code, introduced in #16807. Notably, this removes the hardcoded GF1024 tables in favour of constexpr table generation. ACKs for top commit: laanwj: Re-ACK f5e22de Tree-SHA512: 6312373c20ebd6636f5797304876fa0d70fa777de2f6c507245f51a652b3d1224ebc55b236c9e11e6956c1e88e65faadab51d53587078efccb451455aa2e2276

meshcollider added 3 commits November 23, 2021 15:48

Address review comments for Bech32 error validation

bb4d3e9

Improve Bech32 boost tests

92f0caf

Report encoding type in bech32 error message

c8b9a22

meshcollider added RPC/REST/ZMQ Wallet labels Nov 23, 2021

maflcko added this to the 23.0 milestone Nov 25, 2021

sipa reviewed Nov 29, 2021

View reviewed changes