Skip to content

Conversation

@meshcollider
Copy link
Contributor

A number of follow-ups and improvements to the bech32 error location code, introduced in #16807.

Notably, this removes the hardcoded GF1024 tables in favour of constexpr table generation.

@meshcollider
Copy link
Contributor Author

@ryanofsky and @sipa, this addresses most of your review comments in #16807 so you may like to take a look, thanks!

@laanwj
Copy link
Member

laanwj commented Nov 24, 2021

Notably, this removes the hardcoded GF1024 tables in favour of constexpr table generation.

Wow, this makes the compiler generate the table? Agree this is better for understandability, just wonder, does this have any noticeable impact on compilation time?

@meshcollider
Copy link
Contributor Author

meshcollider commented Nov 24, 2021

@laanwj Wow, this makes the compiler generate the table? Agree this is better for understandability, just wonder, does this have any noticeable impact on compilation time?

It shouldn't do, each element of the table only takes a few bitwise operations on 16 bit integers and the table is only 1024 elements.

EDIT: a quick test compiling that file using clang takes 0.24 on master and 0.28 with this PR so it is pretty negligible overall.

@maflcko maflcko added this to the 23.0 milestone Nov 25, 2021
@laanwj
Copy link
Member

laanwj commented Nov 29, 2021

I checked the assembly output from clang 13.0.0 and gcc 9.3.0 that the tables are indeed a) generated constants in .rodata and b) different from before (as expected and mentioned in the commit description) c) match between compilers.

	.type	_ZN6bech3212_GLOBAL__N_110GF1024_LOGE,@object # @_ZN6bech3212_GLOBAL__N_110GF1024_LOGE
	.section	.rodata,"a",@progbits
	.p2align	1
_ZN6bech3212_GLOBAL__N_110GF1024_LOGE:
	.short	65535                           # 0xffff
	.short	0                               # 0x0
	.short	99                              # 0x63
	.short	363                             # 0x16b
	.short	198                             # 0xc6
⋮
	.type	_ZN6bech3212_GLOBAL__N_110GF1024_EXPE,@object # @_ZN6bech3212_GLOBAL__N_110GF1024_EXPE
	.p2align	1
_ZN6bech3212_GLOBAL__N_110GF1024_EXPE:
	.short	1                               # 0x1
	.short	32                              # 0x20
	.short	311                             # 0x137
	.short	139                             # 0x8b
	.short	206                             # 0xce
⋮

Full values for both tables can be found here: https://0bin.net/paste/Lp6hKmVD#EwTuc1KFIDzy57g0g3EvtQiLIj3HWD4e+OGPWUJTwQ+ (I did not have anything to check them against).

@maflcko
Copy link
Member

maflcko commented Nov 29, 2021

(I did not have anything to check them against).

Looks like the GF1024_EXP you posted differs from the one that is removed in this pull?

@sipa
Copy link
Member

sipa commented Nov 29, 2021

I will review in detail shortly, but I suspect that @meshcollider perhaps included the table changes corresponding to sipa/bech32#64 here? There is Sage code there that generates the tables too, if someone wants something to compare against.

@laanwj
Copy link
Member

laanwj commented Nov 29, 2021

Looks like the GF1024_EXP you posted differs from the one that is removed in this pull?

Both tables are. From the commit:

Note that the tables generated by this code are different to the previous
hardcoded tables, because we simplify to encoding (e) as 1 || 0 rather
than 9 || 15 (as was done in PR 64 of the bech32 repo).

I began this test with the intent to compare if they were the same, then noticed they were not. But it's expected (they use a different encoding).

Edit: I'll check against @sipa's sage output.
Edit2: All values match. The only difference is that the size of GF1024_EXP printed by the Sage script is 1024, while the structure here is 1023 long.

@meshcollider
Copy link
Contributor Author

meshcollider commented Nov 29, 2021

I'm happy to split that commit into two, the first deriving the existing table and then the second modifying to the new table, if that would help review?

The length difference (1024 vs 1023) is just because the sage code also includes (a)^1023 = 1 = (a)^0.

@sipa
Copy link
Member

sipa commented Nov 29, 2021

ACK 98124a63946d6f7d665864adbef35a863423ca47, with or without nits.

@meshcollider
Copy link
Contributor Author

Updated to address all of @sipa's nits, and split the GF1024 constexpr commit into two:

src/bech32.cpp Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the recent push I cannot find GF1024_EXP and GF1024_LOG as symbols anymore in the compiled output. I think that makes sense as they're references to tables now, which is fine, just need to adapt my comparison script.

@laanwj
Copy link
Member

laanwj commented Nov 30, 2021

Thanks, re-checked:

  • gcc and clang output for tables of 1e24b1d385d64892389985afc058c83410c44311 matches the output of the sage script
  • gcc and clang output for tables of 83dc0b6392a99ea99e2d67d6893d715e48340023 matches GF1024_EXP + GF1024_LOG of the commit before it

Code review ACK 1e24b1d385d64892389985afc058c83410c44311

@gruve-p
Copy link
Contributor

gruve-p commented Nov 30, 2021

ACK 1e24b1d

@sipa
Copy link
Member

sipa commented Nov 30, 2021

Verified the generated tables by adding the assertions generated by this Sage code:

# Binary field
B.<b> = GF(2)
# Polynomials over the binary field
BP.<bp> = B[]
# Encoder field GF(32)
F.<f> = GF(32, modulus=bp^5 + bp^3 + 1, repr='int')
# Polynomials over the encoder field
FP.<fp> = F[]
# Decoder field GF(1024)
E.<e> = F.extension(modulus=fp^2 + F.fetch_int(9)*fp + F.fetch_int(23))

# Convert an E field element to an integer 0..1023.
def gf_to_int(v):
    return v[0].integer_representation() + 32*v[1].integer_representation()

# Convert an integer 0..1023 to an E field element.
def int_to_gf(v):
    return F.fetch_int(v & 31) + e*F.fetch_int(v >> 5)

GF1024_EXP = [gf_to_int(e**i) for i in range(1023)]
GF1024_LOG = [-1] + [discrete_log(int_to_gf(i), e, ord=1023, operation="*") for i in range(1, 1024)]

for i in range(1023):
    print("static_assert(GF1024_EXP[%i] == %i);" % (i, GF1024_EXP[i]))
for i in range(1024):
    print("static_assert(GF1024_LOG[%i] == %i);" % (i, GF1024_LOG[i]))

SYNDROME_CONSTS = [None for _ in range(25)]
a1, a2, a3 = e**997, e**998, e**999
for k in range(1, 6):
    for b in range(5):
        c = (gf_to_int(a1**k * int_to_gf(2)**b) |
             gf_to_int(a2**k * int_to_gf(2)**b) << 10 |
             gf_to_int(a3**k * int_to_gf(2)**b) << 20)
        SYNDROME_CONSTS[(k-1)*5 + b] = c

for i in range(25):
    print("static_assert(SYNDROME_CONSTS[%i] == %i);" % (i, SYNDROME_CONSTS[i]))

@meshcollider
Copy link
Contributor Author

meshcollider commented Nov 30, 2021

Only change:

- constexpr const std::array<uint32_t, 25>& SYNDROME_CONSTS = GenerateSyndromeConstants();
+ constexpr std::array<uint32_t, 25> SYNDROME_CONSTS = GenerateSyndromeConstants();

as per @sipa's comment.

@sipa
Copy link
Member

sipa commented Nov 30, 2021

ACK 2fa4fd1, verified the tables by inserting assertions generated by a Sage script.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 2fa4fd1 with caveat that I didn't try to understand or verify the math here, I just checked that all comments describing the math matched up with the code, and that all the changes seemed like improvements.

@gruve-p
Copy link
Contributor

gruve-p commented Dec 5, 2021

ACK 353b463

@laanwj
Copy link
Member

laanwj commented Dec 6, 2021

Re-ACK a4fe701

@laanwj laanwj merged commit 22feb7f into bitcoin:master Dec 6, 2021
@meshcollider meshcollider deleted the 202111_bech32_followup branch December 6, 2021 13:03
Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a nit


/** Find index of an incorrect character in a Bech32 string. */
std::string LocateErrors(const std::string& str, std::vector<int>& error_locations) {
std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring reads odd in doxygen:

"Find index of an incorrect character in a Bech32 string. Return the positions of errors in a Bech32 string. "

See https://doxygen.bitcoincore.org/namespacebech32.html#ad3d7336317a6e6d359b54b2f60b53acc

Also, std::make_pair is not needed with C++11/17

diff --git a/src/bech32.cpp b/src/bech32.cpp
index 3cda1dfff5..abc617467d 100644
--- a/src/bech32.cpp
+++ b/src/bech32.cpp
@@ -111,7 +111,7 @@ constexpr std::pair<std::array<int16_t, 1023>, std::array<int16_t, 1024>> Genera
         GF1024_LOG[v] = i;
     }
 
-    return std::make_pair(GF1024_EXP, GF1024_LOG);
+    return {GF1024_EXP, GF1024_LOG};
 }
 
 constexpr auto tables = GenerateGFTables();
@@ -395,27 +395,27 @@ DecodeResult Decode(const std::string& str) {
     return {result, std::move(hrp), data(values.begin(), values.end() - 6)};
 }
 
-/** Find index of an incorrect character in a Bech32 string. */
-std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) {
+std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str)
+{
     std::vector<int> error_locations{};
 
     if (str.size() > 90) {
         error_locations.resize(str.size() - 90);
         std::iota(error_locations.begin(), error_locations.end(), 90);
-        return std::make_pair("Bech32 string too long", std::move(error_locations));
+        return {"Bech32 string too long", std::move(error_locations)};
     }
 
     if (!CheckCharacters(str, error_locations)){
-        return std::make_pair("Invalid character or mixed case", std::move(error_locations));
+        return {"Invalid character or mixed case", std::move(error_locations)};
     }
 
     size_t pos = str.rfind('1');
     if (pos == str.npos) {
-        return std::make_pair("Missing separator", std::vector<int>{});
+        return {"Missing separator", std::vector<int>{}};
     }
     if (pos == 0 || pos + 7 > str.size()) {
         error_locations.push_back(pos);
-        return std::make_pair("Invalid separator position", std::move(error_locations));
+        return {"Invalid separator position", std::move(error_locations)};
     }
 
     std::string hrp;
@@ -430,7 +430,7 @@ std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) {
         int8_t rev = CHARSET_REV[c];
         if (rev == -1) {
             error_locations.push_back(i);
-            return std::make_pair("Invalid Base 32 character", std::move(error_locations));
+            return {"Invalid Base 32 character", std::move(error_locations)};
         }
         values[i - pos - 1] = rev;
     }
@@ -550,7 +550,7 @@ std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) {
             }
         } else {
             // No errors
-            return std::make_pair("", std::vector<int>{});
+            return {"", std::vector<int>{}};
         }
 
         if (error_locations.empty() || (!possible_errors.empty() && possible_errors.size() < error_locations.size())) {
@@ -562,7 +562,7 @@ std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) {
                               : error_encoding == Encoding::BECH32 ? "Invalid Bech32 checksum"
                               : "Invalid checksum";
 
-    return std::make_pair(error_message, std::move(error_locations));
+    return {error_message, std::move(error_locations)};
 }
 
 } // namespace bech32
diff --git a/src/bech32.h b/src/bech32.h
index 5e89e6efda..707299f19e 100644
--- a/src/bech32.h
+++ b/src/bech32.h
@@ -45,7 +45,7 @@ struct DecodeResult
 /** Decode a Bech32 or Bech32m string. */
 DecodeResult Decode(const std::string& str);
 
-/** Return the positions of errors in a Bech32 string. */
+/** Return the positions of errors in a Bech32 string. (Empty string if no errors can be found). */
 std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str);
 
 } // namespace bech32

@maflcko
Copy link
Member

maflcko commented Dec 7, 2021

More nits:

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Dec 7, 2021
a4fe701 Make Bech32 LocateErrors return error list rather than using out-arg (Samuel Dobson)
2fa4fd1 Use std::iota instead of manually pushing range (Samuel Dobson)
405c96f Use bounds-checked array lookups in Bech32 error detection code (Samuel Dobson)
28d9c28 Simplify encoding of e in GF(1024) tables to (1,0) (Samuel Dobson)
14358a0 Replace GF1024 tables and syndrome constants with compile-time generated constexprs. (Samuel Dobson)
63f7b69 Update release note for bech32 error detection (Samuel Dobson)
c8b9a22 Report encoding type in bech32 error message (Samuel Dobson)
92f0caf Improve Bech32 boost tests (Samuel Dobson)
bb4d3e9 Address review comments for Bech32 error validation (Samuel Dobson)

Pull request description:

  A number of follow-ups and improvements to the bech32 error location code, introduced in bitcoin#16807.

  Notably, this removes the hardcoded GF1024 tables in favour of constexpr table generation.

ACKs for top commit:
  laanwj:
    Re-ACK a4fe701

Tree-SHA512: 6312373c20ebd6636f5797304876fa0d70fa777de2f6c507245f51a652b3d1224ebc55b236c9e11e6956c1e88e65faadab51d53587078efccb451455aa2e2276
RandyMcMillan pushed a commit to RandyMcMillan/mempool-tab that referenced this pull request Dec 23, 2021
f5e22de Make Bech32 LocateErrors return error list rather than using out-arg (Samuel Dobson)
a94d838 Use std::iota instead of manually pushing range (Samuel Dobson)
e32e83d Use bounds-checked array lookups in Bech32 error detection code (Samuel Dobson)
af1c914 Simplify encoding of e in GF(1024) tables to (1,0) (Samuel Dobson)
a85037e Replace GF1024 tables and syndrome constants with compile-time generated constexprs. (Samuel Dobson)
9ac54d3 Update release note for bech32 error detection (Samuel Dobson)
5fb7801 Report encoding type in bech32 error message (Samuel Dobson)
1fc2945 Improve Bech32 boost tests (Samuel Dobson)
e242dd3 Address review comments for Bech32 error validation (Samuel Dobson)

Pull request description:

  A number of follow-ups and improvements to the bech32 error location code, introduced in #16807.

  Notably, this removes the hardcoded GF1024 tables in favour of constexpr table generation.

ACKs for top commit:
  laanwj:
    Re-ACK f5e22de

Tree-SHA512: 6312373c20ebd6636f5797304876fa0d70fa777de2f6c507245f51a652b3d1224ebc55b236c9e11e6956c1e88e65faadab51d53587078efccb451455aa2e2276
@bitcoin bitcoin locked and limited conversation to collaborators Dec 7, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants