-
Notifications
You must be signed in to change notification settings - Fork 38.7k
fix an undefined behavior in uint::SetHex #14734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Decrementing psz beyond the beginning of the string is UB, even though the out-of-bounds pointer is never dereferenced.
|
|
|
Extremely nice find and another excellent contribution from you! Thanks! You're right that the sanitizers don't catch this class of UB. I think From what I can see we unconditionally hit this UB as part of startup no matter what parameters we pass to Why this is UB: Assuming |
|
@promag What is "old code" referring to? Don't you agree that the existing code in |
|
@practicalswift TIL, it is UB unless one past the last element. utACK 0f459d8. |
|
@kazcw how about this: diff --git a/src/uint256.cpp b/src/uint256.cpp
index d9da66803..e5cadfae5 100644
--- a/src/uint256.cpp
+++ b/src/uint256.cpp
@@ -40,13 +40,12 @@ void base_blob<BITS>::SetHex(const char* psz)
const char* pbegin = psz;
while (::HexDigit(*psz) != -1)
psz++;
- psz--;
unsigned char* p1 = (unsigned char*)data;
unsigned char* pend = p1 + WIDTH;
- while (psz >= pbegin && p1 < pend) {
- *p1 = ::HexDigit(*psz--);
- if (psz >= pbegin) {
- *p1 |= ((unsigned char)::HexDigit(*psz--) << 4);
+ while (psz > pbegin && p1 < pend) {
+ *p1 = ::HexDigit(*--psz);
+ if (psz > pbegin) {
+ *p1 |= ((unsigned char)::HexDigit(*--psz) << 4);
p1++;
}
} |
|
@practicalswift Just reading code! @promag That would be the minimal change to fix the problem, but Satoshi wrote C++ like an old-school C hacker 😆. IMO indexing is a little clearer in a codebase that these days uses more modern idioms. |
|
@kazcw Nice! Then keep on reading code please! :-) I've tried to launch a crusade against UB in Bitcoin Core. I even named the latest C-lightning release "The Consensus Loving Nasal Daemon" as a catchy slogan in this fight, so and I'm very happy to see your contributions! Keep 'em coming! :-) Context:
|
|
@MarcoFalke @laanwj What about adding a new label – say "Undefined Behaviour" – for UB pull requests and issues? FWIW I think the current use of the "Refactoring" label for UB PRs is problematic for a number of reasons:
Some open UB PR:s that need the suggested "Undefined Behaviour" label:
|
|
@practicalswift While I am in favor of (slowly) moving the code to strictly comply with the C++ standard, I also think we should also be aware that there are differences between:
I would argue that the first one or two are not a priority at all (still a good-to-have, but there are also code churn concerns), while the last few are increasingly more so. I don't think classifying them all under "UB" is a good idea - it would be overly dramatic for some things, and a wild understatement for others. By this I don't want to discourage anyone from working towards improving the situation, but perhaps a "crusade" isn't the right approach. |
|
Mostly agree with @sipa. Also, there is a label "Bug", which (I guess) could be used when there is an observable issue with the code. I.e. actual malfunction and not just a code style issue or random compiler warning. Generally, if you are unable to write a functional or unit test case that will fail on current master, I'd say applying the "Refactoring" label is just fine. We only support gcc and clang as compilers and in most cases of UB they just do the right thing without having to rewrite the code. And given that this piece of code is around for as long as the code base existed and no one ever run into issues, it seems unlikely that things will suddenly break. |
|
This is a complex but important topic. I think it would be worth working out a clearly-defined policy. I propose for comment an approach that admits mostly-objective triage into practical categories, based on what kinds of testing can demonstrate the issue. I have a starting point here (suggested changes / fundamental disagreements welcome): If there's agreement on this approach in principle, I could PR it as section in docs/developer-notes.md, and then we could use the PR process to work out the finer details. @sipa, I think this is generally consistent with the approach you have outlined. To keep the UB discussion simple I have not addressed unreachable codepaths, which I think would fit the same needs-refactor category as other footgun-API issues. @MarcoFalke I have included a "Brittle" category in between "Refactoring" and "Broken", that would fit for example a function call that does not currently cause misbehavior, but could become broken if the compiler decided to inline it. If you agree that this kind of thing is more severe than might be expected for issues under the "Refactoring" label, we should try to come to a consensus on how to define it. @practicalswift I think a crusade on "potentially-dangerous UB" would be apt, as long as we are careful about how we define "potentially-dangerous". What do you think of the approach of leniency for a whitelist of classes of UB in existing code, with those classes selected based on clang's sanitizers options (and in this particular issue, lack thereof)? (And maybe valgrind?) It looks like the current sanitizers used in CI test runs are "thread" and "integer,undefined". That is definitely insufficient for the definition of the "Brittle" classification: for example, those sanitizers don't include uninitialized reads. IMO a whitelist approach is appropriate: every sanitizer category should be considered to represent potential danger unless specifically decided otherwise (which we might just do lazily, as issues come up?). Ideally all sanitizers deemed important would be checked in CI test runs, but given the slowdown factor of some sanitizers, I don't know if that would be practical. |
Unfortunately we don't run the thread sanitizer (#14058), nor the memory sanitizer or valgrind (due to memory issues internal to bdb and qt, IIRC). |
|
utACK 0f459d8 This pull request eliminates pointer arithmetic [1] and addresses UB if [1] “Pointer arithmetic is fragile and easy to get wrong, the source of many, many bad bugs and security violations.“ ES.42: Keep use of pointers simple and straightforward [2] Excerpt From: Bjarne Stroustrup. “The C++ Programming Language, 4/e“:
|
|
@practicalswift It sounds like you are only looking at things from one side. Consider a case where a behaviour is language spec UB but explicitly defined by our supported toolchain and ubiquitously used. In that case the existing behaviour is at least currently harmless. A fix risks introducing its own bugs (potentially not harmless) and also diverts review/testing attention away from other changes, potentially allowing bugs through or causing important improvements to be delayed. If the fix is done hastily (e.g. as part of a mass of changes in a 'crusade') even if it is not buggy itself could make the code more brittle or less clear and thus contribute to the introduction of bugs down the line. (You've seen me complain sometimes against fixing warnings by peppering in casts, and usually its because of brittleness/less clarity) There are as many ways for code to be good as there are people and kinds of code, a decision to do something is never a one sided choice to make things better or not, it's always a prioritization of what kinds of better we're making things and what risks we're taking by making those changes. |
|
@gmaxwell I agree with the point you and @sipa make: all changes – including UB fixes – should be critically evaluated from a risk-reward perspective. And PR:s deemed too risky should be closed off after review. Perhaps what we don't agree on is how much work it would take to kill off the remaining instances of UB in our code base, and how complex the required changes would be. Could that be the case? If we get specific: is there any particular type of UB you fear that we have in such quantity that it would be unrealistic to fix? Also, are there any UB fixes that have been submitted so far that you don't think are worth fixing? Personally I think we're better off having all UB cases identified and documented. That way we can reason about them like we do in this PR and take the appropriate action (if any). We agree on that, right? :-) |
|
To the best of my knowledge the runtime observable UBs were already documented through an UBSAN suppressions file (https://github.com/bitcoin/bitcoin/blob/5c292dafcd54adfcd9f80c0e1fccb45c8683808f/contrib/sanitizers-ubsan.suppressions)? To fix all of them a series of separate pull requests would be required, each of them a massive review burden. It would probably take months if not years to get all through the review process. Especially |
|
@MarcoFalke Some things that are important to note when talking about UB:
If we get more specific:
|
|
@practicalswift I think the only disagreement is on how much of a priority this is. |
PR #14794 adds AddressSanitizer (ASan) to Travis. Please review :-) |
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsNo conflicts as of last run. |
|
This is sure generating a lot of discussion for a few-line change. Is there disagreement on merging this specific change? |
|
@laanwj There is no disagreement about the code change from what I can tell. I think we should merge this or alternatively document in |
|
Please do not just accept any actually known true UB as just an "assumption". Assumptions should be for platform limitations (e.g. we require int be 32 bits) and implementation defined behaviour (including places where clang and GCC explicitly strengthen the language specification and turn something that the language leaves UB into implementation defined). I think it would be really nice if every PR fixing an obvious bug didn't turn into some awful debate about undefined behaviour. |
|
Agreed: so let's move back the focus in this PR to reviewing the specific code change. Please move general UB discussion elsewhere: This has currently 2 utACKs, no NACKs. |
|
This might be of help for people reviewing this PR who want to understand the old behaviour: Consider the case When the reverse processing reaches the leading Please note that An example: |
0f459d8 fix an undefined behavior in uint::SetHex (Kaz Wesley) Pull request description: Decrementing psz beyond the beginning of the string is UB, even though the out-of-bounds pointer is never dereferenced. I don't think any clang sanitizer covers this, so I don't see any way a test could catch the original behavior. ACKs for top commit: promag: utACK 0f459d8. l2a5b1: utACK 0f459d8 Tree-SHA512: 388223254ea6e955f643d2ebdf74d15a3d494e9f0597d9f05987ebb708d7a1cc06ce64bd25d447d75b5f5561bdae9630dcf25adb7bd75f7a382298b95d127162
0f459d8 fix an undefined behavior in uint::SetHex (Kaz Wesley) Pull request description: Decrementing psz beyond the beginning of the string is UB, even though the out-of-bounds pointer is never dereferenced. I don't think any clang sanitizer covers this, so I don't see any way a test could catch the original behavior. ACKs for top commit: promag: utACK 0f459d8. l2a5b1: utACK 0f459d8 Tree-SHA512: 388223254ea6e955f643d2ebdf74d15a3d494e9f0597d9f05987ebb708d7a1cc06ce64bd25d447d75b5f5561bdae9630dcf25adb7bd75f7a382298b95d127162
Summary:
0f459d868d85053f1cc066ea9099793f88cbd655 fix an undefined behavior in uint::SetHex (Kaz Wesley)
Pull request description:
Decrementing psz beyond the beginning of the string is UB, even though
the out-of-bounds pointer is never dereferenced.
I don't think any clang sanitizer covers this, so I don't see any way a test could catch the original behavior.
ACKs for top commit:
promag:
utACK 0f459d8.
l2a5b1:
utACK 0f459d868d85053f1cc066ea9099793f88cbd655
Tree-SHA512: 388223254ea6e955f643d2ebdf74d15a3d494e9f0597d9f05987ebb708d7a1cc06ce64bd25d447d75b5f5561bdae9630dcf25adb7bd75f7a382298b95d127162
Backport of Core [[bitcoin/bitcoin#14734 | PR14734]]
Test Plan: `ninja check`
Reviewers: #bitcoin_abc, majcosta
Reviewed By: #bitcoin_abc, majcosta
Differential Revision: https://reviews.bitcoinabc.org/D6844
0f459d8 fix an undefined behavior in uint::SetHex (Kaz Wesley) Pull request description: Decrementing psz beyond the beginning of the string is UB, even though the out-of-bounds pointer is never dereferenced. I don't think any clang sanitizer covers this, so I don't see any way a test could catch the original behavior. ACKs for top commit: promag: utACK 0f459d8. l2a5b1: utACK 0f459d8 Tree-SHA512: 388223254ea6e955f643d2ebdf74d15a3d494e9f0597d9f05987ebb708d7a1cc06ce64bd25d447d75b5f5561bdae9630dcf25adb7bd75f7a382298b95d127162 # Conflicts: # src/uint256.cpp
Decrementing psz beyond the beginning of the string is UB, even though
the out-of-bounds pointer is never dereferenced.
I don't think any clang sanitizer covers this, so I don't see any way a test could catch the original behavior.