Skip to content

Conversation

@ryanofsky
Copy link
Contributor

@ryanofsky ryanofsky commented Jun 28, 2024

Remove documentation that says LogError should be used for "severe problems that require the node (or a subsystem) to shut down entirely" because:

  • It is not how LogError and Level::Error are used currently. Of 129 current uses only 58 cases are fatal according to #30348

  • "[T]here's not much benefit in a log that says "hey this error is about to cause the node to stop" -- you already get that information by seeing "Shutdown: in progress..." immediately following." according to #30347 (comment)

Fixes #30348

@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 28, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30361.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept NACK maflcko
User requested bot ignore l0rinc

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

@ajtowns
Copy link
Contributor

ajtowns commented Jun 28, 2024

I think both entries should continue to include examples.

I think removing "severe" from the LogWarning description doesn't make sense -- anything the admin might have to address is a severe (potential) problem, isn't it?

Having LogError mean "the node admin will need to address" vs LogWarning mean "the node admin may need to address" seems to argue towards retaining the "LogError is fatal" interpretation to me -- fatal errors are ones the admin does need address; anything else they can ignore until it becomes fatal.

A potential advantage of having LogError (or LogFatal) reserved for fatal errors is that you could grep through the codebase for LogError to see all the potentially fatal errors, and perhaps work on finding ways to make them less fatal. Not convinced that's very beneficial though.

I don't think there's much other benefit to splitting potential problems up from actual problems otherwise if the admin still needs to attend to them either way, though; so if we're already making LogError just mean actual errors; is there any reason not to merge them both into a single LogAlert() or similar?

@ryanofsky
Copy link
Contributor Author

ryanofsky commented Jun 30, 2024

re: #30361 (comment)

Thanks, added back examples and "severe".

I think probably just looking at AbortNode callers would be as effective as as searching for fatal error messages, if looking for places to make error handling more robust.

I do like the alert level idea and posted an implementation in #30364.

Updated 5175cbc -> b7aae36 (pr/nofat.1 -> pr/nofat.2, compare) with clarifications and examples.

Rebased b7aae36 -> 5e2eb5b (pr/nofat.2 -> pr/nofat.3, compare)

@maflcko
Copy link
Member

maflcko commented Jul 16, 2024

  • It is not how LogError and Level::Error are used currently. Of 129 current uses only 58 cases are fatal according to #30348

The reason is that many stem from error(), see also #30364 (review).

  • "[T]here's not much benefit in a log that says "hey this error is about to cause the node to stop" -- you already get that information by seeing "Shutdown: in progress..." immediately following." according to #30347 (comment)

Agree, but then just go with #30364? Seems odd to change something twice, when it can be changed once?

weak NACK for now.

@ryanofsky
Copy link
Contributor Author

re: #30361 (comment)

I think I might not have enough information to understand the weak nack, because I don't actually see a reason in there to reject a documentation fix that is only dropping advice to interpret Error as "fatal" and Warning as "nonfatal".

The advice is not followed by current code, but more importantly, I don't think anybody (even the original author) currently thinks this advice is something we should try to implement anymore. If something in this PR makes the documentation worse, it'd be helpful to know where the problems are. Otherwise this seems like a clean fix for the issue described in #30348, and a way to close that issue and move on to other fixes and improvements.

I do understand what you are saying about the error() removal in #29236 making the issue described in #30348 worse because it increased the amount of Error spam. But it's not like this PR is changing documentation to say that Error spam is ok. It is clearly not. It is just trying to remove a separate point of confusion.

@maflcko
Copy link
Member

maflcko commented Aug 27, 2024

I think I might not have enough information to understand the weak nack

It is the line prior: "Seems odd to change something twice, when it can be changed once? weak NACK for now."

Now that the other pull is closed, I guess this is fine.

@maflcko
Copy link
Member

maflcko commented Aug 27, 2024

lgtm ack b7aae36

Seems fine to change the documentation here to be a description of what the code does right now. Also, it seems fine for this description to be applied going forward, because both an "Error" or "Warning" should be considered an Alert for the node admin to investigate (and possibly dismiss) (with or without the node shutting down).

There are probably cases where an Error can be "downgraded" to a Warning in the code, but they can be done in follow-up changes.

@ajtowns
Copy link
Contributor

ajtowns commented Aug 27, 2024

I don't think anybody (even the original author) currently thinks this advice is something we should try to implement anymore.

Presuming that if it's me who's he-who-must-not-be-named here rather than referring back to #24464 etc, what I think is that if we're going to have distinct log levels for "error" and "warning", I think there should be a clear documented difference between what they are, so that PR authors, reviewers and users can know what to expect. If there's places in our code that don't meet that expectation, then that's something that should be fixed, though it doesn't need to be treated as some huge priority. On the other hand, if there's no distinction between the levels that's worth making, we shouldn't have two levels. So I think either the status quo or #30364 is better than this PR. It's not really worth arguing about, but it's frustrating having people put words in my mouth that aren't mine.

@maflcko
Copy link
Member

maflcko commented Sep 18, 2024

So I think either the status quo or #30364 is better than this PR.

I tend to agree on a second thought. This pull just seems to increase the long-term confusion by treating the levels equivalent or similar, just because some parts of the code happen to confuse them (for various reasons).

Also, as pointed out in #30364 (comment) (and other comments) a good chunk of the log statements that this doc-only pull is trying to "fix" shouldn't be log statements at all.

So overall I am NACK -ish on increasing confusion in the docs, motivated by code that will be removed or changed long-term anyway.

@l0rinc
Copy link
Contributor

l0rinc commented Sep 18, 2024

While I understand why @maflcko doesn't want to change twice, this seems to me like a slight improvement.

utACK b7aae36

Comment on lines 746 to 747
- `LogError(fmt, params...)` should be used in place of `LogInfo` for severe
errors the node admin will need to address (e.g., failure to write data).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: instead of seems slightly more common than in place of + that to streamline readability + needs to address to imply urgency:

Suggested change
- `LogError(fmt, params...)` should be used in place of `LogInfo` for severe
errors the node admin will need to address (e.g., failure to write data).
- `LogError(fmt, params...)` should be used instead of `LogInfo` for severe errors
that the node admin needs to address (e.g., failure to write data).

should address (e.g. system time appears to be wrong, unknown soft fork
appears to have activated).

- `LogTrace(BCLog::CATEGORY, fmt, params...)` should be used in place of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the logger order is weird, why debug -> info -> error -> warning -> trace?
Especially after announcing LogInfo, LogDebug, LogTrace, LogWarning and LogError in the intro.

@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 1, 2025

Needs rebase, if still relevant

@DrahtBot
Copy link
Contributor

🤔 There hasn't been much activity lately and the CI seems to be failing.

If no one reviewed the current pull request by commit hash, a rebase can be considered. While the CI failure may be a false positive, the CI hasn't been running for some time, so there may be a real issue hiding as well. A rebase triggers the latest CI and makes sure that no silent merge conflicts have snuck in.

Remove documentation that says LogError should be used for "severe problems
that require the node (or a subsystem) to shut down entirely" because:

- This is not how `LogError` and `Level::Error` are used currently. Of 129
  current uses only 58 cases are fatal according to
  bitcoin#30348

- "[T]here's not much benefit in a log that says "hey this error is about to
  cause the node to stop" -- you already get that information by seeing
  "Shutdown: in progress..." immediately following." according to
  bitcoin#30347 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFC: Misused LogError and LogWarning macros

5 participants