Skip to content

Conversation

@fjahr
Copy link
Contributor

@fjahr fjahr commented Sep 16, 2024

A test that is added as part of #30455 uncovered this issue: The GuessVerificationProgress function is used during during descriptor import and relies on m_chain_tx_count. In #29370 an Assume was added expecting the m_chaint_tx_count to be set. However, as the test uncovered, GuessVerificationProgress is called with background sync blocks that have m_chaint_tx_count = 0 when they have not been downloaded and processed yet.

The simple fix is to remove the Assume. Users should not be thrown off by the Internal bug detected error. The behavior of importdescriptor is kept consistent with the behavior for blocks missing due to pruning.

The test by alfonsoromanz is cherry-picked here to show that the CI errors should be fixed by this change.

This PR also improves error messages returned by the importdescriptors and rescanblockchain RPCs. The error message now changes depending on the situation of the node, i.e. if pruning is happening or an assumutxo backgroundsync is active.

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 16, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30909.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK mzumsande, furszy, achow101
Stale ACK alfonsoromanz

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30455 (test: assumeutxo: add missing tests in wallet_assumeutxo.py by alfonsoromanz)
  • #30221 (wallet: Ensure best block matches wallet scan state by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Contributor

@alfonsoromanz alfonsoromanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK 9be080abfa87543652c415af4e239a6ba4d2b2e1

The test passes locally, and I see that all the CI checks have passed, so it looks good to me.

It took me some time today to get familiar with the CMake-based build system, but I managed to get it working.

Screenshot 2024-09-16 at 16 05 09

Thanks!

Copy link
Member

@jonatack jonatack Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question, is debug logging this enough, unsure but e.g. in the various RPCs (and CLI -getinfo) that return a "verificationprogress" field, are there any where it would be a good idea to describe the what/why of a returned 0.0 value due to this case.

Copy link
Contributor

@mzumsande mzumsande Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should only be possible to trigger with the rescan, the various RPCs are always called with a block (usually the tip) that is guaranteed to have m_chain_tx_count set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And aside from only hitting this in the context of rescan + blocks missing this is just about showing the progress and not the actual functionality the user seeks. So it didn't feel important enough for me to use another category mainly because of that. Unless there are strong opinions against it I will keep this as is.

@maflcko
Copy link
Member

maflcko commented Sep 17, 2024

However, as the test uncovered, GuessVerificationProgress is called with snapshot blocks that have m_chaint_tx_count = 0 when the assumeutxo background sync is in progress.

Can you explain this a bit better? IIRC the snapshot base block itself has the count set, as well as any validated blocks after it. See src/node/blockstorage.cpp: base->m_chain_tx_count = au_data.m_chain_tx_count; . So I think calling GuessVerificationProgress in this case shouldn't lead to issues. Whereas calling a rescan on the (missing) background blocks seems like a bug that should be avoided at the call site, no?

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whereas calling a rescan on the (missing) background blocks seems like a bug that should be avoided at the call site, no?

Well, the approach seems to be similar with pruning:
importdescriptors tries to scan as many blocks as possible (no stopping after a failure due to missing block data), and then returns a failure / warns the user that the result may be incomplete. It could probably have been implentend differently, checking if we are missing blocks due to pruning and refuse to import the descriptors, but it doesn't do that.

If that is seen as a viable way for the pruning case, maybe it would be a bit inconsistent to just forbid it in the assumeutxo background sync case?! In which case just dealing with the progress estimation error would be ok?

@maflcko
Copy link
Member

maflcko commented Sep 19, 2024

Well, the approach seems to be similar with pruning:

Sure, if that is the case, the pull request description should mention it. Also, the error message could probably be updated, because it only mentions pruning, no AU. (Note to self: Reminds me of #26282)

@mzumsande
Copy link
Contributor

mzumsande commented Sep 19, 2024

Though one could also make a case for not allowing importdescriptor/rescan during background sync if non-downloaded blocks are affected, with the reasoning "just wait a little bit longer" that doesn't really apply to the pruning case. Would be interesting to know if this has ever been discussed for assumeutxo.

@fjahr
Copy link
Contributor Author

fjahr commented Sep 23, 2024

Can you explain this a bit better?

Yeah, my statement was wrong. I think I managed to confuse myself with some wrongfully placed debug logging. I have fixed the description.

Well, the approach seems to be similar with pruning:

Sure, if that is the case, the pull request description should mention it. Also, the error message could probably be updated, because it only mentions pruning, no AU.

Updated the error message and realized that he same issue exists in rescanblockchain, too. So fixed it there as well.

EDIT: I also add coverage for rescanblockchain in wallet_assumeutxo.

Though one could also make a case for not allowing importdescriptor/rescan during background sync if non-downloaded blocks are affected, with the reasoning "just wait a little bit longer" that doesn't really apply to the pruning case. Would be interesting to know if this has ever been discussed for assumeutxo.

I had a similar thoughts when I looked into this earlier. But I think going that route might either break workflows with importdescriptors or it's a pretty invasive change. The RPC takes in multiple descriptors (requests) with timestamps for each and then rescans blocks starting from around the lowest timestamp. The simplest approach would be to check all the blocks from the lowest timestamp and abort the whole RPC if any of them don't have block data available for any block since the timestamp. Or we could exclude the offending requests and only import the remaining requests, starting rescan from the lowest timestamp with available block data. Both of these approaches seem pretty invasive and may lead to unexpected outcomes for users. I think people may undershoot the timestamp for example. We could then help them by returning the lowest timestamp that works (the snapshot height one) with the error but then we basically recreate the current behavior with extra steps. The ideal solution would be that the wallet rescans the missing blocks as they come in but that's probably a much bigger change, and also kind of tricky to communicate to the user.

Also, while "just wait a little bit longer" might be a good answer, we could also say "just run another rescan later" and that may ultimately be a better experience for the users.

I don't have a strong opinion here but it seems like these ideas may better be scoped out in a separate issue first and then potentially be implemented as a follow-up.

@fjahr fjahr force-pushed the 2024-09-au-guess branch 2 times, most recently from d7bc556 to 04a78f1 Compare September 23, 2024 20:13
@fjahr
Copy link
Contributor Author

fjahr commented Sep 23, 2024

@achow101 do you have thoughts on descriptor import/rescan behavior with missing blocks because of ongoing assumeutxo background sync?

@fjahr fjahr changed the title AssumeUTXO: Don't Assume m_chain_tx_count in GuessVerificationProgress wallet, assumeutxo: Don't Assume m_chain_tx_count in GuessVerificationProgress Sep 23, 2024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is easy to provide an accurate error message. Otherwise it could be a bit confusing for a user that had AU enabled (with 100% progress) as well as pruning, not knowing the exact cause and what they need to do to work around the error. Not sure how easy that is, so up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improving the error message is reasonably easy but I didn't see a way without expanding the chain interface. I have implemented what I think is a step in the right direction but not a full solution to every edge case in order to keep the scope of this PR reasonable. Looking for feedback if people still think this is right approach and if I should expand it further or if adding more is going too far.

I think the assumeutxo + pruning use-case is pretty hard to deal with accurately for these wallet use-cases, especially if the pruning target is small. I am not sure if we can really give users a perfect answer in that case. Users could request a window for rescan that is larger than the prune target so I think pruning takes precedent because it requires a reindex to "fix" while AU just takes some time and resolves itself. Also, in the case of au+pruning a block could be not available because it wasn't synced yet during the first time the user tries to import a descriptor. But then the second time the user tries the same block that wasn't synced yet before could now be pruned already, assuming a small prune target. I mean to say, a perfect solution requires changing how the RPC works and I think that's beyond the scope of this PR. I could think about it a bit more and open an issue on it though if that is desirable.

@fjahr
Copy link
Contributor Author

fjahr commented Sep 30, 2024

Improved the error messages for importdescriptors and rescanblockchain to better match the reality (usage of pruning and assumeutxo).

@fjahr fjahr changed the title wallet, assumeutxo: Don't Assume m_chain_tx_count in GuessVerificationProgress wallet, assumeutxo: Don't Assume m_chain_tx_count, Improve wallet RPC errors Sep 30, 2024
@DrahtBot DrahtBot mentioned this pull request Oct 25, 2024
@fjahr
Copy link
Contributor Author

fjahr commented Nov 5, 2024

Rebased and addressed nit from @maflcko

@fjahr fjahr force-pushed the 2024-09-au-guess branch from a9f1c8a to 9d2d9f7 Compare January 5, 2025 16:39
@fjahr
Copy link
Contributor Author

fjahr commented Jan 5, 2025

Addressed feedback from @mzumsande

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK 9d2d9f7

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I got confused there!
It just seemed a bit unusual, plus we generally use CHECK_NONFATAL instead of assert in rpc code, which may not be the ideal choice for possible future interface users (though it is currently just used in another rpc call, so it's not an actual problem).
But I'm not an expert on the design of our interfaces, so I don't know if moving it somewhere else is necessary.

@DrahtBot DrahtBot requested a review from alfonsoromanz January 6, 2025 16:22
Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 9d2d9f7

LogWarning("Internal bug detected: block %d has unset m_chain_tx_count (%s %s). Please report this issue here: %s\n",
pindex->nHeight, CLIENT_NAME, FormatFullVersion(), CLIENT_BUGREPORT);
if (pindex->m_chain_tx_count == 0) {
LogDebug(BCLog::VALIDATION, "Block %d has unset m_chain_tx_count. Unable to estimate verification progress.\n", pindex->nHeight);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit: the validation category is a bit odd in the wallet scan context or GUI updates.

@achow101
Copy link
Member

ACK 9d2d9f7

@achow101 achow101 merged commit 85f96b0 into bitcoin:master Jan 31, 2025
18 checks passed