Skip to content

Conversation

@sedited
Copy link
Contributor

@sedited sedited commented Dec 12, 2024

Carrying non-kernel related fields in the cache sizes for the indexes is confusing for kernel library users. The cache sizes are set currently with magic numbers in bitcoin-chainstate. The comments for the cache size calculations are not completely clear. The constants for the cache sizes are also currently in txdb.h, which is not an ideal place for holding all cache size related constants.

Solve these things by moving the kernel-specific cache size fields to their own struct and moving the constants to either the node or the kernel cache sizes.

This slightly changes the way the cache is allocated if (and only if) the txindex and/or blockfilterindex is used. Since they are now given precedence over the block tree db cache, this results in a bit less cache being allocated to the block tree db, coinsdb and coins caches. The effect is negligible though, i.e. cache sizes with default dbcache reported through the logs are:

master:

Cache configuration:
* Using 2.0 MiB for block index database
* Using 56.0 MiB for transaction index database
* Using 49.0 MiB for basic block filter index database
* Using 8.0 MiB for chain state database
* Using 335.0 MiB for in-memory UTXO set (plus up to 286.1 MiB of unused mempool space)

this PR:

Cache configuration:
* Using 2.0 MiB for block index database
* Using 56.2 MiB for transaction index database
* Using 49.2 MiB for basic block filter index database
* Using 8.0 MiB for chain state database
* Using 334.5 MiB for in-memory UTXO set (plus up to 286.1 MiB of unused mempool space)

This PR is part of the libbitcoinkernel project.

@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 12, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31483.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK stickies-v, ryanofsky, hodlinator

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #31650 (refactor: Enforce readability-avoid-const-params-in-decls by maflcko)
  • #31645 (optimization: increase default LevelDB write batch size to 64 MiB by l0rinc)
  • #30965 (kernel: Move block tree db open to block manager by TheCharlatan)
  • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
  • #24539 (Add a "tx output spender" index by sstone)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK. Makes sense to separate this.

@sedited
Copy link
Contributor Author

sedited commented Dec 14, 2024

Thanks for the review @stickies-v,

Updated 173477b -> f454987 (kernel_cache_sizes_0 -> kernel_cache_sizes_1, compare)

  • Addressed @stickies-v's comment, removed unnecassary new index size member in the tests.
  • Addressed @stickies-v's comment, fixed commit order for include addition and optionsmodel and removed the no longer needed txdb include.

@sedited
Copy link
Contributor Author

sedited commented Dec 16, 2024

Thanks for the comments @stickies-v,

Updated f454987 -> 2c49e0b (kernel_cache_sizes_1 -> kernel_cache_sizes_2, compare)

  • Addressed @stickies-v's comment, added a kernel-specific cache size constant and moved the minimum size out of the kernel.

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 2c49e0b

The only behaviour change is a very slight difference in cache allocation by moving block_tree_db allocation after index allocation. Because block_tree_db is capped at 2MiB, the maximum impact this change can have is very limited. Even at the most extreme case with -dbcache=4 -txindex -blockfilterindex=1, the results are not meaningfully different:

On master:

2024-12-17T14:25:35Z Cache configuration:
2024-12-17T14:25:35Z * Using 0.5 MiB for block index database
2024-12-17T14:25:35Z * Using 0.4 MiB for transaction index database
2024-12-17T14:25:35Z * Using 0.4 MiB for basic block filter index database
2024-12-17T14:25:35Z * Using 1.3 MiB for chain state database
2024-12-17T14:25:35Z * Using 1.3 MiB for in-memory UTXO set (plus up to 286.1 MiB of unused mempool space)

With this PR:

2024-12-17T14:27:38Z Cache configuration:
2024-12-17T14:27:38Z * Using 0.4 MiB for block index database
2024-12-17T14:27:38Z * Using 0.5 MiB for transaction index database
2024-12-17T14:27:38Z * Using 0.4 MiB for basic block filter index database
2024-12-17T14:27:38Z * Using 1.3 MiB for chain state database
2024-12-17T14:27:38Z * Using 1.3 MiB for in-memory UTXO set (plus up to 286.1 MiB of unused mempool space)

@sedited
Copy link
Contributor Author

sedited commented Dec 17, 2024

Updated 2c49e0b -> 8e929e3 (kernel_cache_sizes_2 -> kernel_cache_sizes_3, compare)

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK 8e929e3 - just a small doc change.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 8e929e3. All these changes make sense and definitely make the kernel API nicer.

I do also think it would be an improvement to change all the int64_t fields to size_t since they just get casted to size_t in the end anyway, and because size_t is more self-documenting and makes it clearer the sizes are in bytes.

@sedited
Copy link
Contributor Author

sedited commented Dec 17, 2024

Thanks for the review @ryanofsky!
Updated 8e929e3 -> 7517126 (kernel_cache_sizes_3 -> kernel_cache_sizes_4, compare)

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK 7517126

I think c71fc6a is an absolute improvement, but I'm ~0 on 7517126. (edit: resolved)

@DrahtBot DrahtBot requested a review from ryanofsky December 18, 2024 12:24
These have nothing to do with the txdb, so move them out and into the
node caches.
They are not related to the txdb, so a better place for them is the
new kernel and node cache file. Re-use the default amount of kernel
cache for the default node cache.
This avoids having to rely on implicit casts when passing them to the
various functions allocating the caches.

This also ensures that if the requested amount of db_cache does not fit
in a size_t, it is clamped to the maximum value of a size_t.

Also take this opportunity to make the total amounts of cache in the
chainstate manager a size_t too.
@sedited sedited force-pushed the kernel_cache_sizes branch from 430d5fe to 2a92702 Compare January 15, 2025 15:32
@sedited
Copy link
Contributor Author

sedited commented Jan 15, 2025

Updated 430d5fe -> 2a92702 (kernel_cache_sizes_16 -> kernel_cache_sizes_17, compare)

Curious what motivated change from digits() back to sizeof(). Both ways seem fine. I guess sizeof() matches c++ documentation more closely

Yes, that was my motivation for the change back to sizeof().

@stickies-v
Copy link
Contributor

re-ACK 2a92702

No changes except for outstanding review comments to simplify CheckedLeftShift()'s implementation wrt range checks, making the _MiB test platform-agnostic, making IndexCacheSizes members size_t and doc/typo improvements.

@DrahtBot DrahtBot requested a review from ryanofsky January 15, 2025 15:43
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 2a92702. Changes since last review are fixing size options to use size_t instead of int64_t again, simplifying CheckedLeftShift more, and making other minor suggested cleanups

Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK 2a92702

  • Excellent with added fuzz test.
  • Added support for shifting negative numbers.
  • Adopting size_t earlier for some cases, reducing churn.

Succeeded building on 32-bit (CentOS).


Nit: commit message 65cde36

- kernel: Move default cache constants to caches
+ kernel: Move default cache constants to caches.h

constexpr std::optional<T> CheckedLeftShift(T input, unsigned shift) noexcept
{
if (shift == 0 || input == 0) return input;
// Avoid undefined c++ behaviour if shift is >= number of bits in T.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mega-nit in case you re-touch:

Suggested change
// Avoid undefined c++ behaviour if shift is >= number of bits in T.
// Avoid undefined C++ behaviour if shift is >= number of bits in T.


auto widen = [](T value) -> W { return value; };
auto clamp = [](W value) -> W { return std::clamp<W>(value, min, max); };
auto check = [](W value) -> std::optional<W> { if (value >= min && value <= max) return value; else return std::nullopt; };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Naked else without braces is contrary to developer-notes.md:

- If an `if` only has a single-statement `then`-clause, it can appear
on the same line as the `if`, without braces. In every other case,
braces are required, and the `then` and `else` clauses must appear
correctly indented on a new line.

Consider using ternary operator:

Suggested change
auto check = [](W value) -> std::optional<W> { if (value >= min && value <= max) return value; else return std::nullopt; };
auto check = [](W value) -> std::optional<W> { return (value >= min && value <= max) ? std::optional{value} : std::nullopt; };

if (n_indexes > 0) {
int64_t max_cache = std::min(nTotalCache / 8, max_filter_index_cache << 20);
int64_t max_cache = std::min(nTotalCache / 8, MAX_FILTER_INDEX_CACHE << 20);
sizes.filter_index = max_cache / n_indexes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit in commit 8826cae:
(We're assigning what seems to be an int64_t to size_t filter_index here. I tried adding curlies on the right side of the assignment and compiling for 32-bit but was unable to trigger narrowing errors. Maybe the compiler is clever enough to see that (MAX_FILTER_INDEX_CACHE << 20) < std::numeric_limits<size_t>::max()).

@fanquake fanquake merged commit df8bf65 into bitcoin:master Jan 16, 2025
18 checks passed
#include <stdexcept>

//! Overflow-safe conversion of MiB to bytes.
constexpr size_t operator"" _MiB(unsigned long long mebibytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/util/byte_units.h:13:29: error: identifier '_MiB' preceded by whitespace in a literal operator declaration is deprecated [-Werror,-Wdeprecated-literal-operator]
   13 | constexpr size_t operator"" _MiB(unsigned long long mebibytes)
      |                  ~~~~~~~~~~~^~~~
      |                  operator""_MiB
1 error generated.

clang 20.0.0

Fixed in #31691

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done or Closed or Rethinking

Development

Successfully merging this pull request may close these issues.

8 participants