-
Notifications
You must be signed in to change notification settings - Fork 38.6k
build: Embedded ASMap [3/3]: Build binary dump header file #28792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/28792. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. LLM Linter (✨ experimental)Possible typos and grammar issues:
2025-11-25 |
b8b09a7 to
f9288bc
Compare
|
Would it make sense to only check in the asmap.dat file into the repository, and then have the build system convert it to a .h at compile time (similar to the approach used in the |
|
How much data would this add per year to the repo, assuming updates happen? |
|
@maflcko The asmap file added here is around 1.6 MiB, so assuming we update for every 6 months, probably a bit over 3 MiB per year (gzip compresses asmap files only ~5%). |
|
Why is the option of loading it from a file even in dev builds, not considered? |
Yeah, that would work as well. I have drafted that approach here: fjahr@0f9109f Details can change of course (like keeping the raw file in init) but if people prefer that approach, I will pull this into here. |
It is considered. Loading from a file is the only option for dev builds if we go the other route mentioned in the description: "The alternative approach is not having the data in the source code but only adding it during the build phase of a release." But this also comes with the aforementioned downsides. |
|
A general question about this approach: do most contemporary OSes handle loading large amounts of (potentially) unused data into physical memory well-enough, that they would (likely) not load this extra 1.5-3MB of data in the case that the It seems to make most sense to me to take the path suggested by @sipa, including the |
f9288bc to
3bcc1ca
Compare
3bcc1ca to
be1e04a
Compare
be1e04a to
5167707
Compare
|
I am picking this back up since we have come a long way in terms of tooling outside of Bitcoin Core and lately I was just waiting for cmake and v28 feature freeze. I will suggest an early merge in the cycle for v29 so more people are encouraged to use one steady file over a longer period of time. I am now using the suggested approach of building the header file at compile time and I added the functionality for both autotools and cmake (the autotools one has existed for a while, I can also remove it if people think it's unnecessary). Of course this is rebased to make cmake available. First time I am editing the cmake build system so there are probably things to improve... And there is also a command for I am adding the latest (1724248800) asmap file from asmap-data, the corresponding creation process can be reviewed in the issue and PR. I have not added a config option to skip the asmap header file but I will look into that next. |
I think the suggestion was to not include the header itself in that case.
Maybe keep it in until #30664 is merged. |
5167707 to
e5a25fd
Compare
ae25931 to
1dcd2e8
Compare
|
Only rebased again |
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK 1dcd2e8
I do understand the distaste at adding 1MB+ binary blobs into a reproducible build. From that perspective, it would be cleaner to include the ASMap blob as a separate file included in our Bitcoin Core releases.
But having a separate file opens up the can of worms of where to install it on various operating systems and distributions, how to encourage package maintainers to include and ensure it ends up in the right locations, and making Bitcoin Core search an ever changing list of directories to check for ASMap file existence (or requiring always specifying -noasmap or -asmap=/path/on/distro/x/asmap.dat). That would hamper broad ASMap adoption, especially if we want to make it default-on in the future.
Might be good to add a blurb about the consequences of not embedding to the PR description.
Compared build times between WITH_EMBEDDED_ASMAP=OFF/ON.
Generation times unaffected
OFF
$ ccache -C && rm -rf build && time cmake -B build --preset dev-mode -DENABLE_IPC=OFF -DBUILD_GUI=OFF -GNinja -DWITH_EMBEDDED_ASMAP=OFF
real 0m18.115s
real 0m18.091s
real 0m18.027sON
$ ccache -C && rm -rf build && time cmake -B build --preset dev-mode -DENABLE_IPC=OFF -DBUILD_GUI=OFF -GNinja -DWITH_EMBEDDED_ASMAP=ON
real 0m18.050s
real 0m17.949s
real 0m17.995sCompile times fairly unaffected
OFF
$ ccache -C && rm -rf build && cmake -B build --preset dev-mode -DENABLE_IPC=OFF -DBUILD_GUI=OFF -GNinja -DWITH_EMBEDDED_ASMAP=OFF && \
time cmake --build build -t bitcoind
real 2m28.611s
real 2m27.686s
real 2m26.425sON
$ ccache -C && rm -rf build && cmake -B build --preset dev-mode -DENABLE_IPC=OFF -DBUILD_GUI=OFF -GNinja -DWITH_EMBEDDED_ASMAP=ON && \
time cmake --build build -t bitcoind
real 2m29.730s
real 2m29.345s
real 2m30.751s
real 2m28.495s
real 2m26.334s| @@ -13,4 +13,4 @@ export PACKAGES="python3-zmq python3-pip clang-17 llvm-17 libc++abi-17-dev libc+ | |||
| export PIP_PACKAGES="--break-system-packages pycapnp" | |||
| export DEP_OPTS="NO_WALLET=1 CC=clang-17 CXX='clang++-17 -stdlib=libc++'" | |||
| export GOAL="install" | |||
| export BITCOIN_CONFIG="-DREDUCE_EXPORTS=ON -DBUILD_UTIL_CHAINSTATE=ON -DBUILD_KERNEL_LIB=ON -DBUILD_KERNEL_TEST=ON -DBUILD_SHARED_LIBS=ON" | |||
| export BITCOIN_CONFIG="-DREDUCE_EXPORTS=ON -DBUILD_UTIL_CHAINSTATE=ON -DBUILD_KERNEL_LIB=ON -DBUILD_KERNEL_TEST=ON -DBUILD_SHARED_LIBS=ON -DWITH_EMBEDDED_ASMAP=NO" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR description nit:
- -DWITH_EMBEDDED_ASMAP=NO
+ -DWITH_EMBEDDED_ASMAP=OFFCo-authored-by: Hodlinator <[email protected]>
Calculate the asmap version only in one place: A dedicated function in util/asmap. The version was also referred to as asmap checksum in several places. To avoid confusion call it asmap version everywhere.
This prevents holding the asmap data in memory twice. The version hash changes due to spans being serialized without their size-prefix (unlike vectors).
Also makes minor improvement on the python implementation documentation.
This aligns it more with SanityCheckAsmap and reduces variable scope. Also unify asmap casing in SanityCheckAsmap function name. Co-authored-by: Hodlinator <[email protected]>
The data embedded is from the latest ASMap file from the asmap-data repository: https://github.com/asmap/asmap-data/blob/main/1755187200_asmap.dat
This can be disabled with -DWITH_EMBEDDED_ASMAP=OFF.
1dcd2e8 to
4821e4b
Compare
|
Addressed comments and rebased after #33770 was merged including the latest changes in #33878. This PR now necessarily re-adds the possibility to set a bool option for Re @hodlinator #28792 (comment): I don't think I understand your comment. I am pretty sure I have addressed your previous comment to use |
4821e4b to
89b87c5
Compare
|
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
Update to the main thread in case others are waiting for me to respond to the above - I responded in the thread: #28792 (comment) and the issue was resolved. Other reviewers: please have a look at the PR that should be merged ahead of this one: #33878 |
Depends on #33878. This is the third in a tripled of PRs that implement the necessary changes for embedding of asmap data into the binary. This last part add the initial asmap data, implements the build changes and adds further documentation.
Currently an asmap file needs to be acquired by there user from some location or the user needs to generate one themselves. Then they need to move the file to the right place in datadir or pass the path to the file as
-asmap=PATHin order to use the asmap feature. The change here allows for builds to embed asmap data into the bitcoind binary which makes it possible to use the feature without handling of the asmap file by the user. If the user starts bitcoind with-asmapthe embedded data will be used for bucketing of nodes.The data lives in the repository at
src/node/data/ip_asn.datand can be replaced with a new version at any time. The idea is that the data should be updated with every release. By default the data at that location is embedded into the binary but there is also a build option to prevent this (-DWITH_EMBEDDED_ASMAP=OFF). In this case the original behavior of the-asmapoption is maintained.