-
Notifications
You must be signed in to change notification settings - Fork 38.7k
rpc: add cpu_load to getpeerinfo #31672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31672. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. LLM Linter (✨ experimental)Possible typos and grammar issues:
drahtbot_id_4_m |
|
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
|
Concept ACK if this can be a reliable, useful metric and help pave the path to #31033. |
|
Concept ACK |
|
Concept ACK Might be worth noting that this is currently only available on POSIX systems (i.e. not available on Windows, but on all other systems that we support AFAICT) in both the RPC help and a release note. |
BrandonOdiwuor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
|
Seems like there was some very brief discussion in #31033, and the conclusion "I guess this should start with planting some metrics", but I'm not sure putting changes into Bitcoin Core is the right first step. Before changing our API, it'd be good to atleast show some usage that indicates that the changes here are useful. I assume you've already been running this locally, and collecting the data, so it'd be good to know what it turned up? As noted in #31033, others have also already conducted similar research (https://b10c.me/projects/023-cpu-usage-of-peers/ or https://delvingbitcoin.org/t/cpu-usage-of-peers/196), and it seems like the data you'd expose here, will be less detailed / useful than what can/has already been produced using tracepoints or similar, so I'm wondering if this is the best approach. |
|
@fanquake IMO, monitoring CPU usage is useful on its own, even if we don't start treating peers differently based on that metric (#31033). In other words, this metric is useful at least as much as a bunch of other metrics in the Yes, I have been running this locally - there is like 65x difference between the least and most demanding peers. I am curious to correlate this to the messages being sent/received to/from those peers and to have a histogram of the data, not just least / most demanding (e.g. histogram of |
This comment was marked as abuse.
This comment was marked as abuse.
Right. |
3de9c9c to
8a3ec6f
Compare
|
Ready for review. Implemented for Windows as well. Here is a test program I played with to test this manually. I think it is not worth adding as a unit test or benchmark, but at least it can live here in the comments of this PR: cpu_time_windows.cpp#include <assert.h>
#include <windows.h>
#include <winnt.h>
#include <processthreadsapi.h>
#include <iostream>
#include <thread>
void thread(size_t work)
{
FILETIME before_creation;
FILETIME before_exit;
FILETIME before_kernel;
FILETIME before_user;
bool ret = GetThreadTimes(GetCurrentThread(),
&before_creation,
&before_exit,
&before_kernel,
&before_user);
assert(ret == 1);
if (work > 10'000) {
for (size_t i{0}; i < work; ++i) {
(void)(i * i);
}
} else {
std::this_thread::sleep_for(std::chrono::milliseconds{work});
}
FILETIME after_creation;
FILETIME after_exit;
FILETIME after_kernel;
FILETIME after_user;
ret = GetThreadTimes(GetCurrentThread(),
&after_creation,
&after_exit,
&after_kernel,
&after_user);
assert(ret == 1);
ULARGE_INTEGER before_kernel_;
before_kernel_.LowPart = before_kernel.dwLowDateTime;
before_kernel_.HighPart = before_kernel.dwHighDateTime;
ULARGE_INTEGER before_user_;
before_user_.LowPart = before_user.dwLowDateTime;
before_user_.HighPart = before_user.dwHighDateTime;
ULARGE_INTEGER after_kernel_;
after_kernel_.LowPart = after_kernel.dwLowDateTime;
after_kernel_.HighPart = after_kernel.dwHighDateTime;
ULARGE_INTEGER after_user_;
after_user_.LowPart = after_user.dwLowDateTime;
after_user_.HighPart = after_user.dwHighDateTime;
ULARGE_INTEGER elapsed;
elapsed.QuadPart = after_kernel_.QuadPart + after_user_.QuadPart - before_kernel_.QuadPart - before_user_.QuadPart;
ULONGLONG elapsed_ns{elapsed.QuadPart * 100};
std::cout << "cpu time: " << elapsed_ns / 1'000'000'000.0 << " sec\n";
}
int main ()
{
std::thread t1{thread, 1000000000};
std::thread t2{thread, 1000000000};
std::thread t3{thread, 3000};
t1.join();
t2.join();
t3.join();
return 0;
} |
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(left two nits, feel free to ignore)
|
|
|
The code here assumes that Bitcoin Core is single threaded, which may be true right now, but could change in the future. |
|
|
|
|
|
Latest changes since my last review LGTM. I plan to review the code more closely. Empirically, running a node on this branch, it looks like over time most peers settle into less than 0.5 per milles, a few are higher, with maybe one peer around 4. These higher-load connections seem to more often be outbound ones. (Edit, updated the branch with the -netinfo commit with a few improvements and to also only display cpu load for a peer (rounded to nearest integer) if it is non-zero (e.g. >= 0.5 in getpeerinfo), to more easily see which peers are using more CPU: https://github.com/jonatack/bitcoin/commits/2025-04-add-cpu_load-to-netinfo/ |
|
|
Add a new field `cpu_load` to the output of `getpeerinfo` RPC. It represents the CPU time spent by the message handling thread for the given peer, weighted for the duration of the connection. That is, for example, if two peers are equally demanding and one is connected longer than the other, then they will have the same `cpu_load` number. Github-Pull: bitcoin#31672 Rebased-From: 8571fbb123dc91cab6903d22881da874ad91fcef
Github-Pull: bitcoin#31672 Rebased-From: 8b8b85434653f66a275a1f757d49a719847b57e5
|
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
Add a new field `cpu_load` to the output of `getpeerinfo` RPC. It represents the CPU time spent by the message handling thread for the given peer, weighted for the duration of the connection. That is, for example, if two peers are equally demanding and one is connected longer than the other, then they will have the same `cpu_load` number.
|
|
Add a new field `cpu_load` to the output of `getpeerinfo` RPC. It represents the CPU time spent by the message handling thread for the given peer, weighted for the duration of the connection. That is, for example, if two peers are equally demanding and one is connected longer than the other, then they will have the same `cpu_load` number. Github-Pull: bitcoin#31672 Rebased-From: f1517d9
Github-Pull: bitcoin#31672 Rebased-From: b25b40e
|
NACK. cpu load alone is a useless metric. However, CPU load divided by TXs accepted into the chain (or mempool) could be a useful metric. |
| RPC | ||
| --- | ||
|
|
||
| A new field `cpu_load` has been added to the `getpeerinfo` RPC output. It |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why monitor cpu time rather than real time? Isn't time spent waiting for disk access while looking up the utxo set equally problematic? Waiting on another thread's lock is arguably less problematic (it's the other thread that's delaying things), but if an attacker is deliberately triggering lock contention somehow to delay processing other thread's messages, that would still be a problem.
Adding OS-specific logic rather than just using either real time or a steady clock seems over-complicated here.
[EDIT: I guess in the context of rate-limiting expensive peers, it would be better to have a rolling average of time used for each peer, so that having sent expensive txs five hours ago has ~0 impact on how much cpu time you're getting now]


Add a new field
cpu_loadto the output ofgetpeerinfoRPC.It represents the CPU time spent by the message handling thread for the given peer, weighted for the duration of the connection. That is, for example, if two peers are equally demanding and one is connected longer than the other, then they will have the same
cpu_loadnumber.Monitoring CPU usage is useful on its own. Also related to #31033.
This PR uses
clock_gettime()(POSIX) andGetThreadTimes()(Windows). An alternative to those, should this be explored are:getrusage()(POSIX) andQueryThreadCycleTime()(Windows, but it counts CPU cycles).