-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Local Ratekeeper #1477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Local Ratekeeper #1477
Conversation
Co-Authored-By: mpilman <[email protected]>
checkAndProcessResult must handle errors in LoadBalancedReply objects in the same way other errors are handled.
|
ReadOPS = Read operations / second |
I think I didn't explain myself clearly. I understood those abbreviations. What I asked is what do the lines in different colors mean in each of those figures. |
|
Ah, got it. Each line represents each storage process. In the NDV chart, there are 3 lines moving because one team (triple redundancy) is being hot. |
|
In addition to those per-server-process charts, the "Local Ratekeeper" chart shows the aggregate of the "probability of serving reads" of all storage processes. The "Ratekeeper" chart and "Worst Storage Queue" are taken from the ratekeeper metrics. |
|
My impression for local ratekeeper was that ratekeeper would only allow two (with triple replication) storage servers that they were allowed to start limiting their reads. This implementation lets all storage servers limit themselves, and additionally will cause the main ratekeeper to throttle if they fall too far behind. With this implementation I am concerned that a saturating workload could cause all storage servers to each decide to throttle reads. This would lead to higher read latencies, which could cause transactions to start taking longer than 5 seconds, and lead to a death spiral. |
fdbserver/storageserver.actor.cpp
Outdated
|
|
||
| double getPenalty() { | ||
| return std::max(1.0, (queueSize() - (SERVER_KNOBS->TARGET_BYTES_PER_STORAGE_SERVER - 2.0*SERVER_KNOBS->SPRING_BYTES_STORAGE_SERVER)) / SERVER_KNOBS->SPRING_BYTES_STORAGE_SERVER); | ||
| return std::max(std::min(1.0, (queueSize() - (SERVER_KNOBS->TARGET_BYTES_PER_STORAGE_SERVER - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The penalty should always be larger than 1. Load balance works by trying to keep an equal number of requests outstanding to all servers, a penalty makes a single outstanding request act like more than one request so you send less traffic to that server. I think the formula you want for penalty would be 1.0/(1.0-currentRate()), and everything should me max instead of having a min.
Penalty should always be >= 1.0
6f0af55 to
8dbb231
Compare
fdbserver/storageserver.actor.cpp
Outdated
| } | ||
|
|
||
| ACTOR Future<Void> updateStorage(StorageServer* data) { | ||
| state std::string waitDescription = format("%s/updateStorage", data->thisServerID.toString().c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass this string directly into checkDisabled so we do not construct it when not in simulation


This is a continuation of PR #1229. It fixes some bugs from that pull request.