Vitess: ignore unhealthy replicas with realtime stats #136
With some advice from PlanetScale, I believe I've matched the logic … Here is the path I followed through the code: …
shlomi-noach left a comment:
This looks right. Mind you, I'm not that familiar with realtime stats in Vitess.
After some testing with Vitess, the logic in this PR won't work as it stands. When a node is considered "unhealthy" by Vitess, the following is returned by the `/api/keyspace/<ks>/tablets/` API:

```
$ curl -ks https://<vtctld hostname>/api/keyspace/test_ks/tablets/?cells=dc1 \
  | jq '.[] | select( .hostname == "<hostname>" ).stats'
{
  "realtime": {
    "seconds_behind_master": 30
  },
  "serving": false,
  "up": true
}
```

This response was gathered from a node that exceeds the … While Vitess won't send reads to a replica in … This means: …

In my testing … If Vitess periodically updated the realtime stats, …

cc @tomkrouper / @shlomi-noach for thoughts
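For inspecting these responses programmatically, the `jq` filter above can be reproduced in Go. This is a minimal sketch: the `tablet`, `tabletStats` and `realtimeStats` types and the `statsForHost` helper are hypothetical, decode only the fields visible in the response above, and read a response body rather than calling `vtctld` directly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// realtimeStats and tabletStats are hypothetical types modeling only the
// fields visible in the curl output above.
type realtimeStats struct {
	SecondsBehindMaster int64 `json:"seconds_behind_master"`
}

type tabletStats struct {
	Realtime *realtimeStats `json:"realtime"`
	Serving  bool           `json:"serving"`
	Up       bool           `json:"up"`
}

// tablet is one element of the tablets response; only the fields the jq
// filter touches are decoded.
type tablet struct {
	Hostname string       `json:"hostname"`
	Stats    *tabletStats `json:"stats"`
}

// statsForHost mirrors `jq '.[] | select( .hostname == "<hostname>" ).stats'`:
// it returns the stats of every tablet whose hostname matches (nil when a
// matching tablet has no "stats" object, like jq's null).
func statsForHost(body []byte, hostname string) ([]*tabletStats, error) {
	var tablets []tablet
	if err := json.Unmarshal(body, &tablets); err != nil {
		return nil, fmt.Errorf("decoding tablets response: %w", err)
	}
	var matches []*tabletStats
	for _, t := range tablets {
		if t.Hostname == hostname {
			matches = append(matches, t.Stats)
		}
	}
	return matches, nil
}

func main() {
	// Hypothetical response body; the first entry reuses the stats shown above.
	body := []byte(`[
		{"hostname": "replica-a", "stats": {"realtime": {"seconds_behind_master": 30}, "serving": false, "up": true}},
		{"hostname": "replica-b", "stats": {"realtime": {"seconds_behind_master": 0}, "serving": true, "up": true}}
	]`)
	matches, err := statsForHost(body, "replica-a")
	if err != nil {
		panic(err)
	}
	for _, s := range matches {
		if s == nil || s.Realtime == nil {
			fmt.Println("no realtime stats")
			continue
		}
		// The tablet is up but not serving, while still reporting its lag.
		fmt.Printf("up=%v serving=%v lag=%ds\n", s.Up, s.Serving, s.Realtime.SecondsBehindMaster)
	}
}
```

Returning every match, rather than the first, keeps the behavior aligned with `jq`, which emits one `stats` value per matching tablet.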
Re-requesting review from @shlomi-noach, @drogart and @tomkrouper
This PR causes Freno to ignore Vitess tablets that return unhealthy realtime tablet stats, similar to the way `vtgate` filters replicas for serving traffic, minus the replication and minimum node count checks.

The added logic in this PR relies on realtime stats (an optional feature) and new `vtctld` API fields added in Vitess 8.0.0. If these fields are not found, the old logic is used, making this change backwards-compatible. If no stats are found, tablets are assumed to be healthy, as they are today.

Example of the new `/keyspace/<ks>/tablets/` API response with additional realtime stats: …

For the most part, the logic is:

- `serving` must be `true` (we need to ignore `serving`, see comments below)
- `last_error` must be `""` (empty)
- `realtime` sub-document is not `nil`

… `vtctld` yet, but it's a guess: just copying what Vitess did so we see the same tablets as `vtgate`/clients 👍

cc @tomkrouper / @drogart / @shlomi-noach