feat: operator liveness metric #1621
WIP: discussing the possibility of a graph.
Given that we need the metric, wouldn't it be better to merge now and improve its display later?
I agree with this. I don't know how much information a historical graph would add.
MarcosNicolau
left a comment
Works locally!
I was missing the step to send proofs, but from the picture you sent it looks like you are already doing that. I followed the steps again and it worked on my machine 🤔.
Yes, the metric is configured to show only the missed responses within the time range selected in the top-right corner:
Oppen
left a comment
Worked after solving my skill issue by shortening the TTL for tasks.
JuArce
left a comment
- Remove the "total" transformation in the Grafana dashboard
- Sort the values so that the operators with the most missed responses come first
I added the total transformation because, when only one operator was missing tasks, no name was displayed on the bar gauge:
Also, the bar gauge lacked an option to order the labels dynamically. I addressed both issues in #15a53ae by switching from a "Bar Gauge" to a simple "table + gauge display", with successful results:
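The ordering requested in the review (most missed responses first) can be sketched as follows. This is a minimal Python illustration of the sorting rule only; it does not reproduce the Grafana table panel from #15a53ae, and the function name `rows_for_table` and its input shape (a mapping from operator name to missed-task count) are hypothetical.

```python
from typing import Dict, List, Tuple


def rows_for_table(missed_by_operator: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (operator, missed_count) rows, most missed responses first.

    Hypothetical helper: ties are broken alphabetically by operator name so
    the row order stays stable between refreshes. Every row carries its
    operator name, so a table with a single row still shows who is missing tasks.
    """
    return sorted(missed_by_operator.items(), key=lambda row: (-row[1], row[0]))


if __name__ == "__main__":
    sample = {"operator_a": 2, "operator_b": 7, "operator_c": 2}
    for name, missed in rows_for_table(sample):
        print(f"{name}: {missed}")
```

Keeping the name in every row is what avoids the single-operator case where the bar gauge showed no label.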
Operator Liveness Metric
Motivation
We need a way to rapidly determine if an operator is down.
Description
Adds a Bar Gauge with the count of missed tasks for each operator over a specified time range.
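The per-operator count behind the panel can be sketched as below. This is a minimal Python illustration assuming each missed response is recorded as an (operator, timestamp) event; the PR itself builds the panel in Grafana, and `MissedResponse` and `missed_tasks_per_operator` are hypothetical names. The half-open window `[start, end)` stands in for the time range selected in the dashboard.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class MissedResponse:
    """Hypothetical event: `operator` did not respond to a task at `timestamp` (Unix seconds)."""

    operator: str
    timestamp: float


def missed_tasks_per_operator(
    events: Iterable[MissedResponse], start: float, end: float
) -> Dict[str, int]:
    """Count missed tasks per operator for events with start <= timestamp < end."""
    counts = Counter(
        event.operator for event in events if start <= event.timestamp < end
    )
    return dict(counts)


if __name__ == "__main__":
    events = [
        MissedResponse("op1", 10.0),
        MissedResponse("op1", 20.0),
        MissedResponse("op2", 25.0),
        MissedResponse("op1", 100.0),
    ]
    # Only the three events inside [0, 50) are counted.
    print(missed_tasks_per_operator(events, 0.0, 50.0))
```

In this sketch, operators with no missed responses in the window are simply absent from the result; whether the real panel shows zero-valued rows depends on the Grafana query.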
How To Test
`missing_operator` method from the telemetry terminal with the initialized operator names:
`localhost:3000` and you should see the dashboard with the values.
Also test the full flow:
`config-files/config-aggregator.yaml` and reduce the `bls_service_task_timeout`.
Type of change
Checklist
`testnet`, everything else to `staging`