[Feature] CRD for AlertRule and an operator to manage Alert Rules #619
Conversation
Can one of the admins verify this patch?
Wow, that's a lot of code; looks like you know your way around the Prometheus Operator code now 🙂. I'm somewhat confused, though, about how this is different from having …

Regarding templating (although I realize this PR doesn't implement it yet): I know that this went very much sideways with other monitoring systems, so I would be very careful with the implementation of such a feature. I personally believe that a proper tool that spits out the "templated" rules is better suited, as it is better for visibility and comprehension than a transparently templated rule, where one can't be sure whether it's a templated rule or hand-written.
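The "proper tool that spits out the templated rules" idea above could be sketched with Go's standard text/template package. This is a minimal illustration only; the rule template and field names (`Name`, `Expr`, `For`, `Severity`) are made up for the sketch, not part of any real tool:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Illustrative rule template in Prometheus 1.x rule syntax; the field
// names below are assumptions for this sketch.
const ruleTmpl = `ALERT {{.Name}}
  IF {{.Expr}}
  FOR {{.For}}
  LABELS { severity = "{{.Severity}}" }
`

type rule struct {
	Name, Expr, For, Severity string
}

// renderRule expands one rule definition into final rule-file text, so the
// generated rules can be reviewed and committed as plain files instead of
// being templated transparently at deploy time.
func renderRule(r rule) (string, error) {
	t, err := template.New("rule").Parse(ruleTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, r); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderRule(rule{
		Name:     "AlertmanagerConfigInconsistent",
		Expr:     "alertmanager_config_hash != 1",
		For:      "5m",
		Severity: "critical",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The point of keeping it a stand-alone tool is that the rendered output is a plain file you can read and diff, which matches the visibility argument above.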
Most of it is from the Alertmanager operator, thanks to Golang's lack of generics :)
Fair question. I wanted to make Prometheus alert rules a Kubernetes-native object, instead of just a bunch of ...
spec:
  name: AlertmanagerConfigInconsistent
  condition: |-
    count_values by (service) ("config_hash", alertmanager_config_hash)
    / on(service) group_left
    label_replace(prometheus_operator_alertmanager_spec_replicas, "service", "alertmanager-$1", "alertmanager", "(.*)") != 1
  duration: 5m
  labels:
    severity: critical
  annotations:
    summary: Alertmanager configurations are inconsistent
    description: The configuration of the instances of the Alertmanager cluster `{{$labels.service}}` are out of sync.

With regard to templating, I'm sorry, I don't quite understand your comment here. Could you give an example? I'm thinking of leaving it totally up to Helm to template a …

{{- if .Values.enabled }}
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertRule
metadata:
  name: {{ template "fullname" . }}-alertmanager
  labels:
    chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
{{- if .Values.labels }}
{{ .Values.labels | toYaml | indent 4 }}
{{ end }}
spec:
  definition: |
    ALERT AlertmanagerConfigInconsistent
      IF count_values by (service) ("config_hash", alertmanager_config_hash)
        / on(service) group_left
        label_replace(prometheus_operator_alertmanager_spec_replicas, "service", "alertmanager-$1", "alertmanager", "(.*)") != 1
      FOR {{ .Values.configInconsistentDuration }}
      LABELS {
        severity = "critical",
        environment = {{ .Values.labels.environment | quote }}
      }
      ANNOTATIONS {
        summary = "Alertmanager configurations are inconsistent",
        description = "The configuration of the instances of the Alertmanager cluster {{`{{$labels.service}}`}} are out of sync."
      }
{{ end }}
Your example of templating would work just as well with a …
@brancz I hear you. I'm fine with closing it. All is not lost, as at least now I know how prometheus-operator works at the code level :)
See issue #616
Here's what an AlertRule CRD looks like:

When the AlertRule object is created, the operator creates a corresponding ConfigMap, and it will get picked up by Prometheus if the label is set to meet the criteria of the Prometheus ruleSelector.

Here's an asciinema demo: https://asciinema.org/a/3BwHT6nT5IqqrEWODho7B7F6K
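The sync step described above (AlertRule in, ConfigMap data out) can be sketched as follows. Everything here is hypothetical and illustrative; the type and function names are not the PR's actual API:

```go
package main

import "fmt"

// AlertRule is an illustrative stand-in for the CRD object; the real
// operator would receive this from a Kubernetes watch.
type AlertRule struct {
	Name      string
	Condition string
	Duration  string
}

// configMapData turns one AlertRule into ConfigMap data: a single
// "<name>.rules" entry holding the rendered rule text, which Prometheus
// would then pick up via its ruleSelector.
func configMapData(r AlertRule) map[string]string {
	body := fmt.Sprintf("ALERT %s\n  IF %s\n  FOR %s\n", r.Name, r.Condition, r.Duration)
	return map[string]string{r.Name + ".rules": body}
}

func main() {
	data := configMapData(AlertRule{
		Name:      "AlertmanagerConfigInconsistent",
		Condition: "alertmanager_config_hash != 1",
		Duration:  "5m",
	})
	for key, body := range data {
		fmt.Printf("%s:\n%s", key, body)
	}
}
```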
Let me know what you think.
I'll add some e2e tests if you think this is a useful feature for the prometheus-operator.