Add AlertRule custom resource definition and AlertRule operator that manages it #616

Closed
kevinjqiu opened this issue Sep 13, 2017 · 1 comment

kevinjqiu commented Sep 13, 2017

We use the alert rules and dashboards provided by kube-prometheus to monitor our Kubernetes cluster. Big props to those who contributed to these rules and dashboards - it really saved us a lot of time.

However, the way kube-prometheus alerts (and dashboards) are compiled is less than ideal. Namely, if we want to customize, say, the threshold of certain alerts, we have to template the alert rule files (using awk/sed magic), "compile" them into a ConfigMap, and submit the manifest to Kubernetes.
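To make that workflow concrete, here is a minimal sketch of the template-then-assemble step in Python instead of awk/sed. The rule name, the metric `example_queue_length`, the file name, and the ConfigMap name are all hypothetical placeholders, not anything shipped by kube-prometheus.

```python
import json
from string import Template

# Hypothetical rule file with a $threshold placeholder we want to customize.
RULE_TEMPLATES = {
    "example.rules": (
        "ALERT ExampleQueueTooLong\n"
        "  IF example_queue_length > $threshold\n"
        "  FOR 5m\n"
    ),
}


def render_rules(templates, values):
    """Substitute values into every rule template, keyed by file name."""
    return {
        name: Template(text).substitute(values)
        for name, text in templates.items()
    }


def build_configmap(name, rule_files, labels):
    """Assemble rendered rule files into a ConfigMap manifest (as a dict)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "labels": dict(labels)},
        "data": dict(rule_files),
    }


if __name__ == "__main__":
    rendered = render_rules(RULE_TEMPLATES, {"threshold": "100"})
    manifest = build_configmap(
        "prometheus-rules", rendered, {"prometheus": "prometheus"}
    )
    # JSON is valid input for `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

Every threshold change means re-rendering and re-submitting the whole ConfigMap, which is the friction described above.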

Alternatively, we could package the alerts into a Helm chart. However, Helm does not provide a mechanism to pre-assemble and render assets (in this case, render the rule files and assemble them into a ConfigMap).

This led us to think that it might be more convenient to model alert rules as custom resources (defined by a CRD), with an operator to manage them.

Here's what an AlertRule object might look like:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertRule
metadata:
  name: alertmanager-config-inconsistent
  labels:
    app: alertmanager
    prometheus: prometheus
spec:
  definition: |
    ALERT AlertmanagerConfigInconsistent
      IF   count_values by (service) ("config_hash", alertmanager_config_hash)
         / on(service) group_left
           label_replace(prometheus_operator_alertmanager_spec_replicas, "service", "alertmanager-$1", "alertmanager", "(.*)") != 1
      FOR 5m
      LABELS {
        severity = "critical"
      }
      ANNOTATIONS {
        summary = "Alertmanager configurations are inconsistent",
        description = "The configuration of the instances of the Alertmanager cluster `{{$labels.service}}` are out of sync."
      }

When the AlertRule object is created, the operator creates a corresponding ConfigMap, which Prometheus picks up if its labels match the ruleSelector of the Prometheus object.
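As a rough sketch of that mapping (not the code in the POC branch), the following Python shows one way an AlertRule object could be turned into a ConfigMap and checked against a selector. It assumes the operator copies the AlertRule's labels onto the ConfigMap, stores the definition under a `<name>.rules` key, and that the selector uses only `matchLabels`; all three are assumptions made for illustration.

```python
def configmap_for_alertrule(alert_rule):
    """Build the ConfigMap (as a dict) that would back one AlertRule object.

    Assumption: labels are copied over unchanged and the rule text is
    stored under a "<name>.rules" key.
    """
    meta = alert_rule["metadata"]
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": meta["name"],
            "labels": dict(meta.get("labels", {})),
        },
        "data": {meta["name"] + ".rules": alert_rule["spec"]["definition"]},
    }


def matches_selector(labels, match_labels):
    """Simplified label selector: every matchLabels pair must be present."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Abbreviated version of the AlertRule object from this issue.
ALERT_RULE = {
    "apiVersion": "monitoring.coreos.com/v1alpha1",
    "kind": "AlertRule",
    "metadata": {
        "name": "alertmanager-config-inconsistent",
        "labels": {"app": "alertmanager", "prometheus": "prometheus"},
    },
    "spec": {"definition": "ALERT AlertmanagerConfigInconsistent\n  FOR 5m\n"},
}

if __name__ == "__main__":
    config_map = configmap_for_alertrule(ALERT_RULE)
    selector = {"prometheus": "prometheus"}  # hypothetical ruleSelector matchLabels
    picked_up = matches_selector(config_map["metadata"]["labels"], selector)
    print(config_map["metadata"]["name"], "selected:", picked_up)
```

A real operator would also need to handle updates and deletions of the AlertRule object, which this sketch leaves out.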

I have a working POC branch here: #619
Here's an asciinema demo: https://asciinema.org/a/3BwHT6nT5IqqrEWODho7B7F6K

I think this could be a useful addition to the Prometheus operator. Curious to know whether that's something that fits your vision of the project.

Cheers.

@kevinjqiu

Might be overkill... closing it.
