Add `AlertRule` custom resource definition and AlertRule operator that manages it #616

kevinjqiu · 2017-09-13T19:23:29Z

We use the alert rules and dashboards provided by kube-prometheus to monitor our Kubernetes cluster. Big props to those who contributed to these rules and dashboards - it really saved us a lot of time.

However, the way kube-prometheus alerts (and dashboards) are compiled is less than ideal. Namely, if we want to customize, say, the threshold of certain alerts, we have to template the alert rule files (using awk/sed magic), "compile" them into a ConfigMap, and submit the manifest to Kubernetes.

Alternatively, we could package the alerts into a Helm chart. However, Helm does not provide a mechanism to pre-assemble and render assets (in this case, render the rule files and assemble them into a ConfigMap).

This led us to think that it might be more convenient for the alert rules to be custom resources, with an operator to manage them.

Here's what an AlertRule object might look like:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertRule
metadata:
  name: alertmanager-config-inconsistent
  labels:
    app: alertmanager
    prometheus: prometheus
spec:
  definition: |
    ALERT AlertmanagerConfigInconsistent
      IF   count_values by (service) ("config_hash", alertmanager_config_hash)
         / on(service) group_left
           label_replace(prometheus_operator_alertmanager_spec_replicas, "service", "alertmanager-$1", "alertmanager", "(.*)") != 1
      FOR 5m
      LABELS {
        severity = "critical"
      }
      ANNOTATIONS {
        summary = "Alertmanager configurations are inconsistent",
        description = "The configuration of the instances of the Alertmanager cluster `{{$labels.service}}` are out of sync."
      }

When the AlertRule object is created, the operator creates a corresponding ConfigMap, which Prometheus picks up if the ConfigMap's labels match the ruleSelector of the Prometheus object.
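To make that flow concrete, here is a minimal sketch in Python of the mapping from an AlertRule object to a ConfigMap, plus the label check a ruleSelector would perform. It is illustrative only and is not the code in the POC branch: the function names `configmap_for_alert_rule` and `matches_selector`, the `<name>.rules` data key, and the restriction to `matchLabels`-style selectors are all assumptions made for this example.

```python
import json


def configmap_for_alert_rule(alert_rule: dict) -> dict:
    """Build a ConfigMap manifest (as a dict) carrying one AlertRule's definition."""
    meta = alert_rule["metadata"]
    name = meta["name"]
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            # Copy the labels so a ruleSelector can match the ConfigMap.
            "labels": dict(meta.get("labels", {})),
        },
        # Assumed key naming: one rule file per AlertRule object.
        "data": {f"{name}.rules": alert_rule["spec"]["definition"]},
    }


def matches_selector(labels: dict, match_labels: dict) -> bool:
    """Return True if every key/value pair in match_labels is present in labels.

    Simplification: only matchLabels semantics, no matchExpressions.
    """
    return all(labels.get(key) == value for key, value in match_labels.items())


# The AlertRule from the example above, with the rule body shortened.
alert_rule = {
    "apiVersion": "monitoring.coreos.com/v1alpha1",
    "kind": "AlertRule",
    "metadata": {
        "name": "alertmanager-config-inconsistent",
        "labels": {"app": "alertmanager", "prometheus": "prometheus"},
    },
    "spec": {
        "definition": "ALERT AlertmanagerConfigInconsistent\n  IF ... != 1\n  FOR 5m\n",
    },
}

configmap = configmap_for_alert_rule(alert_rule)
print(json.dumps(configmap, indent=2))

# A hypothetical ruleSelector that selects rules labelled prometheus=prometheus.
print(matches_selector(configmap["metadata"]["labels"], {"prometheus": "prometheus"}))
```

A real operator would additionally watch for updates and deletions of AlertRule objects and keep the ConfigMap in sync; the sketch only covers the creation path described here.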

I have a working POC branch here: #619
Here's an asciinema demo: https://asciinema.org/a/3BwHT6nT5IqqrEWODho7B7F6K

I think this could be a useful addition to the Prometheus operator. Curious to know if that's something that fits your vision of the project.

Cheers.

kevinjqiu · 2017-09-15T15:11:56Z

Might be overkill... closing it.

This was referenced Sep 13, 2017

rework helm-charts #592

Closed

[Feature] CRD for AlertRule and an operator to manage Alert Rules #619

Closed

kevinjqiu closed this as completed Sep 15, 2017
