Prometheus Alertmanager + extra args + rules#4220
Conversation
ec694ea to
52dc604
Compare
|
Thanks @naseemkullah ! Regarding the I think decoupling these configs from the main Linkerd |
zaharidichev
left a comment
There was a problem hiding this comment.
I agree with @alpeb wrt to decoupling
|
Thanks for the feedback @alpeb and @zaharidichev, I'll go with that approach then. |
|
Just pushed a change, but I have not had a chance to test it yet. |
abbcd7f to
95d077d
Compare
alpeb
left a comment
There was a problem hiding this comment.
Thanks for the quick update @naseemkullah 👍
I managed to test this successfully after a few changes that I explain below.
95d077d to
bfdcba7
Compare
|
Just removed the leftover values and fixed indentations @alpeb , pardon the force-push! |
85d5174 to
9d179a8
Compare
alpeb
left a comment
There was a problem hiding this comment.
I've added some new comments below.
Also, I forgot to mention you also need to update pkg/charts/linkerd2/values.go referencing these new properties you added (that's used when installing through the CLI, not Helm).
9d179a8 to
6118b36
Compare
|
Hey Naseem, Thanks for taking this up! I've been trying to fix this problem in a more broader sense (and using the current add-on model), and have a rfc up for the same. It would be really awesome, to have your reviews/suggestions there as a end user/contributor. I have the following question:
Also, The RFC model ain't much different from this PR, especially that we get better separation of configuration there i.e |
Hi @Pothulapati, thanks for bringing up these questions. The original PR had a
The flat values (e.g.
Seeing that this PR was opened quite a while ago, effort was put in to it and most importantly it is ready to be merged. The RFC can fill in any gaps/continue to extract prometheus into an add-on if that is ok? |
|
cc @alpeb Can we please merge this if it is good to go? Thanks! |
8e4c8d3 to
f5bc580
Compare
@Pothulapati Just for decoupling those configs from the rest, in generic configmaps instead of somehing linkerd-specific like @naseemkullah I was waiting for the Having Prometheus become an add-on, its config items will change to become something similar to what we have for |
Yep, that makes sense. Also, okay with merging this as the rfc implementation to get merged could take some time! |
Oh right, I'll have that in shortly! Thanks. |
alpeb
left a comment
There was a problem hiding this comment.
@naseemkullah To pass the test, we need to reset some values, and also add a workaround for the mergo lib we depend on for merging values when using both values.yaml and values-ha.yaml. Please apply the following patch:
diff --git a/pkg/charts/linkerd2/values_test.go b/pkg/charts/linkerd2/values_test.go
index 11c3f304..323a3930 100644
--- a/pkg/charts/linkerd2/values_test.go
+++ b/pkg/charts/linkerd2/values_test.go
@@ -14,24 +14,27 @@ func TestNewValues(t *testing.T) {
testVersion := "linkerd-dev"
expected := &Values{
- Stage: "",
- ControllerImage: "gcr.io/linkerd-io/controller",
- ControllerImageVersion: testVersion,
- WebImage: "gcr.io/linkerd-io/web",
- PrometheusImage: "prom/prometheus:v2.15.2",
- GrafanaImage: "gcr.io/linkerd-io/grafana",
- ControllerReplicas: 1,
- ControllerLogLevel: "info",
- PrometheusLogLevel: "info",
- ControllerUID: 2103,
- EnableH2Upgrade: true,
- EnablePodAntiAffinity: false,
- WebhookFailurePolicy: "Ignore",
- OmitWebhookSideEffects: false,
- RestrictDashboardPrivileges: false,
- DisableHeartBeat: false,
- HeartbeatSchedule: "0 0 * * *",
- InstallNamespace: true,
+ Stage: "",
+ ControllerImage: "gcr.io/linkerd-io/controller",
+ ControllerImageVersion: testVersion,
+ WebImage: "gcr.io/linkerd-io/web",
+ PrometheusImage: "prom/prometheus:v2.15.2",
+ GrafanaImage: "gcr.io/linkerd-io/grafana",
+ ControllerReplicas: 1,
+ ControllerLogLevel: "info",
+ PrometheusLogLevel: "info",
+ PrometheusExtraArgs: map[string]string{},
+ PrometheusAlertmanagers: []interface{}{},
+ PrometheusRuleConfigMapMounts: []PrometheusRuleConfigMapMount{},
+ ControllerUID: 2103,
+ EnableH2Upgrade: true,
+ EnablePodAntiAffinity: false,
+ WebhookFailurePolicy: "Ignore",
+ OmitWebhookSideEffects: false,
+ RestrictDashboardPrivileges: false,
+ DisableHeartBeat: false,
+ HeartbeatSchedule: "0 0 * * *",
+ InstallNamespace: true,
Global: &Global{
Namespace: "linkerd",
ClusterDomain: "cluster.local",
@@ -228,6 +231,11 @@ func TestNewValues(t *testing.T) {
// Make Add-On Values nil to not have to check for their defaults
actual.Tracing = nil
+ // workaround for mergo, which resets these to []interface{}(nil)
+ // and []PrometheusRuleConfigMapMount(nil)
+ actual.PrometheusAlertmanagers = []interface{}{}
+ actual.PrometheusRuleConfigMapMounts = []PrometheusRuleConfigMapMount{}
+
if !reflect.DeepEqual(expected, actual) {
t.Errorf("Mismatch Helm HA defaults.\nExpected: %+v\nActual: %+v", expected, actual)
}30d3eeb to
a34bb0f
Compare
Thanks @alpeb , how did you reset the values for Changes applied, hoping tests will pass now. |
0dad051 to
e2907de
Compare
|
Worth noting: After applying the patch |
|
😭 |
|
@naseemkullah my fault; the mergo workaround should have gone inside |
e2907de to
e57c83d
Compare
alpeb
left a comment
There was a problem hiding this comment.
Thanks for the followup @naseemkullah . Just one more thing. The config map is rendering funny unless we trim .Values.prometheusAlertmanagers, so it would need something like this:
{{- toYaml .Values.prometheusAlertmanagers | trim | nindent 8 }}
Hi @alpeb thank for your guidance thus far. The current rendering of prometheusAlertmanagers:
- scheme: http
static_configs:
- targets:
- "alertmanager.linkerd.svc:9093"
- scheme: https
static_configs:
- targets:
- alertmanager.mycomain.comis prometheus.yml: |-
alerting:
alertmanagers:
- scheme: http
static_configs:
- targets:
- alertmanager.linkerd.svc:9093
- scheme: https
static_configs:
- targets:
- alertmanager.mycomain.comwhich seems fine, in what case does it render funny? |
|
It renders fine locally, e.g. using |
8de3d78 to
759eaef
Compare
alpeb
left a comment
There was a problem hiding this comment.
Thanks @naseemkullah for the fast turnaround; from my perspective this is good to go 👍
charts/linkerd2/README.md
Outdated
| | `prometheusAlertmanagers` | Alertmanager instances the Prometheus server sends alerts to configured via the static_configs parameter. | `[]` | | ||
| | `prometheusImage` | Docker image for the Prometheus container | `prom/prometheus:v2.15.2` | | ||
| | `prometheusLogLevel` | Log level for Prometheus | `info` | | ||
| | `prometheusExtraArgs` | Extra command line options for Prometheus | `{}` | |
There was a problem hiding this comment.
nit: Can you move this up? as we followed alphabetical order here.
Pothulapati
left a comment
There was a problem hiding this comment.
Overall works great! Thanks for the awesome work!
|
Thanks @Pothulapati, my absolute pleasure! |
This allows end user flexibility for options such as log format. Rather than bubbling up such possible config options into helm values, extra arguments provides more flexibility.
Add prometheusAlertmanagers value allows configuring a list of statically targetted alertmanager instances.
Use rule configmaps for prometheus rules. They take a list of {name,subPath,configMap} values and mounts them accordingly. Provided that subpaths end with _rules.yml or _rules.yaml they should be loaded by prometheus as per prometheus.yml's rule_files content.
Signed-off-by: Naseem <[email protected]>
|
Hey @naseemkullah , please fill out this form if you'd like to receive some well-earned Linkerd swag! 😉 |
Thanks @alpeb! |
This PR allow users to use linkerd metrics directly from linkerd prometheus to setup alerts, furthermore it provides flexibility by means of extra cmd line arguments.
Add prometheus rules configurable values.
These are set as empty by default, but allow the user to configure recording and alerting rules on the linkerd prometheus.
Add prometheusAlertmanagers value.
This allows configuring a list of statically targetted alertmanager instances.
Allow addition of extra arguements.
This allows end user flexibility for options such as log format. Rather than bubbling up such possible config options into helm values, extra arguments provides more flexibility.