Don't return alerts in the API (alerts/groups) until they are cross-c… by prymitive · Pull Request #692 · prometheus/alertmanager

prymitive · 2017-04-04T18:25:04Z

…hecked with silences

alerts/groups API endpoint provides "silenced" key for all alerts, which is set if any silence for this alert exist.
Due to how silences are matched with alerts, which happens out of band from API perspective, alerts/groups will return all alerts as soon as they created and only after silences are matched with an alert a "silenced" key will be added to the object in API response. This means that API clients that wants to rely on that key can't trust it, as the absence of "silenced" key doesn't guarantee that given alerts isn't silenced. This change fixes that by tracking which alert was matched against silences and only returning those alerts that were checked.
Fixes #609

Kellel · 2017-04-06T20:24:31Z

Nice! I've noticed this while working on #636 as well

prymitive · 2017-04-08T07:12:23Z

@stuartnelson3 does this look sane? It does break the behaviour of alert/groups API a bit, but I think that's acceptable given that the API isn't stable yet.

stuartnelson3 · 2017-04-08T11:09:18Z

It does change the behavior, but as you said, we aren't currently making any API stability guarantees.

Looking at this, it makes sense -- add a processed field so you know whether the API has looked at it. I'm just wondering if we should instead move from adding a boolean field to adding a status field, so that we don't keep adding more boolean flags in the future. We already having inhibited, silenced, and with this, processed.

So something like status: (unprocessed|active|silenced|inhibited) is what I'm thinking would actually be better. I realize this ends up with either tasking you with a refactoring of how this is managed internally, or waiting for one of us to do it.

@fabxc @brancz @mxinden -- opinion on merging all these booleans into a single status (or whatever) field?

prymitive · 2017-04-08T16:41:17Z

That does seem like an improvement. It doesn't seem like a big task and I'm happy to do it, will update once I have a new patch ready.

prymitive · 2017-04-11T21:37:19Z

I got back to it and took a look over the code. Current memMarker struct contains:

inhibited map[model.Fingerprint]struct{}
silenced map[model.Fingerprint]string

So those are not simple bool fields, looking at the code that uses those I see that:

inhibited value is not used - link
silenced value is used - link1 link2

So we can move those into the status field.

This field will be used to store alert flags, start with uint8 as we don't need too many bits

Inhibited flag will now be set on the status field, we don't need to keep track of this value, a single bit is enough

Once notify have a chance to resolve silences matched for given alert mark it also as processed, so we can start returning it in the group overview API

prymitive · 2017-04-11T23:19:44Z

types/types.go

+
+	// if there are no flags set remove the whole key
+	if m.status[alert] == 0 {
+		delete(m.status, alert)


Is there a point where we can nuke entire m.status[alert]? I'm not sure if this will ever be cleaned, current code used delete() if passed bool was false, but with status we no longer can delete the entire map if any other flag is set and processing one will be set for everything. Where should the cleanup code go?

Looking for the code that removes Alerts ... that should also (hopefully) have access to the marker map, so we can get rid of the alert.

prymitive · 2017-04-11T23:20:07Z

First go at the new patch

mxinden · 2017-04-12T12:36:01Z

@prymitive @stuartnelson3 I like the idea of the status: (unprocessed|active|silenced|inhibited) field as those four states are a disjoint set, right?

stuartnelson3 · 2017-04-12T13:14:07Z

More or less. Technically, an alert could be both inhibited AND silenced, but in that case I think it's only necessary to mark it as inhibited, and the result for the user is the same (i.e., no notification).

brancz · 2017-04-12T14:07:38Z

@stuartnelson3 in the notification pipeline if an alert is inhibited it will never reach the silence stage, therefore won't be added to the set of silenced alerts.

prymitive · 2017-04-12T18:19:54Z

Oh, if they are disjoint I can simplify this PR, but:

all but silenced are just a bool markers, so those can be a simple iota, but for silences we need to store the value as it's the id of a silence that we need to set on an alert, so we'll endup with having status: silenced and silenced: abcdf, both fields would need to be kept in sync
I'm still not sure when should I drop an alert from the marker map

stuartnelson3 · 2017-04-12T18:31:09Z

@brancz thanks for the clarification!

stuartnelson3 · 2017-04-12T18:37:41Z

@prymitive

yeah, right now you would need to keep both fields in sync. Another change I would like to make to the API is for an alert to always return a silenced field. Currently this field only exists in the JSON representation if it has a value, but I think having consistent JSON structure is more important than not having an empty value. (People are welcome to disagree with this opinion)
I haven't looked at the code so am not sure how the marker map is managed. I'll take a look tomorrow, or if @brancz wants to comment, that'd also be cool :)

prymitive · 2017-04-12T19:31:50Z

It could also be:

type Status uint8

const (
        Unprocessed = Status(iota)
        Active
        Silenced
        Inhibited
)

type alertStatus struct {
        status uint8
        value  string
}

 type memMarker struct {
 	status    map[model.Fingerprint]alertStatus 
  	mtx sync.RWMutex
  }

This way instead of having two variables to keep in sync we just have a single struct to worry about.

stuartnelson3 · 2017-04-12T20:23:15Z

We do have one struct, but we still have two values :) I'll think about it, but right now my gut feeling is to keep the actual alert JSON representation flat and not have a nested object containing status information. We can still accomplish that in the serialization, though, with this struct. So just go with this for now.

Took a stab at the code since alertStatus will need a custom MarshalJSON:

type state uint8

const (
	unprocessed state = iota
	active
	silenced
	inhibited
)

type alertStatus struct {
	status state
	value  string
}

func (a *alertStatus) MarshalJSON() ([]byte, error) {
	status, prs := statusMap[a.status]
	if !prs {
		status = "unknown"
	}
	return json.Marshal(map[string]string{
		"status": status,
		"value":  a.value,
	})
}

// For serializing status to JSON.
var statusMap = map[state]string{
	unprocessed: "unprocessed",
	active:      "active",
	silenced:    "silenced",
	inhibited:   "inhibited",
}

type memMarker struct {
	status map[model.Fingerprint]alertStatus

	sync.RWMutex
}

stuartnelson3 · 2017-04-13T11:00:56Z

I think the semantics in the marker map will stay the same.

If we SetSilenced, update the state to silenced depending on if a silence is passed into the method. If a silence is passed, set value to the silence id, if no silence is passed, unset the value. If it's already inhibited, don't change the state.
SetInhibited will, depending on the bool passed in, set the state to inhibited, or set it to silenced (value != ""), or active (value=="")
Set to active once it has been processed
Create the iota definition so that the 0 value is unprocessed

This pseudo-state machine should live within the marker implementation, I think, and we just make calls to different exposed status update methods.

stuartnelson3 · 2017-04-13T11:09:48Z

It seems alert cleanup happens in provider/mem/mem.go in the runGC() loop. I'm going to think about how best to do this, but potentially just adding a pointer to marker as an argument to NewAlerts when it gets created and managing it there..

fabxc · 2017-04-13T11:34:30Z

+1 for adding some sort of proper "status". Although I tend to think that a status object is more appropriate for reasons of extensibility (more below).

IMO, having the unprocessed state is pretty "meh" to begin with. Good to add here as the way things are but ideally, we'd have proper indexing of all alerts and silences that allows us to avoid this.
In an ideal scenario, our "status" object could not only tell us what state an alert is in, but also which silences or alerts (for inhibition) are muting it. That's of course not trivial with silences being time-bound (and editable) and alerts being entirely ephemeral and unpredictable.

Which tools would use this API extension/change if we added it now?
In general, I think I'm okay adding it as we'll likely introduce a v2 API anyway.

stuartnelson3 · 2017-04-13T13:25:11Z

It sounds like the main use would be for api consumers of large installations, both the UI and amtool.

Cool idea for displaying the alert responsible for inhibition, didn't even think of that. What would we use to identify the alert?

Once we have the mechanics in place we can clean up the actual state transitions for the changes we'll be making for the new api.

prymitive · 2017-04-13T18:14:29Z

Speaking of changes to the API - the silenced field only links to a single silence object, but there can be multiple matched silenced. I'm happy with that, but is this on purpose or accidental? From the POV of alert one is enough, but a person looking at this silenced alert might want to see all silences that are matched, but then again, if that list grows too much it stops being useful.
I think leaving it as is might be the best for now, but might be worth considering when designing v2 API.

I'll get back to making a new change set here.

matthiasr · 2017-04-14T11:43:34Z

If the API doesn't return an alert before it's processed, how will this look when I just created an alert? How can O know if anything happened at all?
I kind of lost track of what this PR is about, the original description is about filtering alerts altogether, but y'all are now discussing a different change it seems?

prymitive · 2017-04-14T16:35:44Z

That's a fair comment, discussion that followed got a little bit broad and I think I'll:

Close this PR as the proposed change here isn't getting merged
Document proposed change on API shows briefly reports alerts as not being silenced until silences are applied #609

The new change should be:

refactor Alertmanager alert status handling (introduce status field on the marker struct)
move silence linking elsewhere (to JSON marshaling like proposed above)
don't change what API responds with, all alerts should be exposed, if client wants to see only those that were processed then the status key in JSON reponse can be used for that

prymitive · 2017-04-14T18:52:47Z

Updated #609, closing this PR and will come up with a new patch, thanks all

prymitive · 2017-04-14T23:12:23Z

Followup in #711

* Remove unnecessary select statement * Remove unnecessary if-statement

prymitive added 3 commits April 11, 2017 15:25

Add status field to the marker struct

e538db1

This field will be used to store alert flags, start with uint8 as we don't need too many bits

Use status field for inhibition marks

8f94394

Inhibited flag will now be set on the status field, we don't need to keep track of this value, a single bit is enough

Add a flag for tracking which alert was processed by notify code

324a8e8

Once notify have a chance to resolve silences matched for given alert mark it also as processed, so we can start returning it in the group overview API

prymitive commented Apr 11, 2017

View reviewed changes

prymitive mentioned this pull request Apr 14, 2017

API shows briefly reports alerts as not being silenced until silences are applied #609

Closed

prymitive closed this Apr 14, 2017

prymitive deleted the issue-609 branch April 14, 2017 23:02

prymitive mentioned this pull request Apr 14, 2017

Issue 609 2 #711

Closed

hh pushed a commit to ii/alertmanager that referenced this pull request Oct 20, 2017

Remove unnecessary select statement (prometheus#692)

0b67638

* Remove unnecessary select statement * Remove unnecessary if-statement

Comments

Conversation

prymitive commented Apr 4, 2017

Uh oh!

Kellel commented Apr 6, 2017

Uh oh!

prymitive commented Apr 8, 2017

Uh oh!

stuartnelson3 commented Apr 8, 2017

Uh oh!

prymitive commented Apr 8, 2017

Uh oh!

prymitive commented Apr 11, 2017

Uh oh!

prymitive Apr 11, 2017

Choose a reason for hiding this comment

Uh oh!

stuartnelson3 Apr 13, 2017

Choose a reason for hiding this comment

Uh oh!

prymitive commented Apr 11, 2017

Uh oh!

mxinden commented Apr 12, 2017

Uh oh!

stuartnelson3 commented Apr 12, 2017

Uh oh!

brancz commented Apr 12, 2017

Uh oh!

prymitive commented Apr 12, 2017

Uh oh!

stuartnelson3 commented Apr 12, 2017

Uh oh!

stuartnelson3 commented Apr 12, 2017

Uh oh!

prymitive commented Apr 12, 2017

Uh oh!

stuartnelson3 commented Apr 12, 2017

Uh oh!

stuartnelson3 commented Apr 13, 2017

Uh oh!

stuartnelson3 commented Apr 13, 2017

Uh oh!

fabxc commented Apr 13, 2017

Uh oh!

stuartnelson3 commented Apr 13, 2017

Uh oh!

prymitive commented Apr 13, 2017

Uh oh!

matthiasr commented Apr 14, 2017

Uh oh!

prymitive commented Apr 14, 2017

Uh oh!

prymitive commented Apr 14, 2017

Uh oh!

prymitive commented Apr 14, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants