This repository was archived by the owner on Jul 22, 2020. It is now read-only.

Collect alerts from multiple Alertmanager instances#124

Merged
prymitive merged 44 commits into master from multi-upstream
Jul 2, 2017

@prymitive prymitive commented Jun 25, 2017

Fixes #121. This is a big stream of changes, since adding support for multiple Alertmanager instances means rewriting a lot of code.
Also fixes #37: as far as the API is concerned, running Alertmanager in HA mode is pretty much the same as running two independent instances. HA is really only used to de-duplicate notifications, which has nothing to do with the API.

Tasks:

  • Configuration syntax: needs to support multiple URIs, should allow naming AM instances and, if possible, configuring timeouts. Having a config file starts to look like a good idea
  • Fail on startup if no valid Alertmanager instance is configured
  • Alerts need to be tagged with the Alertmanager instances they were collected from
  • The silence API endpoint URI (used by the silence form) needs to be instance-aware. Probably best if each alert is tagged with instance: [am1, am2] and a name -> uri mapping is provided in the JSON response. The form would then send silences to all instances where the alert is spotted (or even allow the user to select the instances where it should be silenced, which is probably the best option)
  • Internal metrics will need another dimension with the instance name
  • Upstream errors are now per instance, not global
  • Error handling needs to be refactored. It's no longer binary: one instance might be down while another is up, so instead of a fullscreen error we only need a top bar warning if the issue is partial (fullscreen only if all AM instances are down)
  • The alert source button needs to be per instance. This probably means a single button/label that triggers a modal with a list of instances and details (like a link to the source per instance)
  • UI showing Alertmanager instances for each alert
  • Verify that sorting of alert groups on the group still works; the UI seems to be re-ordering it
  • Alertmanager instance filter
  • Alert timestamps should probably be per instance too. This might use the same UI for details, but show the oldest timestamp for startsAt. This will also affect alert list sorting, which is now unstable because we use a random alert instance, so the timestamp value can change all the time
  • Save space by removing keys on standard buttons (like @receiver or @alertmanager); key names should only show up on hover. This will compensate for the longer list of standard buttons
  • The cache is cleared on any upstream error; verify whether that can be done better

This will make it easier to tell which tests are being run, since some are optional and depend on tools being installed
Unique uri is required for silence form result tracking
…the UI

Silence result UI will now show all selected upstreams and provide individual results for each

@Tenzer Tenzer left a comment

This change is too big for me to review properly, but I'll assume you have done testing of it.

@prymitive prymitive merged commit 7c518c9 into master Jul 2, 2017
@prymitive prymitive deleted the multi-upstream branch July 2, 2017 16:48

Development

Successfully merging this pull request may close these issues.

drill alerts from more than one alertmanager?
How to configure for HA Alertmanagers?
