Now that Prometheus is an add-on, there can be cases where Prometheus is disabled, in which case the check should show a warning but not fail. This decouples the tight dependency.

This changes the following checks:
- Removes serviceAccount and pod checks in the CLI.
- Relaxes `linkerd-api` checks to only check for Prometheus access when the URL is not empty. This should work seamlessly with an external Prometheus, as that URL will be passed and the same check is performed.

Signed-off-by: Tarun Pothulapati <[email protected]>
Signed-off-by: Tarun Pothulapati <[email protected]>
Member
How feasible would it be to move those checks to their own section under
Contributor
Author
@alpeb Yep, that's the plan :)
alpeb approved these changes on Jul 15, 2020
alpeb (Member) left a comment:
This looks good to me 👍
Looking forward to the other changes that make the public-api endpoints fail gracefully when there is no Prometheus 😉
Signed-off-by: Tarun Pothulapati <[email protected]>
Contributor
Author
@alpeb Updated the controller instance to fail gracefully when there is no Prometheus; it works as expected with both the CLI and the dashboard.
alpeb added a commit that referenced this pull request on Jul 24, 2020
## The Problem

`linkerd check` run right after install is failing because it can't find the Prometheus pod.

## The Cause

The "control plane pods are ready" check used to verify the existence of all the control plane pods, blocking until all the pods were ready. Since #4724, Prometheus is no longer included in that check because it's checked separately as an add-on. An unintended consequence is that when the ensuing "control plane self-check" is triggered, Prometheus might not be ready yet, and the check fails because it doesn't retry.

## The Fix

The "control plane self-check" uses a gRPC call (it's the only check that does so), and those checks weren't designed with retries in mind. This PR adds retry functionality to the `runCheckRPC()` function, making sure the final output remains the same.

It also temporarily disables the `upgrade-edge` integration test, because after installing edge-20.7.4, `linkerd check` will fail for this reason.
alpeb added a commit that referenced this pull request on Jul 27, 2020
* Fixed `linkerd check` not finding Prometheus
han-so1omon pushed a commit to han-so1omon/linkerd2 that referenced this pull request on Jul 28, 2020
* Fixed `linkerd check` not finding Prometheus
han-so1omon pushed a commit to han-so1omon/linkerd2 that referenced this pull request on Jul 28, 2020
* Fixed `linkerd check` not finding Prometheus

Signed-off-by: Eric Solomon <[email protected]>
Implements the second part of linkerd/rfc#16, #4375

Now that Prometheus is an add-on (i.e. #4362), there can be cases where Prometheus is disabled, in which case the check should show a warning but not fail. This decouples the tight dependency.
This changes the following checks:

- Removes serviceAccount and pod checks in the CLI.
- Relaxes `linkerd-api` checks to only check for Prometheus access when the URL is not empty. This should work seamlessly with an external Prometheus, as that URL will be passed and the same check is performed.
- Updates the `queryProm` function to perform a check, so that it does not crash but reports a failure to the dashboard and CLI.
Signed-off-by: Tarun Pothulapati [email protected]