Config for timeouts and delays in probes by jan-kantert · Pull Request #11458 · linkerd/linkerd2

jan-kantert · 2023-10-05T12:53:49Z

Proposed change to implement #11453

adleong

The default probe timeouts are 1s, so that's the current behavior. Changing the defaults from 1s to 30s and 5s respectively feels like a big change. I'm not sure how we would determine if these defaults are better.

What do you think about leaving the defaults as the current value of 1s? At least this change will allow users to configure these timeouts explicitly without changing the default behavior.

jan-kantert · 2023-10-23T13:04:40Z

> Thanks @jan-kantert!
>
> The default probe timeouts are 1s, so that's the current behavior. Changing the defaults from 1s to 30s and 5s respectively feels like a big change. I'm not sure how we would determine if these defaults are better.
>
> What do you think about leaving the defaults as the current value of 1s? At least this change will allow users to configure these timeouts explicitly without changing the default behavior.

Sure, we can do that. However, those short timeouts cause instability on clusters under high load. Personally, I would still at least increase the default for the livenessProbe (as Kubernetes recommends a longer timeout for the livenessProbe than for the readinessProbe). This timeout was also introduced more or less silently with Kubernetes 1.20 (https://kubernetes.io/blog/2020/12/08/kubernetes-1-20-release-announcement/#exec-probe-timeout-handling).

alpeb

Thanks @jan-kantert 👍
Please complete the Proxy struct in ./pkg/charts/linkerd2/values.go with these new entries for the linkerd install CLI to be able to render them.
Also, please run ./bin/helm-docs for the comments added into the values.yaml file to be picked up in the chart's README.md file, and go test ./... -update to update the unit test golden files.
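
For readers following along, here is a minimal sketch of what the requested struct entries might look like. It is not the actual contents of ./pkg/charts/linkerd2/values.go: the `Probe` type name, the `parseProxy` helper, and the use of JSON-style tags are assumptions for illustration, and the `readinessProbe` key is assumed by analogy with the `livenessProbe` key shown in this PR's diff. The field names mirror the `initialDelaySeconds` and `timeoutSeconds` keys used in the chart.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Probe is a hypothetical type name. Its fields mirror the Helm keys
// discussed in this PR (initialDelaySeconds, timeoutSeconds).
type Probe struct {
	InitialDelaySeconds int `json:"initialDelaySeconds"`
	TimeoutSeconds      int `json:"timeoutSeconds"`
}

// Proxy shows only the fields relevant to this change; the real struct
// carries many more settings.
type Proxy struct {
	LivenessProbe  *Probe `json:"livenessProbe"`
	ReadinessProbe *Probe `json:"readinessProbe"`
}

// parseProxy is an illustrative helper that decodes a JSON rendering of
// the proxy values into the struct above.
func parseProxy(raw string) (Proxy, error) {
	var p Proxy
	err := json.Unmarshal([]byte(raw), &p)
	return p, err
}

func main() {
	p, err := parseProxy(`{"livenessProbe":{"initialDelaySeconds":10,"timeoutSeconds":1}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.LivenessProbe.InitialDelaySeconds, p.LivenessProbe.TimeoutSeconds)
}
```

Without matching struct fields, values set in values.yaml would have nowhere to land when the CLI decodes them, which is presumably why the review asks for the struct to be completed.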

adleong · 2023-11-13T19:12:54Z

Hi @jan-kantert, just checking in to see if you have time to make the above changes?

mateiidavid · 2023-12-05T12:59:35Z

Hey @jan-kantert, just to give you an update: I pushed the requested changes to your branch. I did not change anything in the implementation (but I did modify the file to resolve a merge conflict) -- most of my work has been to fix the tests and get this reviewable.

Your effort is super appreciated! I think due to the nature of the issue itself (that you have described in #11453) we felt that it would be better to get this fix in soon so we can get it tested asap.

The contribution is solely yours :)

alpeb · 2023-12-05T13:13:54Z

charts/partials/templates/_proxy.tpl

+  initialDelaySeconds: {{.Values.proxy.livenessProbe.initialDelaySeconds | default 10}}
+  timeoutSeconds: {{.Values.proxy.livenessProbe.timeoutSeconds | default 1}}


I would get rid of the default statements, as we already have defaults in the values.yaml file, and to be consistent with the rest of the file (see startupProbe).

mateiidavid

Yeah, I did not want to touch the implementation, but I agree.
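
The suggestion above can be illustrated with a small sketch using Go's `text/template`, the engine Helm templates build on. It renders the two probe lines without the `| default` pipes and takes the values from a defaults map with user overrides applied on top. The `render` and `withOverrides` helpers are hypothetical, and the shallow map merge is a simplified stand-in for Helm's own values merging, not a reproduction of it.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// probeTmpl mirrors the two template lines under review, with the
// `| default` pipes removed.
const probeTmpl = `initialDelaySeconds: {{.Values.proxy.livenessProbe.initialDelaySeconds}}
timeoutSeconds: {{.Values.proxy.livenessProbe.timeoutSeconds}}
`

// render executes the template against the given probe settings, nested
// under the same path the chart uses.
func render(probe map[string]any) (string, error) {
	data := map[string]any{
		"Values": map[string]any{
			"proxy": map[string]any{"livenessProbe": probe},
		},
	}
	t, err := template.New("probe").Parse(probeTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// withOverrides copies the chart defaults and applies user overrides on
// top. This is a simplification of how Helm merges values.
func withOverrides(defaults, overrides map[string]any) map[string]any {
	out := make(map[string]any, len(defaults))
	for k, v := range defaults {
		out[k] = v
	}
	for k, v := range overrides {
		out[k] = v
	}
	return out
}

func main() {
	// Defaults as they would appear in values.yaml.
	defaults := map[string]any{"initialDelaySeconds": 10, "timeoutSeconds": 1}
	out, err := render(withOverrides(defaults, map[string]any{"timeoutSeconds": 5}))
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Because the defaults already live in the values, the template only has to read them, so the default is defined in one place instead of two.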

alpeb · 2023-12-05T13:18:33Z

charts/linkerd-control-plane/values.yaml

  defaultInboundPolicy: "all-unauthenticated"
+  # -- LivenessProbe timeout and delay configuration
+  livenessProbe:
+    initialDelaySeconds: 10


So the default here used to be zero? Setting the new default to 10 is going to significantly delay the startup time for pods. How about leaving the default as it was? Pathological cases would still have an escape hatch with this new setting.

mateiidavid

It was hardcoded to 10; see the _proxy.tpl change. The default did not change. The only defaults we have changed are the timeouts, since we've added them in.

mateiidavid · 2023-12-06T10:18:14Z

@jan-kantert we've noticed your DCO check wasn't properly signed. We can take this across the finish line if you'd like, but we still need to have all of the commits signed to pass the check :)

kflynn · 2023-12-22T18:46:11Z

@jan-kantert Hey! We'd like to take this, if you can confirm that you agree with the whole DCO thing. A comment on this issue is fine. 🙂

jan-kantert · 2023-12-27T08:00:18Z

I agree to the DCO! Sorry that I did not get to it before Christmas. Thanks for taking care!

alpeb

I've rebased and pushed some minor changes. Good to go IMO 👍

adleong

This adds a way to configure probe initialDelay and timeout. Do we also anticipate needing to configure probe period? Should we just add that as well while we're here? Worth a thought but not a blocker.

@jan-kantert

This release addresses some issues in the destination service that could cause it to behave unexpectedly when processing updates.

* Fixed a race condition in the destination service that could cause panics under very specific conditions ([#12022]; fixes [#12010])
* Changed how updates to a `Server` selector are handled in the destination service. When a `Server` that marks a port as opaque no longer selects a resource, the resource's opaqueness will be reverted to default settings ([#12031]; fixes [#11995])
* Introduced Helm configuration values for liveness and readiness probe timeouts and delays ([#11458]; fixes [#11453]) (thanks @jan-kantert!)

[#12010]: #12010
[#12022]: #12022
[#11995]: #11995
[#12031]: #12031
[#11453]: #11453
[#11458]: #11458

Signed-off-by: Matei David <[email protected]>

jan-kantert requested a review from a team as a code owner October 5, 2023 12:53

adleong reviewed Oct 5, 2023

jan-kantert mentioned this pull request Oct 26, 2023

Allow configuration of readinessProbe and livenessProbe timeouts in linkerd-proxy-injector #11453

Closed

alpeb reviewed Nov 1, 2023

mateiidavid self-assigned this Nov 30, 2023

alpeb reviewed Dec 5, 2023

jan-kantert and others added 7 commits January 5, 2024 10:38

Update _proxy.tpl

6508c83

Add timeout config for readinessProbe and livenessProbes

Update values.yaml

1f4c7e7

Add defaults for timeouts and delays

Update values.yaml

960c3ff

Changed to previous default

Update _proxy.tpl

4f2585c

changed to previous default

Add Go struct values

1fa578d

Signed-off-by: Matei David <[email protected]>

Update tests

822cb77

Signed-off-by: Matei David <[email protected]>

Rebase, final touches

c39b6db

alpeb force-pushed the config_for_timeouts_and_delays_in_probes branch from 08d7a34 to c39b6db Compare January 5, 2024 16:06

alpeb approved these changes Jan 5, 2024

adleong approved these changes Feb 2, 2024

mateiidavid merged commit af402a3 into linkerd:main Feb 8, 2024

mateiidavid mentioned this pull request Feb 8, 2024

edge-24.2.2 #12053

Merged
