Skip to content

xds: Do not log an error on a done context#42990

Merged
joestringer merged 1 commit intocilium:mainfrom
jrajahalme:xds-do-not-log-error-on-done-ctx
Dec 2, 2025
Merged

xds: Do not log an error on a done context#42990
joestringer merged 1 commit intocilium:mainfrom
jrajahalme:xds-do-not-log-error-on-done-ctx

Conversation

@jrajahalme
Copy link
Copy Markdown
Member

@jrajahalme jrajahalme commented Nov 26, 2025

reflect.Select chooses the select case randomly when multiple cases could be chosen. If a watch channel is chosen due to it getting closed, it is likely that the reason for the closure was context cancellation, rather than some other error condition. If this is the case debug log the cancellation via the done channel case, instead of logging a watch error.

This fixes CI flakes due to unexpected error logs like here:

level=debug source=/go/src/github.com/cilium/cilium/pkg/envoy/xds/watcher.go:169 msg="context canceled, terminating resource watch" module=agent.controlplane.envoy-proxy xdsAckedVersion=21 xdsClientNode=127.0.0.1 xdsTypeURL=type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret level=error source=/go/src/github.com/cilium/cilium/pkg/envoy/xds/server.go:444 msg="xDS resource watch failed; terminating" module=agent.controlplane.envoy-proxy xdsStreamID=21 xdsClientNode=host127.0.0.1no-id~localdomain xdsTypeURL=type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret

@jrajahalme jrajahalme requested a review from a team as a code owner November 26, 2025 12:36
@jrajahalme jrajahalme added the area/CI Continuous Integration testing issue or flake label Nov 26, 2025
@jrajahalme jrajahalme requested a review from sayboras November 26, 2025 12:36
@jrajahalme jrajahalme added area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. release-note/misc This PR makes changes that have no direct user impact. labels Nov 26, 2025
@jrajahalme
Copy link
Copy Markdown
Member Author

/test

reflect.Select chooses the select case randomly when multiple cases could
be chosen. If a watch channel is chosen due to it getting closed, it is
likely that the reason for the closure was context cancellation, rather
than some other error condition. If this is the case debug log the
cancellation via the done channel case, instead of logging a watch
error.

This fixes CI flakes due to error logs like here:

level=debug source=/go/src/github.com/cilium/cilium/pkg/envoy/xds/watcher.go:169 msg="context canceled, terminating resource watch" module=agent.controlplane.envoy-proxy xdsAckedVersion=21 xdsClientNode=127.0.0.1 xdsTypeURL=type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret
level=error source=/go/src/github.com/cilium/cilium/pkg/envoy/xds/server.go:444 msg="xDS resource watch failed; terminating" module=agent.controlplane.envoy-proxy xdsStreamID=21 xdsClientNode=host~127.0.0.1~no-id~localdomain xdsTypeURL=type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret

Signed-off-by: Jarno Rajahalme <[email protected]>
@jrajahalme jrajahalme force-pushed the xds-do-not-log-error-on-done-ctx branch from 6707696 to 2e44683 Compare November 26, 2025 13:12
@jrajahalme
Copy link
Copy Markdown
Member Author

/test

@joestringer joestringer enabled auto-merge December 1, 2025 22:18
@joestringer
Copy link
Copy Markdown
Member

joestringer commented Dec 1, 2025

Is this worth backporting to v1.18 for CI stability there?

@joestringer joestringer added this pull request to the merge queue Dec 2, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 2, 2025
Merged via the queue into cilium:main with commit af7ec5d Dec 2, 2025
75 checks passed
@jrajahalme jrajahalme added the needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch label Dec 2, 2025
@jrajahalme
Copy link
Copy Markdown
Member Author

Is this worth backporting to v1.18 for CI stability there?

Added needs-backport/1.18 label, although IMO this was probably a rare flake, but should not hurt either :-)

@tklauser tklauser mentioned this pull request Dec 3, 2025
7 tasks
@tklauser tklauser added backport-pending/1.18 The backport for Cilium 1.18.x for this PR is in progress. and removed needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch labels Dec 3, 2025
@github-actions github-actions bot added backport-done/1.18 The backport for Cilium 1.18.x for this PR is done. and removed backport-pending/1.18 The backport for Cilium 1.18.x for this PR is in progress. labels Dec 4, 2025
@cilium-release-bot cilium-release-bot bot moved this to Released in cilium v1.19.0 Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/CI Continuous Integration testing issue or flake area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. backport-done/1.18 The backport for Cilium 1.18.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.

Projects

No open projects
Status: Released

Development

Successfully merging this pull request may close these issues.

5 participants