fix: fail fast when unrecoverable discovery errors happens on checking optional CRDs #7872
Merged
zhaohuabing merged 20 commits into envoyproxy:main on Jan 15, 2026
Codecov Report
❌ Your patch status has failed because the patch coverage (48.57%) is below the target coverage (60.00%). You can increase the patch coverage or adjust the target coverage.
Coverage diff, main vs. #7872:
  Coverage: 72.80% -> 72.74% (-0.07%)
  Files:    235 -> 235
  Lines:    35313 -> 35380 (+67)
  Hits:     25709 -> 25736 (+27)
  Misses:   7781 -> 7806 (+25)
  Partials: 1823 -> 1838 (+15)
arkodg reviewed on Jan 7, 2026 (two reviews)
zirain previously approved these changes on Jan 14, 2026
nareddyt approved these changes on Jan 14, 2026
zirain approved these changes on Jan 15, 2026
andreik-n2 pushed a commit to andreik-n2/gateway that referenced this pull request on Jan 15, 2026:
…g optional CRDs (envoyproxy#7872)
* fail fast when unrecoverable discovery errors happens
* only retry transient errors
* fix potenial dead lock
* address comments
* minor wording
* create discovery client once
* fix lint
* address comments
* remove redundant logging
* add e2e test
* fix test
* fix test
Signed-off-by: Huabing (Robin) Zhao
Member:
FYI, while testing #7964, it took more than 60s before the runner returned an error for the discovery failure.
zirain pushed a commit to zirain/gateway that referenced this pull request on Jan 26, 2026 (same commit message as above).
rudrakhp pushed a commit to rudrakhp/gateway that referenced this pull request on Jan 26, 2026 (same commit message as above, with an additional Signed-off-by: Rudrakh Panigrahi).
zirain added a commit that referenced this pull request on Jan 26, 2026. The commit combines:
* fix: fail fast when unrecoverable discovery errors happens on checking optional CRDs (#7872)
* fix: extproc is discarded with failOpen is enabled for wasm (#7956)
* fix: sanitize control plane config dump (#7901)
* fix: server run race (#7964)
* fix: wrong cluster type with mixed FQDN backend and service backend refs (#7994)
* fix: merge route match rule with match all route (#8011)
* fix gen
* fix lint
* fix for golang 11.24
* fix lint
* fix watch CRD version
Signed-off-by: Huabing (Robin) Zhao
Signed-off-by: zirain
Co-authored-by: Huabing (Robin) Zhao
rudrakhp added a commit that referenced this pull request on Jan 26, 2026. The commit combines:
* fix: extproc is discarded with failOpen is enabled for wasm (#7956)
* fix: sanitize control plane config dump (#7901)
* fix: server run race (#7964)
* fix: wrong cluster type with mixed FQDN backend and service backend refs (#7994)
* fix: fail fast when unrecoverable discovery errors happens on checking optional CRDs (#7872)
* fix: merge route match rule with match all route (#8011)
* fix: do not set autoHTTPConfig when used mixed(HTTP + HTTPS) backends (#7950)
* fix: backend tls default namespace (#7987)
* fix: race in gatewaapi runner (#8037)
* [release/v1.6] v1.6.3 release notes (#8054)
* v1.6.3 version
* fix gen-check
* fix lint
Signed-off-by: Huabing (Robin) Zhao
Signed-off-by: Rudrakh Panigrahi
Signed-off-by: zirain
Co-authored-by: Huabing (Robin) Zhao
Co-authored-by: zirain
SadmiB pushed a commit to SadmiB/gateway that referenced this pull request on Jan 30, 2026 (same commit message as above, with an additional Signed-off-by: Sadmi Bouhafs).
What type of PR is this?
This PR adds retries to the controller when it fails to discover optional CRDs from the API server. If all retries fail, the error is propagated and causes the EG pod to restart. This prevents the EG pod from reconciling incomplete resources and serving partial xDS configuration to Envoy.
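The retry behavior described above can be sketched roughly as follows. This is a minimal illustration using only the Go standard library, not the code from this PR: `checkOptionalCRD`, `discoverFunc`, `errTransient`, `errForbidden`, `flakyDiscover`, and `runCheck` are all hypothetical names, and the way errors are classified as transient is an assumption made for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical marker for failures worth retrying,
// such as a timeout or a briefly unavailable API server.
var errTransient = errors.New("transient discovery error")

// errForbidden is a hypothetical unrecoverable failure, such as missing RBAC.
var errForbidden = errors.New("forbidden")

// discoverFunc reports whether an optional CRD is served by the API server.
type discoverFunc func(ctx context.Context) (bool, error)

// checkOptionalCRD calls discover up to attempts times (attempts >= 1).
// Transient errors are retried after backoff; any other error is returned
// immediately so the caller can fail fast.
func checkOptionalCRD(ctx context.Context, discover discoverFunc, attempts int, backoff time.Duration) (bool, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		found, err := discover(ctx)
		if err == nil {
			return found, nil
		}
		if !errors.Is(err, errTransient) {
			return false, fmt.Errorf("unrecoverable discovery error: %w", err)
		}
		lastErr = err
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-ctx.Done():
			return false, ctx.Err()
		case <-time.After(backoff):
		}
	}
	return false, fmt.Errorf("discovery failed after %d attempts: %w", attempts, lastErr)
}

// flakyDiscover returns a discoverFunc that fails with failWith for the first
// `failures` calls and then reports the CRD as present, plus a call counter.
func flakyDiscover(failures int, failWith error) (discoverFunc, *int) {
	calls := 0
	return func(context.Context) (bool, error) {
		calls++
		if calls <= failures {
			return false, failWith
		}
		return true, nil
	}, &calls
}

// runCheck wires checkOptionalCRD with a background context and a short backoff.
func runCheck(discover discoverFunc, attempts int) (bool, error) {
	return checkOptionalCRD(context.Background(), discover, attempts, 10*time.Millisecond)
}

func main() {
	// Two transient failures, then success: the check is retried.
	d, calls := flakyDiscover(2, errTransient)
	found, err := runCheck(d, 5)
	fmt.Println(found, err, *calls)

	// An unrecoverable failure: the check returns after a single call.
	d, calls = flakyDiscover(1, errForbidden)
	found, err = runCheck(d, 5)
	fmt.Println(found, err, *calls)
}
```

In this sketch the caller receives an error both when retries are exhausted and when the failure is unrecoverable, which is what allows the process to exit instead of reconciling with an incomplete view of the optional CRDs.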
It also propagates runner startup errors to the server, so the Envoy Gateway process can exit and restart cleanly. Previously, runner startup failures were only logged, and Envoy Gateway continued running even with failed runners.
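As a rough illustration of propagating startup failures rather than only logging them, the sketch below returns the first runner error to the caller, which then exits with a non-zero status. The `runner` interface, `fakeRunner`, `startRunners`, and `errStartup` are hypothetical names chosen for this example and do not mirror the actual Envoy Gateway types.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// runner is a hypothetical stand-in for a component that must start
// successfully before the server is considered healthy.
type runner interface {
	Name() string
	Start() error
}

// fakeRunner is a test double: it fails with err when err is set.
type fakeRunner struct {
	name    string
	err     error
	started bool
}

func (r *fakeRunner) Name() string { return r.name }

func (r *fakeRunner) Start() error {
	if r.err != nil {
		return r.err
	}
	r.started = true
	return nil
}

// errStartup is a hypothetical startup failure, e.g. a failed discovery check.
var errStartup = errors.New("discovery failed")

// startRunners starts runners in order and returns the first startup error
// instead of logging it and carrying on.
func startRunners(runners []runner) error {
	for _, r := range runners {
		if err := r.Start(); err != nil {
			return fmt.Errorf("failed to start %s runner: %w", r.Name(), err)
		}
	}
	return nil
}

func main() {
	runners := []runner{
		&fakeRunner{name: "provider"},
		&fakeRunner{name: "xds"},
	}
	// Setting err: errStartup on one of the runners above exercises the exit path.
	if err := startRunners(runners); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1) // a non-zero exit lets the pod be restarted
	}
	fmt.Println("all runners started")
}
```

Returning the error keeps the exit decision in one place (`main` here), so a failed runner cannot leave the process running in a partially started state.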
Fixes #7871
Release Notes: Yes