fix: race in gatewaapi runner#8037
Conversation
✅ Deploy Preview for cerulean-figolla-1f9435 canceled.
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8037 +/- ##
==========================================
+ Coverage 73.70% 73.75% +0.04%
==========================================
Files 237 237
Lines 35703 35709 +6
==========================================
+ Hits 26316 26338 +22
+ Misses 7529 7515 -14
+ Partials 1858 1856 -2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
internal/gatewayapi/runner/runner.go
Outdated
| r.Logger = r.Logger.WithName(r.Name()).WithValues("runner", r.Name()) | ||
|
|
||
| go r.startWasmCache(ctx) | ||
| r.done.Add(2) |
There was a problem hiding this comment.
can you add a comment here, outlining why 2 is needed here
0314fcc to
9535661
Compare
| keyCache *KeyCache | ||
|
|
||
| // Goroutine synchronization | ||
| done sync.WaitGroup |
There was a problem hiding this comment.
nit: name it as wg? done.Done() was confusing at first glance.
internal/gatewayapi/runner/runner.go
Outdated
| go r.startWasmCache(ctx) | ||
| // Add 2 to the WaitGroup: one for the WASM cache server goroutine and one for the | ||
| // subscribeAndTranslate goroutine that handles resource translation | ||
| r.done.Add(2) |
There was a problem hiding this comment.
Would we want to add 2 right away or Add(1) just before calling each routine? We wouldn't want to wait on routines that might not have started for some reason. Also any new routine that we might add here will follow the same pattern.
// Increment by 1 specifically for the WASM cache
r.done.Add(1)
go func() {
defer r.done.Done()
r.startWasmCache(ctx)
}()
// If Subscribe crashes or returns an error, the WaitGroup
// won't be stuck waiting for a goroutine that never started.
c := r.ProviderResources.GatewayAPIResources.Subscribe(ctx)
// Increment by 1 specifically for the translation handler
r.done.Add(1)
go func() {
defer r.done.Done()
r.subscribeAndTranslate(c)
}()
There was a problem hiding this comment.
I have no strong opinion on this.
There was a problem hiding this comment.
+1 to keeping the Add close to the go func()
There was a problem hiding this comment.
we could simply it with r.done.Go()
| // t.Output() while goroutines are still active. | ||
| // | ||
| // Run with: go test -race -run TestRunnerGoroutineRace -count=100 ./internal/cmd/ | ||
| func TestRunnerGoroutineRace(t *testing.T) { |
There was a problem hiding this comment.
I see another test in runner_race_test, do we need this as well? Which one would we need to detect a race if someone spawns another routine without WG?
There was a problem hiding this comment.
this's in case we have another one unkonwn race, TBH it's hard to reproduce here.
1f73f66 to
80a7096
Compare
|
/retest |
80a7096 to
6ea93a3
Compare
Signed-off-by: zirain <[email protected]>
Signed-off-by: zirain <[email protected]>
Signed-off-by: zirain <[email protected]>
6ea93a3 to
2d84080
Compare
* add testcase Signed-off-by: zirain <[email protected]> * fix Signed-off-by: zirain <[email protected]> * simply Signed-off-by: zirain <[email protected]> --------- Signed-off-by: zirain <[email protected]>
* add testcase Signed-off-by: zirain <[email protected]> * fix Signed-off-by: zirain <[email protected]> * simply Signed-off-by: zirain <[email protected]> --------- Signed-off-by: zirain <[email protected]> Signed-off-by: Rudrakh Panigrahi <[email protected]>
* fix: extproc is discarded with failOpen is enabled for wasm (#7956) * fix: extproc is discarded with failOpen is enabled for wasm Signed-off-by: Huabing Zhao <[email protected]> * add test Signed-off-by: Huabing (Robin) Zhao <[email protected]> * polish code Signed-off-by: Huabing (Robin) Zhao <[email protected]> * add test Signed-off-by: Huabing (Robin) Zhao <[email protected]> --------- Signed-off-by: Huabing (Robin) Zhao <[email protected]> Signed-off-by: Rudrakh Panigrahi <[email protected]> * fix: sanitize control plane config dump (#7901) * mask secrets Signed-off-by: Huabing Zhao <[email protected]> * address comments Signed-off-by: Huabing (Robin) Zhao <[email protected]> --------- Signed-off-by: Huabing (Robin) Zhao <[email protected]> Signed-off-by: Rudrakh Panigrahi <[email protected]> * fix: server run race (#7964) * add test Signed-off-by: zirain <[email protected]> * fix race Signed-off-by: zirain <[email protected]> * fix lint Signed-off-by: zirain <[email protected]> * fix Signed-off-by: zirain <[email protected]> * fix Signed-off-by: zirain <[email protected]> * fix lint Signed-off-by: zirain <[email protected]> * use Semaphore instead of WaitGroup Signed-off-by: zirain <[email protected]> * comments Signed-off-by: zirain <[email protected]> * lint Signed-off-by: zirain <[email protected]> * fix Signed-off-by: zirain <[email protected]> * fix lint Signed-off-by: zirain <[email protected]> * callback Signed-off-by: zirain <[email protected]> * fix lint Signed-off-by: zirain <[email protected]> * run hook sequentially Signed-off-by: zirain <[email protected]> * fix lint Signed-off-by: zirain <[email protected]> * rename to cfgMux Signed-off-by: zirain <[email protected]> --------- Signed-off-by: zirain <[email protected]> Signed-off-by: Rudrakh Panigrahi <[email protected]> * fix: wrong cluster type with mixed FQDN backend and service backend refs (#7994) * fix: wrong cluster type with mixed FQDN backend and service backend refs Signed-off-by: Huabing (Robin) Zhao <[email protected]> * fix mirror cluster endpoint type Signed-off-by: Huabing (Robin) Zhao <[email protected]> * simplify the test Signed-off-by: Huabing (Robin) Zhao <[email protected]> * update comment Signed-off-by: Huabing (Robin) Zhao <[email protected]> --------- Signed-off-by: Huabing (Robin) Zhao <[email protected]> Signed-off-by: Rudrakh Panigrahi <[email protected]> * fix: fail fast when unrecoverable discovery errors happens on checking optional CRDs (#7872) * fail fast when unrecoverable discovery errors happens Signed-off-by: Huabing Zhao <[email protected]> * only retry transient errors Signed-off-by: Huabing Zhao <[email protected]> * fix potenial dead lock Signed-off-by: Huabing Zhao <[email protected]> * address comments Signed-off-by: Huabing Zhao <[email protected]> * minor wording Signed-off-by: Huabing Zhao <[email protected]> * create discovery client once Signed-off-by: Huabing Zhao <[email protected]> * fix lint Signed-off-by: Huabing Zhao <[email protected]> * address comments Signed-off-by: Huabing Zhao <[email protected]> * remove redundant logging Signed-off-by: Huabing Zhao <[email protected]> * add e2e test Signed-off-by: Huabing Zhao <[email protected]> * fix test Signed-off-by: Huabing(Robin) Zhao <[email protected]> * fix test Signed-off-by: Huabing(Robin) Zhao <[email protected]> --------- Signed-off-by: Huabing (Robin) Zhao <[email protected]> Signed-off-by: Rudrakh Panigrahi <[email protected]> * fix: merge route match rule with match all route (#8011) Signed-off-by: zirain <[email protected]> Signed-off-by: Rudrakh Panigrahi <[email protected]> * fix: do not set autoHTTPConfig when used mixed(HTTP + HTTPS) backends (#7950) * fix: do not set autoHTTPConfig when used mixed backend Signed-off-by: zirain <[email protected]> * release notes Signed-off-by: zirain <[email protected]> * fix Signed-off-by: zirain <[email protected]> * add e2e Signed-off-by: zirain <[email protected]> --------- Signed-off-by: zirain <[email protected]> Signed-off-by: Rudrakh Panigrahi <[email protected]> * fix: backend tls default namespace (#7987) Signed-off-by: Huabing (Robin) Zhao <[email protected]> Signed-off-by: Rudrakh Panigrahi <[email protected]> * fix: race in gatewaapi runner (#8037) * add testcase Signed-off-by: zirain <[email protected]> * fix Signed-off-by: zirain <[email protected]> * simply Signed-off-by: zirain <[email protected]> --------- Signed-off-by: zirain <[email protected]> Signed-off-by: Rudrakh Panigrahi <[email protected]> * [release/v1.6] v1.6.3 release notes (#8054) Signed-off-by: Rudrakh Panigrahi <[email protected]> * v1.6.3 version Signed-off-by: Rudrakh Panigrahi <[email protected]> * fix gen-check Signed-off-by: Rudrakh Panigrahi <[email protected]> * fix lint Signed-off-by: Rudrakh Panigrahi <[email protected]> --------- Signed-off-by: Huabing (Robin) Zhao <[email protected]> Signed-off-by: Rudrakh Panigrahi <[email protected]> Signed-off-by: zirain <[email protected]> Co-authored-by: Huabing (Robin) Zhao <[email protected]> Co-authored-by: zirain <[email protected]>
* add testcase Signed-off-by: zirain <[email protected]> * fix Signed-off-by: zirain <[email protected]> * simply Signed-off-by: zirain <[email protected]> --------- Signed-off-by: zirain <[email protected]> Signed-off-by: Sadmi Bouhafs <[email protected]>
fixes: #8035