-
Notifications
You must be signed in to change notification settings - Fork 329
PS-1491: Fix double retry issue for k8s mode bootstrap #3628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
037bd62 to
2f1bb6b
Compare
internal/job/executor.go
Outdated
| return stdoutRedactor, logger | ||
| } | ||
|
|
||
| func (e *Executor) kubernetesSetup(ctx context.Context, environ *env.Environment, k8sAgentSocket *kubernetes.Client) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is dead code. To be removed in #3629
kubernetes/client.go
Outdated
|
|
||
| const retryInterval = time.Second | ||
| maxAttempts := max(int(timeout.Seconds()/retryInterval.Seconds()), 1) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified a bit. It doesn't matter if maxAttempts is fixed at 120 - if the context times out after 120 seconds then the last few attempts won't happen (roko is context aware).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do but the issue is that I can't predict how high people will set the timeout to be 😬 . I set a 3600 max attempt as safety net, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3600 sounds reasonable. The other path would be roko.TryForever, if a user wants to configure the option arbitrarily poorly we could allow that.
DrJosh9000
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from my latest comment, LGTM
2f1bb6b to
ab3b35c
Compare
ab3b35c to
09db586
Compare
Description
Starting #3306 and subsequently worsen by #3398, we have created a double retry mechanism that will make k8s agent to spend 1200 seconds on retry when it's unable to connect to the coordinating agent container.
This PR remove double retry mechanism, centralize the retry logic to socket.Connect.
Context
PS-1491
Testing
go test ./...). Buildkite employees may check this if the pipeline has run automatically.go tool gofumpt -extra -w .)Disclosures / Credits
Human