AG-302 - Flush all idle connections on first validation failure to speed up recovery after DB failover #182
Merged
barreiro merged 2 commits into agroal:master on May 4, 2026
2e30022 to
c9c3e15
graben reviewed Mar 30, 2026
fc46f5b to
47a8fb7
graben approved these changes Mar 30, 2026
barreiro requested changes Apr 16, 2026
barreiro (Contributor) left a comment:
Can't you simply call flushPool(GRACEFUL)? That way the existing logic would be reused.
Otherwise, I would rather keep validation in individual pieces of work (tasks) than in a single one.
The idea is good :)
1409d9c to
0dee6c9
The ternary operator has lower precedence than `&&`, so the expression `handler.isValid() && idle ? passValidationToIdle() : passValidationToActive()` was evaluated as `(handler.isValid() && idle) ? ... : ...`. As a result, an invalid connection always took the `passValidationToActive()` branch, and when `idle=false` the result of `isValid()` had no effect on the outcome.
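A minimal, self-contained sketch of the precedence issue. The `branch(...)` helper is a hypothetical stand-in for `passValidationToIdle()`/`passValidationToActive()`: it records which branch ran and returns `true`. The parenthesized variant is an assumed reading of the intent (validity gates both branches), not necessarily the exact fix in the commit.

```java
import java.util.ArrayList;
import java.util.List;

public class TernaryPrecedenceDemo {

    // Hypothetical stand-in for passValidationToIdle()/passValidationToActive():
    // records which branch ran and reports success.
    static boolean branch(List<String> calls, String name) {
        calls.add(name);
        return true;
    }

    // As written in the bug: Java parses this as (valid && idle) ? idle-branch : active-branch.
    public static boolean unparenthesized(boolean valid, boolean idle, List<String> calls) {
        return valid && idle ? branch(calls, "idle") : branch(calls, "active");
    }

    // Assumed intent: an invalid connection short-circuits; the ternary only picks the branch.
    public static boolean parenthesized(boolean valid, boolean idle, List<String> calls) {
        return valid && (idle ? branch(calls, "idle") : branch(calls, "active"));
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        // Invalid, non-idle connection: the unparenthesized form still "passes".
        System.out.println(unparenthesized(false, false, calls) + " " + calls); // true [active]
        calls.clear();
        System.out.println(parenthesized(false, false, calls) + " " + calls);   // false []
    }
}
```

Note that `isValid()` is still evaluated first in both forms, because `&&` evaluates its left operand before anything else; the bug is that its result stops deciding the outcome.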
…ed up recovery after DB failover

When a database fails over without closing TCP connections (e.g. AWS RDS), every pooled connection becomes stale. Previously, each one was validated individually, blocking for the full socket timeout per connection. This made the application unavailable for (timeout × pool size) after failover.

Now, once the first connection fails validation, all remaining idle connections are flushed immediately without attempting isValid() on each, reducing recovery time from O(N × timeout) to O(1 × timeout). On-borrow validation also skips stale connections while the pool is being refreshed.

Review: reuse flushPool(GRACEFUL) and restore individual validation tasks

Address review feedback by restoring the per-connection ValidateConnectionTask pattern (consistent with LeakTask/ReapTask) and delegating the flush to the existing flushPool(GRACEFUL) instead of a custom flush loop.
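The flow described in these commits can be sketched in plain Java. Everything below is hypothetical: `Conn`, the `isValid` predicate and `flushIdle()` stand in for Agroal's connection handlers, `Connection.isValid()` and `flushPool(GRACEFUL)`, whose real signatures are not shown here. Each idle connection gets its own validation task, and the first failing task flushes the remaining idle connections so later tasks skip them instead of blocking on a dead socket. The on-borrow path is not modeled.

```java
import java.util.ArrayList;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

public class FlushOnFirstFailureSketch {

    // Hypothetical connection wrapper: an id plus a closed flag.
    public static final class Conn {
        public final int id;
        private volatile boolean closed;

        public Conn(int id) { this.id = id; }

        public boolean isClosed() { return closed; }

        void close() { closed = true; }
    }

    private final ConcurrentLinkedQueue<Conn> idle = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean flushing = new AtomicBoolean(false);
    private final AtomicInteger validations = new AtomicInteger();
    // Stand-in for a blocking isValid(timeout) call on the real connection.
    private final Predicate<Conn> isValid;

    public FlushOnFirstFailureSketch(Predicate<Conn> isValid) { this.isValid = isValid; }

    public void addIdle(Conn c) { idle.add(c); }

    public int idleCount() { return idle.size(); }

    public int validationCount() { return validations.get(); }

    // Stand-in for flushPool(GRACEFUL); this sketch only models the idle side.
    private void flushIdle() {
        Conn c;
        while ((c = idle.poll()) != null) {
            c.close();
        }
    }

    // Body of one per-connection validation task.
    private void validate(Conn c) {
        // Skip connections already closed by a flush, or any validation while a flush is running.
        if (c.isClosed() || flushing.get()) {
            return;
        }
        validations.incrementAndGet();
        if (isValid.test(c)) {
            return;
        }
        idle.remove(c);
        c.close();
        // Only the first failing task triggers the flush.
        if (flushing.compareAndSet(false, true)) {
            try {
                flushIdle();
            } finally {
                flushing.set(false);
            }
        }
    }

    // Runs one task per idle connection; a real pool would submit these to an executor.
    public void runValidationPass() {
        for (Conn c : new ArrayList<>(idle)) {
            validate(c);
        }
    }
}
```

With a hypothetical pool of 20 stale connections and a 30 s socket timeout, validating them one after another could block for up to 20 × 30 s = 600 s; in this sketch only the first `isValid` call is paid, so the wait is bounded by roughly one 30 s timeout before the pool can open fresh connections.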
0dee6c9 to
770a510
barreiro approved these changes May 4, 2026