AG-302 - Flush all idle connections on first validation failure to speed up recovery after DB failover #182

Merged
barreiro merged 2 commits into agroal:master from gastaldi:flush_on_validation_fail
May 4, 2026

@gastaldi
Collaborator

When a database fails over without closing TCP connections (e.g. AWS RDS), every pooled connection becomes stale. Previously, each one was validated individually, blocking for the full socket timeout per connection. This made the application unavailable for (timeout × pool size) after failover.

Now, once the first connection fails validation, all remaining idle connections are flushed immediately without attempting isValid() on each, reducing recovery time from O(N × timeout) to O(1 × timeout). On-borrow validation also skips stale connections while the pool is being refreshed.
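
The difference in recovery time can be illustrated with a small model. This is a hedged sketch, not Agroal code: `FailoverRecoverySketch`, `FakeConnection`, the pool size of 20 and the 30-second timeout are all assumptions made for illustration, and blocking time is added up rather than actually waited for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Agroal code: sums simulated blocking time instead of sleeping.
final class FailoverRecoverySketch {

    /** A pooled connection whose validation blocks for the full socket timeout when stale. */
    static final class FakeConnection {
        final boolean stale;

        FakeConnection(boolean stale) {
            this.stale = stale;
        }
    }

    /** Previous behaviour: validate every idle connection, paying the timeout for each stale one. */
    static long validateEach(List<FakeConnection> idle, long timeoutMillis) {
        long blocked = 0;
        for (FakeConnection c : idle) {
            if (c.stale) {
                blocked += timeoutMillis; // isValid() hangs until the socket times out
            }
        }
        return blocked;
    }

    /** New behaviour: after the first failed validation, discard the rest without validating. */
    static long flushOnFirstFailure(List<FakeConnection> idle, long timeoutMillis) {
        long blocked = 0;
        for (FakeConnection c : idle) {
            if (c.stale) {
                blocked += timeoutMillis; // only the first stale connection pays the timeout
                break;                    // the remaining idle connections are flushed unvalidated
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        List<FakeConnection> pool = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            pool.add(new FakeConnection(true)); // every connection is stale after failover
        }
        long timeoutMillis = 30_000;
        System.out.println("validate each: " + validateEach(pool, timeoutMillis) + " ms");
        System.out.println("flush on first failure: " + flushOnFirstFailure(pool, timeoutMillis) + " ms");
    }
}
```

With these assumed numbers the model reports 600000 ms for per-connection validation and 30000 ms when the pool is flushed on the first failure.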

@gastaldi gastaldi requested review from barreiro and graben March 30, 2026 13:03
@gastaldi gastaldi force-pushed the flush_on_validation_fail branch from 2e30022 to c9c3e15 Compare March 30, 2026 13:09
Comment thread agroal-pool/src/main/java/io/agroal/pool/ConnectionPool.java Outdated
Comment thread agroal-pool/src/main/java/io/agroal/pool/ConnectionPool.java
Comment thread agroal-pool/src/main/java/io/agroal/pool/ConnectionPool.java
@gastaldi gastaldi force-pushed the flush_on_validation_fail branch 2 times, most recently from fc46f5b to 47a8fb7 Compare March 30, 2026 18:10
@graben graben self-requested a review March 30, 2026 18:58
@gastaldi gastaldi changed the title AG-302: Flush all idle connections on first validation failure to speed up recovery after DB failover AG-302 - Flush all idle connections on first validation failure to speed up recovery after DB failover Mar 30, 2026
Contributor

@barreiro barreiro left a comment

Can't we simply call flushPool(GRACEFUL)? That way the existing logic would be reused.

otherwise, I would rather continue to have validation in individual pieces of work (tasks) than a single one

the idea is good :)

@gastaldi gastaldi force-pushed the flush_on_validation_fail branch 2 times, most recently from 1409d9c to 0dee6c9 Compare April 16, 2026 14:25
@gastaldi gastaldi requested a review from barreiro April 29, 2026 11:48
gastaldi added 2 commits May 4, 2026 13:04
The ternary operator has lower precedence than &&, so the expression
`handler.isValid() && idle ? passValidationToIdle() : passValidationToActive()`
was evaluated as `(handler.isValid() && idle) ? ... : ...`. When idle=false the
result of isValid() was therefore ignored and passValidationToActive() always ran.
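
A standalone demo of the precedence issue described in this commit message. The method names only echo the ones quoted above. The `boolean` return types and the parenthesized form are assumptions, since the actual diff is not shown here.

```java
// Standalone demo of Java operator precedence; not the Agroal source.
final class TernaryPrecedenceSketch {

    static int activeCalls = 0;

    static boolean passToIdle() {
        return true;
    }

    static boolean passToActive() {
        activeCalls++; // records that the "active" branch actually ran
        return true;
    }

    /** Parsed as (valid && idle) ? passToIdle() : passToActive(). */
    static boolean unparenthesized(boolean valid, boolean idle) {
        return valid && idle ? passToIdle() : passToActive();
    }

    /** Explicit grouping: an invalid connection short-circuits before either branch runs. */
    static boolean parenthesized(boolean valid, boolean idle) {
        return valid && (idle ? passToIdle() : passToActive());
    }

    public static void main(String[] args) {
        System.out.println(unparenthesized(false, false)); // prints "true"
        System.out.println(parenthesized(false, false));   // prints "false"
    }
}
```

In the unparenthesized form an invalid, non-idle connection still reaches `passToActive()`. With the explicit parentheses neither branch is evaluated once `valid` is false.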
…ed up recovery after DB failover

When a database fails over without closing TCP connections (e.g. AWS RDS),
every pooled connection becomes stale. Previously, each one was validated
individually, blocking for the full socket timeout per connection. This
made the application unavailable for (timeout × pool size) after failover.

Now, once the first connection fails validation, all remaining idle
connections are flushed immediately without attempting isValid() on each,
reducing recovery time from O(N × timeout) to O(1 × timeout). On-borrow
validation also skips stale connections while the pool is being refreshed.

Review: reuse flushPool(GRACEFUL) and restore individual validation tasks

Address review feedback by restoring per-connection ValidateConnectionTask
pattern (consistent with LeakTask/ReapTask) and delegating flush to the
existing flushPool(GRACEFUL) instead of a custom flush loop.
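
One way to picture the shape this commit message describes is one validation task per connection, with the first failure delegating to a single flush. The sketch below is hypothetical throughout: `ValidationTaskSketch`, `flushPoolGraceful()` and the boolean list standing in for connections are illustrative assumptions, not Agroal classes or the merged implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these types are Agroal classes.
final class ValidationTaskSketch {

    final List<Boolean> idleConnections = new ArrayList<>(); // true = healthy, false = stale
    final AtomicBoolean flushing = new AtomicBoolean();
    final AtomicInteger validations = new AtomicInteger();
    final AtomicInteger flushes = new AtomicInteger();

    /** Stand-in for delegating to an existing graceful flush. */
    void flushPoolGraceful() {
        flushes.incrementAndGet();
        idleConnections.clear();
    }

    /** One unit of work per connection. */
    Runnable validateTask(int index) {
        return () -> {
            if (flushing.get()) {
                return; // a flush is already under way: skip the slow validity check
            }
            validations.incrementAndGet();
            boolean valid = idleConnections.get(index); // stands in for a blocking isValid()
            if (!valid && flushing.compareAndSet(false, true)) {
                flushPoolGraceful(); // only the first failure triggers the flush
            }
        };
    }

    /** Queues one task per idle connection on a single housekeeping thread and waits for them. */
    static ValidationTaskSketch run(int poolSize) {
        ValidationTaskSketch pool = new ValidationTaskSketch();
        for (int i = 0; i < poolSize; i++) {
            pool.idleConnections.add(false); // every connection is stale
        }
        ExecutorService housekeeping = Executors.newSingleThreadExecutor();
        for (int i = 0; i < poolSize; i++) {
            housekeeping.execute(pool.validateTask(i));
        }
        housekeeping.shutdown();
        try {
            housekeeping.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return pool;
    }

    public static void main(String[] args) {
        ValidationTaskSketch pool = run(10);
        System.out.println("validations=" + pool.validations.get() + " flushes=" + pool.flushes.get());
    }
}
```

Keeping each validation as its own task means the tasks queued after the first failure return immediately, so in this model only one validation and one flush happen for a fully stale pool.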
@gastaldi gastaldi force-pushed the flush_on_validation_fail branch from 0dee6c9 to 770a510 Compare May 4, 2026 16:07
@barreiro barreiro merged commit 34e2d4f into agroal:master May 4, 2026
8 checks passed