Skip to content

Tests for #669#691

Merged
mosquito merged 5 commits intomasterfrom
hotfix/issue-669
Feb 23, 2026
Merged

Tests for #669#691
mosquito merged 5 commits intomasterfrom
hotfix/issue-669

Conversation

@mosquito
Copy link
Copy Markdown
Owner

@mosquito mosquito commented Jan 12, 2026

Fixes #669

The root cause was two resilience gaps in RobustConnection's reconnection path.

How stuck connection detection works: aiormq's heartbeat monitor detects no frames received within the grace timeout, cancels the reader task, and the resulting CancelledError propagates through aiormq's closing future into RobustConnection._on_connection_close(), which should set an internal event to wake the reconnection loop.

Gap 1: _on_connection_close not resilient to callback exceptions. If close_callbacks(exc) (called via super()._on_connection_close()) raised any exception, __connection_close_event.set() was never reached. The reconnection factory loop never woke up, and the connection was permanently dead.

Gap 2: __connection_factory didn't handle asyncio.CancelledError. In Python 3.9+, CancelledError is a BaseException, so it's not caught by except CONNECTION_EXCEPTIONS or
except Exception. If CancelledError was raised during a reconnection attempt (e.g. from aiormq cleanup interacting with the new connection), the factory task would crash silently and reconnection would stop permanently.

Fix: Wrapped the super()._on_connection_close() call in try/except so the reconnection event is always set, and added explicit CancelledError handling in the connection factory loop so it retries instead of crashing.

@coveralls
Copy link
Copy Markdown

coveralls commented Jan 12, 2026

Coverage Status

coverage: 91.977% (+0.1%) from 91.833%
when pulling 46d4863 on hotfix/issue-669
into 0a9f24c on master.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds regression coverage and hardens RobustConnection’s reconnection loop to address cases where reconnection can silently stop after a “stuck” connection (issue #669).

Changes:

  • Add new proxy-based integration tests around reconnect behavior and consumer iteration across reconnects.
  • Make _on_connection_close() resilient to exceptions thrown by close callbacks so the reconnect loop is always woken.
  • Add explicit asyncio.CancelledError handling in the connection factory loop to avoid unexpected task death during reconnect attempts.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
tests/test_amqp_robust_proxy.py Adds new reconnect-focused integration tests (stuck/heartbeat scenario + queue iterator across reconnect).
aio_pika/robust_connection.py Improves reconnect-loop resilience by guarding close callbacks and handling CancelledError during connection attempts.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread aio_pika/robust_connection.py
Comment thread aio_pika/robust_connection.py
Comment thread tests/test_amqp_robust_proxy.py
Comment thread tests/test_amqp_robust_proxy.py
Alviner
Alviner previously approved these changes Feb 21, 2026
@mosquito mosquito dismissed Alviner’s stale review February 21, 2026 20:25

The merge-base changed after approval.

decaz
decaz previously approved these changes Feb 22, 2026
@mosquito mosquito dismissed decaz’s stale review February 22, 2026 11:02

The merge-base changed after approval.

@mosquito mosquito requested review from Alviner and decaz February 22, 2026 11:08
@mosquito mosquito merged commit 57029c3 into master Feb 23, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RobustConnection does not reconnect after "Server connection was stuck"

5 participants