fix(cluster): Cluster reconnect sharded subscribers by PavelPashov · Pull Request #2060 · redis/ioredis

PavelPashov · 2026-01-13T16:33:41Z

When all sharded subscriber connections fail and the subsequent slots cache refresh returns ClusterAllFailedError, the cluster now enters the reconnecting state instead of becoming a zombie.

This happens when the cluster topology changes and all nodes are replaced with new IPs: the subscriber connections fail, which triggers a slots refresh via the "-node" event handler. If that refresh also fails (e.g., the duplicated connection used for CLUSTER SLOTS times out or closes), the cluster stays stuck in the "ready" state with no working connections, because normal pool connections use lazyConnect: true and never emit "end" events to trigger the drain->close->reconnect cycle.

Subscriber-triggered refreshSlotsCache() calls now use a dedicated callback that detects ClusterAllFailedError and calls disconnect(true) to force reconnection, preventing the zombie state.
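The behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: `ClusterLike`, `FakeCluster`, `makeSubscriberRefreshCallback`, and `onShardedSubscribersFailed` are hypothetical names introduced for illustration, the local `ClusterAllFailedError` class is a stand-in for the ioredis error of the same name, and the fake's status transitions are simplified assumptions.

```typescript
// Stand-in for the ioredis error class (assumption: only its identity matters here).
class ClusterAllFailedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ClusterAllFailedError";
    // Keep instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, ClusterAllFailedError.prototype);
  }
}

// Hypothetical minimal surface of a cluster client used by this sketch.
interface ClusterLike {
  status: string;
  refreshSlotsCache(callback?: (err?: Error | null) => void): void;
  disconnect(reconnect?: boolean): void;
}

// Hypothetical helper: builds the dedicated callback for subscriber-triggered
// refreshes. Only a ClusterAllFailedError forces a reconnect; other outcomes
// are left to the normal refresh handling.
function makeSubscriberRefreshCallback(
  cluster: ClusterLike
): (err?: Error | null) => void {
  return (err?: Error | null): void => {
    if (err instanceof ClusterAllFailedError) {
      cluster.disconnect(true);
    }
  };
}

// Hypothetical handler invoked once every sharded subscriber connection has failed.
function onShardedSubscribersFailed(cluster: ClusterLike): void {
  cluster.refreshSlotsCache(makeSubscriberRefreshCallback(cluster));
}

// Fake cluster for demonstration: the refresh "fails" or "succeeds" synchronously.
// The status values are a simplification of the real client's state machine.
class FakeCluster implements ClusterLike {
  status = "ready";
  disconnectCalls: boolean[] = [];
  private refreshError: Error | null;

  constructor(refreshError: Error | null) {
    this.refreshError = refreshError;
  }

  refreshSlotsCache(callback?: (err?: Error | null) => void): void {
    if (callback) callback(this.refreshError);
  }

  disconnect(reconnect: boolean = false): void {
    this.disconnectCalls.push(reconnect);
    this.status = reconnect ? "reconnecting" : "end";
  }
}

// Usage: a refresh that fails on every node leaves "ready" and starts reconnecting.
const demoCluster = new FakeCluster(
  new ClusterAllFailedError("Failed to refresh slots cache.")
);
onShardedSubscribersFailed(demoCluster);
console.log(demoCluster.status);
```

Scoping the check to subscriber-triggered refreshes, as the description states, means other callers of refreshSlotsCache() keep their existing error handling.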

jit-ci · 2026-01-14T16:03:18Z

❌ Security scan failed

Security scan failed: Branch feat/cluster-reconnect-on-refresh-failure does not exist in the remote repository

💡 Need to bypass this check? Comment @sera bypass to override.

jit-ci · 2026-01-14T17:02:59Z

❌ Security scan failed

Security scan failed: Branch feat/cluster-reconnect-on-refresh-failure does not exist in the remote repository

💡 Need to bypass this check? Comment @sera bypass to override.


jit-ci · 2026-01-15T09:09:21Z

❌ Security scan failed

Security scan failed: Branch feat/cluster-reconnect-on-refresh-failure does not exist in the remote repository

💡 Need to bypass this check? Comment @sera bypass to override.

## [5.9.2](v5.9.1...v5.9.2) (2026-01-15)

### Bug Fixes

* **cluster:** Cluster reconnect sharded subscribers ([#2060](#2060)) ([def9804](def9804))
* preserve replica slots on MOVED in pipelines ([#2059](#2059)) ([a1c3e9d](a1c3e9d))

### Reverts

* Revert "fix: preserve replica slots on MOVED in pipelines (#2059)" (#2062) ([517b932](517b932)), closes [#2059](#2059) [#2062](#2062)

github-actions · 2026-01-15T13:44:44Z

🎉 This PR is included in version 5.9.2 🎉

The release is available on:

Your semantic-release bot 📦🚀

* fix: trigger reconnect when sharded subscriber slots refresh fails

  When all sharded subscriber connections fail and the subsequent slots cache refresh returns ClusterAllFailedError, the cluster now properly enters reconnecting state instead of becoming zombied. This occurs when the cluster topology changes and all nodes are replaced with new IPs - the subscriber connections fail, triggering a slots refresh via the "-node" event handler. If this refresh fails (e.g., the duplicated connection for CLUSTER SLOTS times out or closes), the cluster becomes stuck in "ready" state with no working connections because normal pool connections use lazyConnect: true and never emit "end" events to trigger the drain->close->reconnect cycle. Now subscriber-triggered refreshSlotsCache() calls use a dedicated callback that detects ClusterAllFailedError and calls disconnect(true) to force reconnection, preventing the zombie state.

* test: ensure reconnect after sharded subscriber failure

PavelPashov requested a review from nkaradzhov January 13, 2026 16:33

PavelPashov force-pushed the feat/cluster-reconnect-on-refresh-failure branch from 6b76f15 to 00f3d4b Compare January 14, 2026 16:02

PavelPashov changed the title ~~feat(cluster): add reconnectOnRefreshFailure option~~ feat(cluster): Cluster reconnect sharded subscribers Jan 14, 2026

PavelPashov marked this pull request as ready for review January 14, 2026 17:02

PavelPashov added 2 commits January 15, 2026 11:07

test: ensure reconnect after sharded subscriber failure

5e703f6

PavelPashov force-pushed the feat/cluster-reconnect-on-refresh-failure branch from aac0707 to 5e703f6 Compare January 15, 2026 09:08

PavelPashov changed the title ~~feat(cluster): Cluster reconnect sharded subscribers~~ fix(cluster): Cluster reconnect sharded subscribers Jan 15, 2026

nkaradzhov approved these changes Jan 15, 2026


PavelPashov merged commit def9804 into redis:main Jan 15, 2026
11 checks passed

PavelPashov deleted the feat/cluster-reconnect-on-refresh-failure branch January 15, 2026 11:16

github-actions bot added the released label Jan 15, 2026

chrisbbreuer mentioned this pull request Jan 15, 2026

chore(deps): update all non-major dependencies stacksjs/ts-cache#1854

Closed

1 task

PavelPashov merged 2 commits into redis:main from PavelPashov:feat/cluster-reconnect-on-refresh-failure
